DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Remarks
Prior to paragraph 78, the applicant uses the notation ###S (# = digit) to represent the sign bit, ###E to represent the exponent, and ###M to represent the mantissa; however, in paragraphs 78 and 79 and figure 8, the applicant uses the notation ###S to represent the exponent. This appears inconsistent with what the applicant intends.
Paragraphs 82, 85, and 86 mention figure 9B; however, no figure 9B is shown. Based on paragraph 85, it is likely that figure 9B should be figure 10. Additionally, reference numbers/characters in figure 10 overlap with those in figure 9, as similarly stated in specification paragraphs 85-87. This appears inconsistent with what the applicant intends.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because:
In fig.6, reference character “616” has been used to designate two different numbers with different mantissas;
In fig.6, reference character “615E” has been used to designate two different exponents;
In figures 9 and 10, reference character “940” has been used to designate both “drain 940” and “input register file 940”;
In figures 9 and 10, reference character “950” has been used to designate both “input signal 950” and “weight register file 950”;
In figures 9 and 10, reference character “960” has been used to designate both “input signal 960” and “output register file 960”;
In figures 9 and 10, reference character “970” has been used to designate both “output signal 970” and “MAC unit 970”.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description:
In par.70, 615 is not shown in figure 6;
In par.71, 616E is not shown in figure 6;
In par.78, 825M is not shown in figure 8;
In par.82, 85, and 86, Figure 9B is not shown.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to as failing to comply with 37 CFR 1.71(a) because of the following informalities:
In par.78, ll.12, “the mantissa 825E” should read as “the mantissa 825M”.
In par.79, ll.6, “the mantissa 825E” should read as “the mantissa 825M”.
Appropriate correction is required.
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Objections
Claims 2, 4, 5, 8, and 21 are objected to as failing to comply with 37 CFR 1.75(a) because of the following informalities:
Claim 4, ll.10, “shifting factor” should read as “the shifting factor”.
Claim 5, p.51, ll.4, “shifting factor” should read as “the shifting factor”.
Claim 8, ll.2-3, “transformed to the floating-point product a second digital circuit” should read as “transformed to the floating-point product by a second digital circuit” (emphasis added).
Claim 21, ll.4, “the row comprises” should read as “the floating-point row comprises”.
Claim 21, ll.5, “the column comprises” should read as “the floating-point column comprises”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2, 12, and 24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 2, 12, and 24 (hereinafter the child claims) recite the limitations of “a first extreme component” and “a second extreme component”. The respective parent claims also include the limitations of “a first extreme component” and “a second extreme component”. It is unclear whether the child claims recite a new first and second extreme component or depend on the limitations of the parent claims. For purposes of examination, the child claims' limitations are interpreted as reciting a new set of “extreme components”.
Additionally, claims 2 and 12 recite “a digital circuit” twice; it is unclear whether this refers to the same digital circuit or to two separate digital circuits. For purposes of examination, the digital circuit is interpreted as the same circuit.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Under the Alice Framework Step 1, claims 1-10 recite a method and therefore fall within the process category. Claims 11-20 recite non-transitory computer-readable media and therefore fall within the article-of-manufacture category. Claims 21-25 recite a DNN accelerator and therefore fall within the machine category.
Under the Alice Framework Step 2A prong 1, claim 21 recites:
a memory for storing a first extreme exponent for a floating-point row in a first floating-point matrix and a second extreme exponent for a floating-point column in a second floating-point matrix, wherein the row comprises row elements, the first extreme exponent is a highest exponent of exponents of the row elements, the column comprises column elements, and the second extreme exponent is a highest exponent of exponents of the column elements;
one or more first digital circuits configured to:
retrieve the first extreme exponent and the second extreme exponent from the memory,
transform the floating-point row to a fixed-point row including first fixed-point numbers based on the first extreme exponent in the memory, and
transform the floating-point column to a fixed-point column including second fixed-point numbers based on the second extreme exponent;
an array of processing elements configured to:
perform a multiplication operation on the fixed-point row and the fixed-point column to generate a fixed-point product; and
one or more second digital circuits configured to:
after generating the fixed-point product, retrieve the first extreme exponent and the second extreme exponent from the memory, and
transforming the fixed-point product to a floating-point product based on the first extreme exponent and the second extreme exponent.
The above underlined limitations are related to computing a fixed-point dot product by converting floating point into fixed point and converting fixed point back into floating point, which amounts to mathematical calculations and relationships that fall under the “Mathematical Concepts” grouping of abstract ideas (see at least specification paragraphs 44-63 and 67-73). Accordingly, the claim recites an abstract idea.
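For context only, and not as part of the eligibility analysis, the mathematical concept identified above — converting a row and a column to fixed point under their extreme exponents, taking an integer dot product, and converting the result back to floating point — can be sketched as follows. All identifiers are hypothetical and are not drawn from the claims or the specification:

```python
import math

def extreme_exponent(values):
    # Highest base-2 exponent among the elements (the "extreme exponent").
    return max(math.frexp(v)[1] for v in values)

def to_fixed(values, ext_exp, frac_bits=24):
    # Scale each float relative to the extreme exponent and keep an
    # integer mantissa with frac_bits fractional bits.
    return [round(v * 2.0 ** (frac_bits - ext_exp)) for v in values]

def fixed_dot(row, col):
    # Integer multiply-accumulate, as an array of processing elements would perform.
    return sum(r * c for r, c in zip(row, col))

def to_float(product, row_exp, col_exp, frac_bits=24):
    # Undo both scalings to recover the floating-point product.
    return product * 2.0 ** (row_exp + col_exp - 2 * frac_bits)

row = [1.5, -0.25, 3.0]
col = [2.0, 4.0, -1.0]
re_, ce_ = extreme_exponent(row), extreme_exponent(col)
approx = to_float(fixed_dot(to_fixed(row, re_), to_fixed(col, ce_)), re_, ce_)
exact = sum(r * c for r, c in zip(row, col))
```

For these sample inputs no mantissa bits are discarded, so the reconstructed product equals the direct floating-point dot product (-1.0).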
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: “a memory for storing a first extreme exponent… and a second extreme exponent…”, “first digital circuits”, “retrieve the first extreme exponent and the second extreme exponent from the memory”, “an array of processing elements”, “second digital circuits”, and “retrieve the first extreme exponent and the second extreme exponent from the memory”. However, the additional elements of the memory, first digital circuits, array of processing elements, and second digital circuits are recited at a high level of generality (i.e., as a generic computer component for storing the data, as a generic computer component for converting the data, and as a generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “a memory for storing a first extreme exponent… and a second extreme exponent…” and the two recitations of “retrieve the first extreme exponent and the second extreme exponent from the memory” merely add insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of the memory, first digital circuits, array of processing elements, and second digital circuits are recited at a high level of generality (i.e., as a generic computer component for storing the data, as a generic computer component for converting the data, and as a generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “a memory for storing a first extreme exponent… and a second extreme exponent…” and the two recitations of “retrieve the first extreme exponent and the second extreme exponent from the memory” merely add insignificant extra-solution activity. See MPEP 2106.05(d)(II), which states that the courts have recognized computer functions such as “Storing and retrieving information in memory” as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 2A prong 1, claims 22-25 recite further steps of, and details for, computing a fixed-point dot product by converting floating point into fixed point and converting fixed point back into floating point, which amount to mathematical calculations and relationships that fall under the “Mathematical Concepts” and/or “Mental Processes” groupings of abstract ideas.
Claim 22 is directed to applying the math in an array. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: “one or more first digital circuits and… second digital circuits are arranged outside the array of processing elements”. However, the first digital circuits and second digital circuits arranged outside the array of processing elements are recited at a high level of generality (i.e., as merely applying the math in the format needed by the generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the first digital circuits and second digital circuits arranged outside the array of processing elements are recited at a high level of generality (i.e., as merely applying the math in the format needed by the generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Claim 23 is directed to using a cache for the memory. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: “a cache”. However, the additional element of a cache is recited at a high level of generality (i.e., as a generic computer component for storing the data) such that it amounts to no more than mere instructions using a generic computer component, or merely a tool to implement the abstract idea. The additional element does not, individually or in combination with the other elements, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of a cache is recited at a high level of generality (i.e., as a generic computer component for storing the data) such that it amounts to no more than mere instructions using a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Claim 24 is directed to finding the maximum exponent of a row. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: a cache line, third digital circuits, “receive a first extreme exponent from the memory”, “receive an exponent of a first row element”, “receive the second extreme exponent from the memory”, and “the third extreme exponent is stored in the memory”. However, the additional elements of a cache line and third digital circuits are recited at a high level of generality (i.e., as a generic computer component for storing the data and as a generic computer component for comparing data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “receive a first extreme exponent from the memory”, “receive an exponent of a first row element”, “receive the second extreme exponent from the memory”, and “the third extreme exponent is stored in the memory” merely add insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a cache line and third digital circuits are recited at a high level of generality (i.e., as a generic computer component for storing the data and as a generic computer component for comparing data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “receive a first extreme exponent from the memory”, “receive an exponent of a first row element”, “receive the second extreme exponent from the memory”, and “the third extreme exponent is stored in the memory” merely add insignificant extra-solution activity. See MPEP 2106.05(d)(II), which states that the courts have recognized computer functions such as “Storing and retrieving information in memory” as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
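As a point of reference, the operation described in claim 24 — a circuit that receives the stored extreme exponent and the exponent of a row element and stores back the larger — reduces to a running maximum. A hypothetical sketch (the identifiers are the editor's, not the claim's):

```python
def comparator(stored_exponent, element_exponent):
    # One comparator stage: pass the larger of the two exponents back to memory.
    return max(stored_exponent, element_exponent)

exponents = [3, 7, 5]            # exponents of successive row elements
extreme = exponents[0]
for e in exponents[1:]:
    extreme = comparator(extreme, e)   # the stored extreme exponent is updated
```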
Claim 25 is directed to finding the maximum exponent of a column. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: third digital circuits and “the higher exponent is stored in the memory”. However, the additional element of third digital circuits is recited at a high level of generality (i.e., as a generic computer component for comparing data) such that it amounts to no more than mere instructions using a generic computer component, or merely a tool to implement the abstract idea. The additional element of “the higher exponent is stored in the memory” merely adds insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of third digital circuits is recited at a high level of generality (i.e., as a generic computer component for comparing data) such that it amounts to no more than mere instructions using a generic computer component, or merely a tool to implement the abstract idea. The additional element of “the higher exponent is stored in the memory” merely adds insignificant extra-solution activity. See MPEP 2106.05(d)(II), which states that the courts have recognized computer functions such as “Storing and retrieving information in memory” as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 2A prong 1, claim 11 recites:
One or more non-transitory computer-readable media storing instructions executable to perform operations for deep learning, the operations comprising:
storing, in a memory, a first extreme exponent for a floating-point row in a first floating-point matrix and a second extreme exponent for a floating-point column in a second floating-point matrix, wherein the row comprises row elements, the first extreme exponent is a highest exponent of exponents of the row elements, the column comprises column elements, and the second extreme exponent is a highest exponent of exponents of the column elements;
transforming the floating-point row to a fixed-point row including first fixed-point numbers based on the first extreme exponent in the memory;
transforming the floating-point column to a fixed-point column including second fixed-point numbers based on the second extreme exponent;
performing, by an array of processing elements, a multiplication operation on the fixed-point row and the fixed-point column to generate a fixed-point product;
after generating the fixed-point product, retrieving the first extreme exponent and the second extreme exponent from the memory; and
transforming the fixed-point product to a floating-point product based on the first extreme exponent and the second extreme exponent.
The above underlined limitations are related to computing a fixed-point dot product by converting floating point into fixed point and converting fixed point back into floating point for further mathematical operations, which amounts to mathematical calculations and relationships that fall under the “Mathematical Concepts” grouping of abstract ideas (see at least specification paragraphs 44-63 and 67-73). Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: “media storing instructions”, “a memory for storing a first extreme exponent… and a second extreme exponent…”, “an array of processing elements”, and “retrieving the first extreme exponent and the second extreme exponent from the memory”. However, the additional elements of the media, memory, and array of processing elements are recited at a high level of generality (i.e., as generic computer components for storing the data or instructions and as a generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “media storing instructions”, “a memory for storing a first extreme exponent… and a second extreme exponent…”, and “retrieving the first extreme exponent and the second extreme exponent from the memory” merely add insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of the media, memory, and array of processing elements are recited at a high level of generality (i.e., as generic computer components for storing the data or instructions and as a generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “media storing instructions”, “a memory for storing a first extreme exponent… and a second extreme exponent…”, and “retrieving the first extreme exponent and the second extreme exponent from the memory” merely add insignificant extra-solution activity. See MPEP 2106.05(d)(II), which states that the courts have recognized computer functions such as “Storing and retrieving information in memory” as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 2A prong 1, claims 12-20 recite further steps of, and details for, computing a fixed-point dot product by converting floating point into fixed point and converting fixed point back into floating point, which amount to mathematical calculations and relationships that fall under the “Mathematical Concepts” and/or “Mental Processes” groupings of abstract ideas.
Claim 12 is directed to finding the maximum exponent of a row. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: “the row elements are stored as a cache line”, a digital circuit, “retrieving a first extreme exponent from the memory”, “inputting the first extreme exponent and an exponent of a first row element in the cache line into a digital circuit”, “inputting the second extreme exponent and an exponent of a second row element in the cache line into a digital circuit”, and “storing the third extreme exponent in the memory”. However, the additional elements of a cache line and a digital circuit are recited at a high level of generality (i.e., as a generic computer component for storing the data and as a generic computer component for comparing data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “the row elements are stored as a cache line”, “retrieving a first extreme exponent from the memory”, “inputting the first extreme exponent and an exponent of a first row element in the cache line into a digital circuit”, “inputting the second extreme exponent and an exponent of a second row element in the cache line into a digital circuit”, and “storing the third extreme exponent in the memory” merely add insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a cache line and a digital circuit are recited at a high level of generality (i.e., as a generic computer component for storing the data and as a generic computer component for comparing data) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements of “the row elements are stored as a cache line”, “retrieving a first extreme exponent from the memory”, “inputting the first extreme exponent and an exponent of a first row element in the cache line into a digital circuit”, “inputting the second extreme exponent and an exponent of a second row element in the cache line into a digital circuit”, and “storing the third extreme exponent in the memory” merely add insignificant extra-solution activity. See MPEP 2106.05(d)(II), which states that the courts have recognized computer functions such as “Storing and retrieving information in memory” as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Claim 13 is directed to finding the maximum exponent of a column. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: “inputting the column elements in the column into a group of digital circuits”, “each digital circuit receiving a different column element in the column”, and “storing the higher exponent in the memory”. However, the additional element of the digital circuits is recited at a high level of generality (i.e., as a generic computer component for comparing data) such that it amounts to no more than mere instructions using a generic computer component, or merely a tool to implement the abstract idea. The additional elements of “inputting the column elements in the column into a group of digital circuits”, “each digital circuit receiving a different column element in the column”, and “storing the higher exponent in the memory” merely add insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of the digital circuits is recited at a high level of generality (i.e., as a generic computer component for comparing data) such that it amounts to no more than mere instructions using a generic computer component, or merely a tool to implement the abstract idea. The additional elements of “inputting the column elements in the column into a group of digital circuits”, “each digital circuit receiving a different column element in the column”, and “storing the higher exponent in the memory” merely add insignificant extra-solution activity. See MPEP 2106.05(d)(II), which states that the courts have recognized computer functions such as “Storing and retrieving information in memory” as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
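For reference, the group of digital circuits addressed for claim 13 — each receiving a different column element, with the higher exponent retained and stored — corresponds to a pairwise max-reduction over the column's exponents. A hypothetical sketch (identifiers are the editor's own):

```python
def max_reduce_tree(exponents):
    # Pairwise comparator tree: each stage halves the list until one value remains.
    level = list(exponents)
    while len(level) > 1:
        nxt = [max(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:       # an odd element passes through to the next stage
            nxt.append(level[-1])
        level = nxt
    return level[0]

highest = max_reduce_tree([2, 9, 4, 7, 1])   # the exponent stored in memory
```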
Claim 14 is directed to computing the shifting factor and converting floating point into fixed point for the row data. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: “first digital circuit” and “second digital circuit”. However, the additional elements of the first digital circuit and second digital circuit are recited at a high level of generality (i.e., as generic computer components for computing the difference or shifting) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of the first digital circuit and second digital circuit are recited at a high level of generality (i.e., as generic computer components for computing the difference or shifting) such that they amount to no more than mere instructions using a generic computer component, or merely tools to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
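For reference, the shifting-factor computation addressed for claim 14 amounts to an exponent subtraction followed by a right shift of the element's mantissa; a hypothetical sketch (identifiers are the editor's, not the claim's):

```python
def shifting_factor(extreme_exponent, element_exponent):
    # First circuit: the difference between the extreme exponent and the
    # element's own exponent.
    return extreme_exponent - element_exponent

def align_mantissa(mantissa, shift):
    # Second circuit: right-shift so the element shares the extreme exponent.
    return mantissa >> shift

sf = shifting_factor(6, 3)              # element exponent 3, extreme exponent 6
aligned = align_mantissa(0b101000, sf)  # 40 >> 3 == 5
```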
Claim 15 is directed to computing the shifting factor and converting floating point into fixed point for the column data. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: “first digital circuit” and “second digital circuit”. However, the additional elements of first digital circuit and second digital circuit are recited at a high-level of generality (i.e., as a generic computer component for computing the difference or shifting) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of first digital circuit and second digital circuit are recited at a high-level of generality (i.e., as a generic computer component for computing the difference or shifting) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Claims 16 and 17 are directed to converting fixed point into floating point. In particular, the claims do not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claims recite an abstract idea.
Claim 18 is directed to converting fixed point into floating point. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: a “first digital circuit,” a “second digital circuit,” and that “the first digital circuit and the second digital circuit are arranged outside the array of processing elements”. However, these additional elements are recited at a high-level of generality (i.e., as a generic computer component for converting the data, and as merely applying the math in the format needed for the generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of the first digital circuit and the second digital circuit, arranged outside the array of processing elements, are recited at a high-level of generality (i.e., as a generic computer component for converting the data, and as merely applying the math in the format needed for the generic computer component for multiplying the data) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Claim 19 is directed to using a cache for the memory. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: “a cache”. However, the additional element of a cache is recited at a high-level of generality (i.e., as a generic computer component for storing the data) such that it amounts to no more than mere instructions using a generic computer component or merely as a tool to implement the abstract idea. The additional element does not integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of a cache is recited at a high-level of generality (i.e., as a generic computer component for storing the data) such that it amounts to no more than mere instructions using a generic computer component or merely as a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Claim 20 is directed to an additional addition of another dot-product. In particular, the claim does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
Claims 1-10 are directed to claims 11-20, respectively. A mere change in statutory class is obvious. Claims 1-10 are rejected for the reasons given in claims 11-20, respectively.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 6, 8-9, 11, 16, 18-19, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Darvish Rouhani et al. (US 2020/0193274 A1), hereinafter Darvish, and in view of Bittner et al. (US 2018/0157465 A1), hereinafter Bittner.
Regarding claim 21, Darvish discloses:
A deep neural network (DNN) accelerator, the DNN accelerator comprising:
a memory for storing a first extreme exponent for a floating-point row in a first floating-point matrix and a second extreme exponent for a floating-point column in a second floating-point matrix, wherein the row comprises row elements, the first extreme exponent is a highest exponent of exponents of the row elements, the column comprises column elements, and the second extreme exponent is a highest exponent of exponents of the column elements ["the neural network accelerator 180 can quantize the inputs, weights, and activations for a neural network" par.52; "The subgraph accelerator 186 can access a local memory used for storing weights, biases, input values, output values, and so forth" par.40; "only storing one copy of the shared exponent and operating with reduced mantissa widths." par.52; "Performing the matrix multiply AB includes taking dot products of the rows of A with the columns of B. Bounding boxes can be selected to include the rows of A... and bounding boxes can be selected around the columns of B" par.97; "the shared exponent 330 is selected to be the largest exponent from among the original normal-precision numbers in the neural network model 200" par.72];
one or more first digital circuits configured to: retrieve the first extreme exponent and the second extreme exponent from the memory, transform the floating-point row to a block floating point row including first block floating point numbers based on the first extreme exponent in the memory, and transform the floating-point column to a block floating point column including second block floating point numbers based on the second extreme exponent; [Fig.1, FP to QFP Converter 152; "the subgraph accelerator 186 can be configured using hardwired logic gates of the [180]." par.39; "The normal-precision values 150 are provided to a normal-precision floating-point to quantized floating-point converter 152, which converts the normal-precision value into quantized values" par.50; See par.97 for matrix multiplication; "each number's respective mantissa may be shifted such that the same or a proximate number is represented in the quantized format (e.g. [345 and 346])" par.70; See Fig.3, for converting a normal floating-point format to a quantized block floating-point format based on the extreme/shared exponent, See par.70-74]
wherein block floating point can be computed entirely with fixed point representations ["for the block floating-point dot product operation 360, the product can be calculated using integer arithmetic to combine mantissa elements... can be done entirely with fixed point or integer representations" par.71]
an operations unit configured to:
perform a multiplication operation on the block floating point row and the block floating point column to generate a block floating point product [Fig.1, 154 QFP Operations; "since the exponent portion can be factored in the block floating point representation, multiplication and addition of the mantissas can be done entirely with fixed point or integer representations… reducing computational costs by using more integer arithmetic, instead of floating-point arithmetic." par.71; “Performing the matrix multiply AB includes taking dot products of the rows of A with the columns of B” par.74] and
one or more second digital circuits configured to: after generating the block floating point product, retrieve the first extreme exponent and the second extreme exponent from the memory, and transform the block floating point product to a floating-point product based on the first extreme exponent and the second extreme exponent. [Fig.1, QFP to FP Converter 156; “The quantized values can then be converted back to a normal-floating-point format using a quantized floating-point to normal-floating-point converter” par.150; "method 800 can be performed by...such as the neural network accelerator 180" par.94; "Converting from the quantized-precision floating-point format to the normal-precision floating-point format can include generating an exponent value for a normal precision floating-point value and adjusting a mantissa values for the normal-precision floating-point values (such as increasing the number of bits of the mantissa values and/or shifting the mantissa values to account for the generated exponent)" par.100; Fig.3 item 360, shows sum of the two shared/extreme exponents for the final result]
However, Darvish does not explicitly disclose an array of processing elements configured to perform a multiplication operation on the row and the column to generate a product;
In the analogous art of block floating point architectures, Bittner teaches an array of processing elements configured to perform a multiplication operation on the block floating point row and the block floating point column to generate a block floating point product [Fig.4; “Note that the Vector input could also be replaced by a Matrix input to perform Matrix x Matrix multiplication” par.37; "a systolic array matrix multiplier as can be used in certain examples of the disclosed technology… " par.77; See fig.2-3 and 7; “disclosed for block floating-point (BFP) implementations… all elements in a row, in a column, or an entire array can have varying mantissas and share a common exponent” par. 2];
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish and Bittner before him before the effective filing date of the claimed invention, to fill in the gap of the operations unit disclosed by Darvish and implement a parallel array for matrix multiplication operations taught by Bittner, in order to speed up and parallelize matrix multiplication operations [Darvish: par.67 and Bittner: par.36, 75-76, 78-80, and 88].
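For context, the shared-exponent conversion and integer dot product described in the cited paragraphs can be illustrated with a brief sketch. This is illustrative Python only; the function names and the 8-bit mantissa width are assumptions for demonstration, not taken from Darvish or Bittner.

```python
import math

def to_bfp(values, mantissa_bits=8):
    # Shared (extreme) exponent: the largest exponent among the group,
    # per the shared-exponent selection quoted from Darvish par.72.
    shared_exp = max(math.frexp(v)[1] for v in values)
    # Align every value to the shared exponent: each mantissa is scaled
    # (effectively right-shifted) so only one exponent copy is stored.
    mantissas = [round(v * 2.0 ** (mantissa_bits - shared_exp)) for v in values]
    return shared_exp, mantissas

def bfp_dot(row, col, mantissa_bits=8):
    # Dot product done entirely with integer arithmetic on the mantissas;
    # the two shared exponents are factored out and re-applied at the end.
    e_row, m_row = to_bfp(row, mantissa_bits)
    e_col, m_col = to_bfp(col, mantissa_bits)
    acc = sum(a * b for a, b in zip(m_row, m_col))  # fixed-point MAC
    return math.ldexp(acc, e_row + e_col - 2 * mantissa_bits)
```

The accumulation itself uses only integer multiplies and adds, which is the point of the quoted par.71.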
Regarding claim 22, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 21 above.
Darvish discloses wherein the one or more first digital circuits and the one or more second digital circuits are arranged outside the array of processing elements [fig. 1, the converters are separated from the QFP operations].
Regarding claim 23, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 21 above.
Darvish discloses memory used for the block floating point operations ["The subgraph accelerator 186 can access a local memory used for storing weights, biases, input values, output values, and so forth" par.40];
That memory used for the invention can be cache memory [“computer-readable media, such as one or more optical media discs, volatile memory components (such as DRAM or SRAM)… as any data created and used during implementation of the disclosed embodiments, can be stored on one or more computer-readable media” par.18; “The memory 1020 may be volatile memory (e.g., registers, cache, RAM)… The memory 1020 stores software 1080, images, and video that can, for example, implement the technologies described herein” par.121]
Bittner also teaches using a cache associated with the array of processing elements [“The memory interface 240 and/or the main memory can include caches ( e.g., n-way or associative caches) to improve memory access performance. In some examples the cache is implemented using static RAM (SRAM) and the main memory 245 is implemented using dynamic RAM (DRAM)” par.69].
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish and Bittner before him before the effective filing date of the claimed invention, to use caches for the data as disclosed by Darvish, to improve memory access for any data used and created [Darvish par.121 and Bittner par.69].
Regarding claim 11, Darvish discloses One or more non-transitory computer-readable media storing instructions executable to perform operations for deep learning, the operations ["Any of the disclosed methods can be implemented as computer-executable instructions stored on one or more computer-readable media" par.18; "the subgraph accelerator 186 can be configured and/or executed using instructions executable on the tensor processing unit 182", par.39];
Claim 11’s remaining limitations are directed to claim 21. A mere change in statutory class is obvious. Claim 11 is further rejected for the reasons given in claim 21.
Regarding claim 16, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 11 above.
Darvish discloses scaling the fixed-point product by a scaling factor to generate a new fixed-point product, the scaling factor equal to a sum of the first extreme exponent and the second extreme exponent; and transforming the new fixed-point product to the floating-point product. [Fig.1, QFP to FP Converter 156; “The quantized values can then be converted back to a normal-floating-point format using a quantized floating-point to normal-floating-point converter” par.150; "method 800 can be performed by...such as the neural network accelerator 180" par.94; "Converting from the quantized-precision floating-point format to the normal-precision floating-point format can include generating an exponent value for a normal precision floating-point value and adjusting a mantissa values for the normal-precision floating-point values (such as increasing the number of bits of the mantissa values and/or shifting the mantissa values to account for the generated exponent)" par.100; Fig.3 item 360, shows sum of the two shared/extreme exponents for the product; it would be obvious to use the sum of the two shared/extreme exponents for shifting the mantissa back to floating point]
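The back-conversion described above can be sketched as follows (illustrative Python; the function name and the frac_bits parameter are assumptions, not claim language or reference disclosure):

```python
import math

def fixed_to_float(fixed_product, row_extreme_exp, col_extreme_exp, frac_bits):
    # Scaling factor: the sum of the two extreme exponents, as claimed.
    scaling_factor = row_extreme_exp + col_extreme_exp
    # Re-apply the factored-out exponents; frac_bits undoes the fixed-point
    # scaling applied when the mantissas were quantized, yielding a normal
    # floating-point result.
    return math.ldexp(fixed_product, scaling_factor - frac_bits)
```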
Regarding claim 18, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 11 above. Darvish discloses wherein the floating-point row is transformed to the fixed-point row by a first digital circuit, the fixed-point product is transformed to the floating-point product by a second digital circuit, and the first digital circuit and the second digital circuit are arranged outside the processing unit [Fig.1, 152 converter, 154 operations unit, 156 converter; Fig.3 item 360, par.97 and 98 for dot product].
Darvish and Bittner disclose an array of processing elements; see claims 11 and 21 above.
Claim 19 is directed to claim 23. A mere change in statutory class is obvious. Claim 19 is rejected for the reasons given above for claim 23.
Claims 1, 6, and 8-9 are directed to claims 11, 16, and 18-19 respectively. A mere change in statutory class is obvious. Claims 1, 6, and 8-9 are rejected for the reasons given above for claims 11, 16, and 18-19, respectively.
Claims 4-5 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Darvish and Bittner, and further in view of Barat Quesada (US 2016/0188293 A1), hereinafter Barat.
Regarding claim 14, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 11 above.
Darvish discloses generating block floating point representations for both matrices of the matrix-matrix multiplication based on the maximum exponent stored in memory [“the neural network accelerator 180 can quantize the inputs, weights, and activations for a neural network" par.52; “Bounding boxes can be selected to include the rows of A… and bounding boxes can be selected around the columns of B …” par.97; "only storing one copy of the shared exponent and operating with reduced mantissa widths." par.52; "the shared exponent 330 is selected to be the largest exponent from among the original normal-precision numbers in the neural network model 200" par.72]
for each respective row element in the floating-point row:
determining a shifting factor based on a number’s exponent and a shared exponent, and shifting the number’s mantissa to the right to generate the block floating point representation [“floating point format numbers 310 in the neural network model 200 are converted to a set of quantized precision, block floating point format numbers…. Each number’s respective mantissa may be shifted such that the same or a proximate number is represented in the quantized format ( e.g., shifted mantissas 345 and 346)” par.70; Figure 3, shows shifting the mantissa to the right]
however, Darvish and Bittner do not explicitly disclose:
for each respective element in the floating-point block:
determining a shifting factor by inputting the first extreme exponent and an exponent of the respective element into a first digital circuit, the first digital circuit outputting a difference between the first extreme exponent and the exponent of the respective element; and
transforming the respective element to one of the first fixed-point numbers by inputting the respective element and shifting factor into a second digital circuit, the second digital circuit performing right shifts on mantissa bits of the respective element based on the shifting factor.
In the analogous art of block floating point architectures, Barat teaches
using an extreme exponent [“By using the maximum-input-exponent-value instead of the input-scale-factor associated with the previous-input-block of data, the method avoids providing an output that is scaled to be either too large or too small to be properly represented by the chosen block floating point representation.” Par.95]
for each respective element in the floating-point block: [“FIG. 5 shows a block diagram depicting a method 500 for converting a block of floating point numbers into a block of fixed point numbers” par.85; “The common exponent and the block of fixed point numbers may be referred to as a 'block floating point' number” par.73]
determining a shifting factor by inputting the extreme exponent and an exponent of the respective element into a first digital circuit, the first digital circuit outputting a difference between the extreme exponent and the exponent of the respective element; [“The alignment may be in accordance with a difference between the exponent 506 and the input-scale-factor 512.” Par. 85; fig. 2, shows an Align-Factor computation unit]
and transforming the respective element to one of the first fixed-point numbers by inputting the respective element and shifting factor into a second digital circuit, the second digital circuit performing right shifts on mantissa bits of the respective element based on the shifting factor [fig. 2, shows an Alignment Shift unit that applies right shifts to the mantissa based on the output of the Align-Factor computation unit].
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish, Bittner, and Barat before him before the effective filing date of the claimed invention, to fill in the gap of the conversion unit for aligning and shifting the floating point data disclosed by Darvish and implement the alignment shift circuitry taught by Barat, in order to generate block floating point data with simplified conversions, improved dynamic range, and usability in fixed-point processors [Barat: par.1, 6, and 73].
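The two-circuit arrangement taught by Barat can be approximated in software for illustration (hedged Python sketch; the function names and the 8-bit mantissa width are assumptions, not from Barat):

```python
import math

def shifting_factor(extreme_exp, elem_exp):
    # "First digital circuit" (illustrative): outputs the difference between
    # the extreme exponent and the element's exponent.
    return extreme_exp - elem_exp

def align_mantissa(value, extreme_exp, mantissa_bits=8):
    # "Second digital circuit" (illustrative): right-shifts the element's
    # mantissa bits by the shifting factor so the element aligns to the
    # shared/extreme exponent.
    frac, elem_exp = math.frexp(value)           # value == frac * 2**elem_exp
    mantissa = round(frac * 2 ** mantissa_bits)  # integer mantissa
    return mantissa >> shifting_factor(extreme_exp, elem_exp)
```

A value whose exponent already equals the extreme exponent is shifted by zero; smaller values lose low-order bits, which is the quantization trade-off the references describe.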
Regarding claim 15, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 11 above.
Darvish discloses generating block floating point representations for both matrices of the matrix-matrix multiplication based on the maximum exponent stored in memory [“the neural network accelerator 180 can quantize the inputs, weights, and activations for a neural network" par.52; “Bounding boxes can be selected to include the rows of A… and bounding boxes can be selected around the columns of B …” par.97; "only storing one copy of the shared exponent and operating with reduced mantissa widths." par.52; "the shared exponent 330 is selected to be the largest exponent from among the original normal-precision numbers in the neural network model 200" par.72]
for each respective column element in the floating-point column:
determining a shifting factor based on a number’s exponent and a shared exponent, and shifting the number’s mantissa to the right to generate the block floating point representation [“floating point format numbers 310 in the neural network model 200 are converted to a set of quantized precision, block floating point format numbers…. Each number’s respective mantissa may be shifted such that the same or a proximate number is represented in the quantized format ( e.g., shifted mantissas 345 and 346)” par.70; Figure 3, shows shifting the mantissa to the right]
however, Darvish and Bittner do not explicitly disclose:
for each respective element in the floating-point block:
determining a shifting factor by inputting the second extreme exponent and an exponent of the respective element into a first digital circuit, the first digital circuit outputting a difference between the second extreme exponent and the exponent of the respective element; and
transforming the respective element to one of the first fixed-point numbers by inputting the respective element and shifting factor into a second digital circuit, the second digital circuit performing right shifts on mantissa bits of the respective element based on the shifting factor.
In the analogous art of block floating point architectures, Barat teaches
using an extreme exponent [“By using the maximum-input-exponent-value instead of the input-scale-factor associated with the previous-input-block of data, the method avoids providing an output that is scaled to be either too large or too small to be properly represented by the chosen block floating point representation.” Par.95]
for each respective element in the floating-point block: [“FIG. 5 shows a block diagram depicting a method 500 for converting a block of floating point numbers into a block of fixed point numbers” par.85; “The common exponent and the block of fixed point numbers may be referred to as a 'block floating point' number” par.73]
determining a shifting factor by inputting the extreme exponent and an exponent of the respective element into a first digital circuit, the first digital circuit outputting a difference between the extreme exponent and the exponent of the respective element; [“The alignment may be in accordance with a difference between the exponent 506 and the input-scale-factor 512.” Par. 85; fig. 2, shows an Align-Factor computation unit]
and transforming the respective element to one of the first fixed-point numbers by inputting the respective element and shifting factor into a second digital circuit, the second digital circuit performing right shifts on mantissa bits of the respective element based on the shifting factor [fig. 2, shows an Alignment Shift unit that applies right shifts to the mantissa based on the output of the Align-Factor computation unit].
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish, Bittner, and Barat before him before the effective filing date of the claimed invention, to fill in the gap of the conversion unit for aligning and shifting the floating point data disclosed by Darvish and implement the alignment shift circuitry taught by Barat, in order to generate block floating point data with simplified conversions, improved dynamic range, and usability in fixed-point processors [Barat: par.1, 6, and 73].
Claims 4 and 5 are directed to claims 14 and 15, respectively. A mere change in statutory class is obvious. Claims 4 and 5 are rejected for the reasons given above for claims 14 and 15, respectively.
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Darvish and Bittner, and further in view of Pareek et al. (US 2020/0089472 A1), hereinafter Pareek.
Regarding claim 17, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 11 above.
Darvish discloses transforming the block floating point product to a floating-point product [“a tensor operation can be performed using the quantized-precision floating-point format… dot-product…” par.98; “Converting from the quantized-precision floating point format to the normal-precision floating-point format” par.100; see fig. 8/9; see fig.3, 360]
And the shared/extreme exponent is based on the maximum exponent ["the shared exponent 330 is selected to be the largest exponent from among the original normal-precision numbers in the neural network model 200" par. 72]
Darvish and Bittner do not explicitly disclose scaling the intermediate floating-point product based on the first extreme exponent and the second extreme exponent.
In the analogous art of shared exponent floating point architectures, Pareek teaches scaling the floating-point product based on the first extreme exponent and the second extreme exponent [Fig.6, Adders 646 and 648, par.49; “Shared exponents are factored-out from operands that can include a set of weights and input activations” par.19];
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish, Bittner, and Pareek before him before the effective filing date of the claimed invention, to modify the conversion circuitry disclosed by Darvish and implement the exponent restoration circuitry taught by Pareek, in order to produce the final accumulated value, update the exponent value to the correct value based on the product and the shared exponents, and allow for more parallel accumulations [Pareek: par.25, 31-35, and 58].
Claim 7, is directed to claim 17. A mere change in statutory class is obvious. Claim 7 is rejected for the reasons given above for claim 17.
Claims 20 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Darvish and Bittner, and further in view of Khailany et al. (US 2022/0067530 A1), hereinafter Khailany.
Regarding claim 20, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 11 above. Darvish and Bittner do not explicitly disclose accumulating the floating-point product with an additional floating-point product, wherein the additional floating-point product is a result of multiplying a row in a third floating-point matrix with a column of a fourth floating-point matrix.
In the analogous art of block floating point architectures, Khailany teaches
The use of block floating point format for dot products [“a collection of dot-products between an unrolled region of weights and an unrolled region of activations, vector-MAC units are the building blocks of many DNN processing architectures. “ par.46; “The per-vector scale factors can be a low-bitwidth integer, floating-point, or power-of-two format value. Note that the power-of-two implementation reverts to block floating-point with a per-element mantissa and an exponent that is shared across the vector.” Par.47]
accumulating the floating-point product with an additional floating-point product, wherein the additional floating-point product is a result of multiplying a row in a third floating-point matrix with a column of a fourth floating-point matrix [“the MMA 365 may include or be replaced with tensor cores, matrix multiply accelerators, or tensor processing units“ par.94; “each tensor core operates on a 4x4 matrix and performs a matrix multiply and accumulate operation D=A'B+C, where A, B, C, and D are 4x4 matrices” par.121; “The 16-bit floating point multiply requires 64 operations and results in a full precision product that is then accumulated using 32-bit floating point addition with the other intermediate products for a 4x4x4 matrix multiply” par.123].
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish, Bittner, and Khailany before him before the effective filing date of the claimed invention, to modify the tensor cores taught by Khailany to utilize block floating point for matrix multiplication as disclosed by both Darvish and Khailany, and to incorporate them into the accelerator disclosed by Darvish, to allow for further parallelization of the matrix multiply and accumulate operation [Darvish par.67 and Khailany par.24 and 118-123].
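The multiply-accumulate operation quoted from Khailany (D = A×B + C, each dot product of a row of A and a column of B accumulated with an element of C) can be sketched as follows (illustrative Python; plain nested lists stand in for the hardware matrices):

```python
def matmul_accumulate(A, B, C):
    # D = A*B + C: each dot product of a row of A with a column of B is
    # accumulated with the corresponding element of C, mirroring the
    # tensor-core operation described in the cited paragraphs.
    n, k, m = len(A), len(B), len(B[0])
    return [[C[i][j] + sum(A[i][t] * B[t][j] for t in range(k))
             for j in range(m)]
            for i in range(n)]
```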
Claim 10 is directed to claim 20. A mere change in statutory class is obvious. Claim 10 is rejected for the reasons given above for claim 20.
Claims 2-3, 12-13, and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over Darvish and Bittner, and further in view of Shibayama et al. (US 2012/0117337 A1), hereinafter Shibayama.
Regarding claim 24, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 21 above.
Darvish discloses memory used for the block floating point operations i.e. storing the row elements ["The subgraph accelerator 186 can access a local memory used for storing weights, biases, input values, output values, and so forth" par.40];
That the memory used for the invention can be cache memory [“computer-readable media, such as one or more optical media discs, volatile memory components (such as DRAM or SRAM)… as any data created and used during implementation of the disclosed embodiments, can be stored on one or more computer-readable media” par.18; “The memory 1020 may be volatile memory (e.g., registers, cache, RAM)… The memory 1020 stores software 1080, images, and video that can, for example, implement the technologies described herein” par.121];
Storing the maximum exponents in memory ["only storing one copy of the shared exponent and operating with reduced mantissa widths." par.52; "the shared exponent 330 is selected to be the largest exponent from among the original normal-precision numbers in the neural network model 200" par. 72]
And generating block floating point for both matrices of the matrix-matrix multiplication [“the neural network accelerator 180 can quantize the inputs, weights, and activations for a neural network" par.52; “Bounding boxes can be selected to include the rows of A… and bounding boxes can be selected around the columns of B …” par.97].
Bittner also teaches using a cache associated with the array of processing elements [“The memory interface 240 and/or the main memory can include caches ( e.g., n-way or associative caches) to improve memory access performance. In some examples the cache is implemented using static RAM (SRAM) and the main memory 245 is implemented using dynamic RAM (DRAM)” par.69].
However, Darvish and Bittner do not explicitly disclose:
the DNN accelerator further comprises one or more third digital circuits configured to:
receive a first extreme exponent from the memory;
receive an exponent of a first row element; determine a second extreme exponent, wherein the second extreme exponent is a higher exponent of the first extreme exponent and the exponent of the first row element, and the second extreme exponent is stored in the memory;
receive the second extreme exponent from the memory; receive an exponent of a second row element; and
determine a third extreme exponent, wherein the third extreme exponent is a higher exponent of the second extreme exponent and the exponent of the second row element, and the third extreme exponent is stored in the memory.
In the analogous art of block floating point architectures, Shibayama teaches
Using a digital circuit for finding the maximum exponent for block floating point: for each element of a block, receive an exponent of the element; determine an output exponent, wherein the output exponent is the higher of a first exponent and the exponent of the element, and the output exponent is stored in the memory [Fig.2, 100; “all data in a block are sequentially input to the maximum exponent calculator 100 based on an input clock signal, and the maximum exponent calculator 100 calculates the maximum exponent of all data in the block” par. 57; “In the block floating point operations, a plurality of signal data are grouped into one block, and normalization is performed for the entire block to have a common exponent in each block” par.3];
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish, Bittner, and Shibayama before them before the effective filing date of the claimed invention, to fill in the gap of the conversion unit for finding the maximum disclosed by Darvish and to implement maximum-exponent circuitry for block floating point as taught by Shibayama, which allows finding the maximum exponent at high speed with reduced logic and handling for “negative” exponents [Shibayama: par.83-85, 102, 123-128, 143-149].
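For illustration only, the sequential maximum exponent calculation taught by Shibayama (Fig.2, element 100) can be modeled as below; the register-style running maximum is an illustrative sketch under that reading of the reference:

```python
def sequential_max_exponent(exponents):
    """Model of a sequential maximum exponent calculator: block elements
    arrive one per clock, and a stored extreme exponent is compared
    against each incoming exponent and updated if the new one is higher."""
    stored = None  # register holding the current extreme exponent
    for e in exponents:
        stored = e if stored is None else max(stored, e)
    return stored
```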
Regarding claim 25, Darvish and Bittner disclose the invention substantially as claimed. See the discussion of claim 21 above.
Darvish discloses memory used for the block floating point operations i.e. storing the row elements ["The subgraph accelerator 186 can access a local memory used for storing weights, biases, input values, output values, and so forth" par.40];
Storing exponents in memory ["only storing one copy of the shared exponent and operating with reduced mantissa widths." par.52];
And generating block floating point for both matrices of the matrix-matrix multiplication [“the neural network accelerator 180 can quantize the inputs, weights, and activations for a neural network" par.52; “Bounding boxes can be selected to include the rows of A… and bounding boxes can be selected around the columns of B …” par.97].
However, Darvish and Bittner do not explicitly disclose:
wherein the DNN accelerator further comprises a plurality of third digital circuits, each of which is configured to:
receive a different one of the column elements in the column; and
selecting a higher exponent of an exponent stored in the memory and an exponent of the different one of the column elements, wherein the higher exponent is stored in the memory.
In the analogous art of block floating point architectures, Shibayama teaches
A plurality of digital circuits for finding the maximum exponent for block floating point, where each of the digital circuits receives an exponent of an element of a block and determines an output exponent, wherein the output exponent is the higher of a first exponent and the exponent of the element, and the output exponent is stored in the memory [Fig.7, 150 and/or Fig.9, 160; “The maximum exponent calculator 150 is a circuit that calculates the maximum exponent of all data in a block which is composed of a plurality of input data.” par. 104; “In the block floating point operations, a plurality of signal data are grouped into one block, and normalization is performed for the entire block to have a common exponent in each block” par.3; “it can be implemented by a circuit having a simple configuration with a single logical stage composed of a plurality of XOR circuits” par.123];
It would have been obvious to one of ordinary skill in the art, having the teachings of Darvish, Bittner, and Shibayama before them before the effective filing date of the claimed invention, to fill in the gap of the conversion unit for finding the maximum disclosed by Darvish and to implement maximum-exponent circuitry for block floating point as taught by Shibayama, which allows finding the maximum exponent at high speed with reduced logic and handling for “negative” exponents [Shibayama: par.83-85, 102, 123-128, 143-149].
Claims 12 and 13 are directed to claims 24 and 25, respectively. A mere change in statutory class is obvious. Claims 12 and 13 are rejected for the reasons given above for claims 24 and 25, respectively.
Claims 2 and 3 are directed to claims 24 and 25, respectively. A mere change in statutory class is obvious. Claims 2 and 3 are rejected for the reasons given above for claims 24 and 25, respectively.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Naous et al. (US 2023/0133360 A1) discloses a quantizer including a max and shift unit and scaling adjustments. See figure 2.
Lo et al. (US 2019/0347072 A1) discloses block floating point. See figures 1-2.
Burger (US 2019/0057303 A1) discloses block floating point units with multifunction float units and various shifting operations. See figures 7 and 8 and tables 3-5.
Mellempudi et al. (US 2018/0322607 A1) discloses a dynamic fixed point format using shared exponents and shift logic with metadata. See figures 15, 19, and 20, and paragraphs 188-212.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kenny K. Bui whose telephone number is (571)270-0604. The examiner can normally be reached 8:00 am to 3:00 pm on Monday, 8:00 am to 4:00 pm on Tuesday to Friday ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew T Caldwell can be reached at (571)272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KENNY K. BUI/Patent Examiner, Art Unit 2182 (571)270-0604
/ANDREW CALDWELL/Supervisory Patent Examiner, Art Unit 2182