DETAILED ACTION
The Office Action is sent in response to Applicant’s Communication received on 12/08/2025 for application number 17/745,512. The Office hereby acknowledges receipt of the following, which has been placed of record in the file: Applicant’s Remarks and Amendments to the Claims and Specification.
The Examiner notes the following: claims 1-9, 13-17, and 20-32 have been amended.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 11 and 28 are objected to because of the following informalities:
Claim 11, “an application programming interface (API)” should read as “the application programming interface (API)”.
Claim 28, “an application programming interface (API)” should read as “the application programming interface (API)”.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-32 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Under the Alice Framework Step 1, claims 1-8 recite a processor and are, therefore, directed to a machine.
Under the Alice Framework Step 2A prong 1, claim 1 recites:
One or more processors, comprising:
Circuitry to:
In response to an application programming interface (API) call to perform a matrix multiply accumulate (MMA) operation:
cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers based, at least in part, on one or more rounding attributes indicated as parameters to the API; and
perform the MMA operation based, at least in part, on the one or more rounded TF32 numbers.
The above underlined limitations are related to converting one data format into another data format to be used in mathematical operations, which amounts to mathematical relationships and calculations falling under the “Mathematical Concepts” grouping of abstract ideas (see specification paragraphs 98 and 115 for the number format; 174-175 for the code and operation; and 56-59 and 75-80 for the instructions of the API and rounding modes). See Gottschalk v. Benson, 409 U.S. 63, 70, 175 USPQ 673, 676 (1972), in which the conversion of binary-coded-decimal (BCD) numerals into pure binary numbers was held to be directed to the “Mathematical Concepts” grouping of abstract ideas. As such, the concept of converting from one data format into another (i.e., one floating-point format into another floating-point format) is directed to the “Mathematical Concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
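For illustration only (not Applicant’s disclosure), the FP32-to-TF32 conversion characterized above reduces to pure bit-level arithmetic, underscoring its mathematical character. The function name and rounding-mode parameter below are hypothetical; TF32 keeps FP32’s 8-bit exponent but only a 10-bit mantissa, so the conversion simply disposes of the low 13 mantissa bits (special values such as NaN/Inf are not handled here):

```python
import struct

def fp32_to_tf32(x: float, mode: str = "truncate") -> float:
    """Reduce an FP32 value to TF32 precision (8-bit exponent,
    10-bit mantissa) by discarding the low 13 mantissa bits."""
    # Reinterpret the FP32 value as its 32-bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if mode == "round_nearest":
        # Round-to-nearest-even on the 13 bits being discarded.
        bits += 0x0FFF + ((bits >> 13) & 1)
    bits &= 0xFFFFE000  # zero the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Either rounding attribute leaves the result within one TF32 unit in the last place (about 2^-10 relative error) of the original FP32 value.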
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: Circuitry and “in response to an application programming interface (API) call”. However, the additional elements of Circuitry and an application programming interface (API) call are recited at a high level of generality (i.e., as a generic computer component for converting data and performing matrix multiplication, and as a generic set of instructions to be executed) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. Additionally, the additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of Circuitry and an application programming interface (API) call are recited at a high level of generality (i.e., as a generic computer component for converting data and performing matrix multiplication, and as a generic set of instructions to be executed) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. The additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. Kirk et al. (NPL: “Programming Massively Parallel Processors: A Hands-on Approach, 2nd Ed.”) discloses that API calls are commands from the API’s collection of functions that allow programmers to manage parallelism and allow applications to run correctly on all processors that support the API. See pp. 15, 24, 25, 33, and 48. Gaster et al. (NPL: “Heterogeneous Computing with OpenCL”) discloses the OpenCL API, which can run on significantly different architectures to execute functions. See chapters 1-2. Cooksey (NPL: “An Introduction to APIs”) discloses that an API is a program to run functions/tasks on computers. See Chapter 1. Nvidia (NPL: “CUDA C++ Programming Guide”) discloses the CUDA API, CUDA API calls, and libraries. See Chapter 3, pp. 15, 46, and 63. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
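Likewise for illustration only, the full claimed sequence — round the FP32 operands to TF32 precision, then multiply-accumulate in full precision — reduces to ordinary arithmetic. The helper names below are hypothetical and assume the truncation rounding attribute:

```python
import struct

def to_tf32(x: float) -> float:
    # Truncate an FP32 value to TF32 precision (10-bit mantissa).
    bits = struct.unpack("<I", struct.pack("<f", x))[0] & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def mma(A, B, C):
    # D = A x B + C: multiplicand operands rounded to TF32,
    # accumulation carried out in full precision.
    return [[C[i][j] + sum(to_tf32(A[i][k]) * to_tf32(B[k][j])
                           for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]
```

For small exactly representable inputs the rounding step is lossless, e.g. an identity matrix times [[2, 3], [4, 5]] plus a matrix of ones yields [[3, 4], [5, 6]].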
Under the Alice Framework Step 2A prong 1, claims 2-8 recite further steps and details of converting one data format into another data format to be used in mathematical operations, which amount to mathematical relationships and calculations and fall within the “Mathematical Concepts” and/or “Mental Processes” groupings of abstract ideas.
For claim 2, it is directed to using the converted data in matrix multiply and accumulate operations. In particular, claim 2 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For Claim 3, it is directed to executing instructions on a processor. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, claim 3 recites the following additional element: “MMA operations to be performed by a group of threads all performing a same instruction”. However, the additional element of “…performed by a group of threads…” is recited at a high level of generality (i.e., as a generic computer component for executing instructions, such as a GPU) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, claim 3 does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “…performed by a group of threads…” is recited at a high level of generality (i.e., as a generic computer component for executing instructions, such as a GPU) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 4, it is directed to executing instructions to perform one of the rounding methods. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: one or more instruction parameters. However, the additional element of one or more instruction parameters is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of one or more instruction parameters is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 5, it is directed to executing instructions to perform one of the rounding methods. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: one or more application programming interface (API) parameters. However, the additional element of one or more application programming interface (API) parameters is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, claim 5 does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of one or more application programming interface (API) parameters is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 6, it is directed to executing instructions to perform one of the rounding methods. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: one or more platform independent instruction parameters. However, the additional element of one or more platform independent instruction parameters is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of one or more platform independent instruction parameters is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 7, it is directed to applying a rounding mode (i.e., a truncation operation) to the data format. In particular, claim 7 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 8, it is directed to using matrix engines to perform one or more matrix operations based on the data format. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: one or more hardware matrix engines. However, the additional element of one or more hardware matrix engines is recited at a high level of generality (i.e., as a generic computer component for performing matrix operations) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of one or more hardware matrix engines is recited at a high level of generality (i.e., as a generic computer component for performing matrix operations) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 1, claims 9-16 recite a method and are, therefore, directed to a process.
Under the Alice Framework Step 2A prong 1, claim 9 recites:
A method, comprising:
In response to an application programming interface (API) call to perform a matrix multiply-accumulate (MMA) operation:
causing one or more thirty-two bit floating point (FP32) numbers to be converted to generate one or more rounded thirty-two bit TensorFloat (TF32) numbers based, at least in part, on one or more rounding attributes indicated as parameters to the API; and
performing the MMA operation based, at least in part, on the one or more rounded TF32 numbers.
The above underlined limitations are related to converting one data format into another data format to be used in mathematical operations, which amounts to mathematical relationships and calculations falling under the “Mathematical Concepts” grouping of abstract ideas (see specification paragraphs 98 and 115 for the number format; 174-175 for the code and operation; and 56-59 and 75-80 for the instructions of the API and rounding modes). See Gottschalk v. Benson, 409 U.S. 63, 70, 175 USPQ 673, 676 (1972), in which the conversion of binary-coded-decimal (BCD) numerals into pure binary numbers was held to be directed to the “Mathematical Concepts” grouping of abstract ideas. As such, the concept of converting from one data format into another (i.e., one floating-point format into another floating-point format) is directed to the “Mathematical Concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: “in response to an application programming interface (API) call”. However, the additional element of an application programming interface (API) call is recited at a high level of generality (i.e., as a generic set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. Additionally, the additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of an application programming interface (API) call is recited at a high level of generality (i.e., as a generic set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. Kirk et al. (NPL: “Programming Massively Parallel Processors: A Hands-on Approach, 2nd Ed.”) discloses that API calls are commands from the API’s collection of functions that allow programmers to manage parallelism and allow applications to run correctly on all processors that support the API. See pp. 15, 24, 25, 33, and 48. Gaster et al. (NPL: “Heterogeneous Computing with OpenCL”) discloses the OpenCL API, which can run on significantly different architectures to execute functions. See chapters 1-2. Cooksey (NPL: “An Introduction to APIs”) discloses that an API is a program to run functions/tasks on computers. See Chapter 1. Nvidia (NPL: “CUDA C++ Programming Guide”) discloses the CUDA API, CUDA API calls, and libraries. See Chapter 3, pp. 15, 46, and 63. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 2A prong 1, claims 10-16 recite further steps and details of converting one data format into another data format to be used in mathematical operations, which amount to mathematical relationships and calculations and fall within the “Mathematical Concepts” and/or “Mental Processes” groupings of abstract ideas.
For claim 10, it is directed to performing matrix operations using the converted data. In particular, claim 10 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 11, it is directed to executing instructions to perform one of the rounding methods and an operation. In particular, the claim does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 12, it is directed to executing instructions to perform one of the rounding methods and execute a matrix operation. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: a set of instructions. However, the additional element of a set of instructions is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of a set of instructions is recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 13, it details the mathematical operation of truncating the data format into another data format. In particular, claim 13 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 14, it is directed to performing truncation. In particular, claim 14 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 15, it is directed to executing instructions to perform the conversion/truncation of the data format. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: one or more human-readable instructions and one or more machine instructions. However, the additional elements of one or more human-readable instructions and one or more machine instructions are recited at a high level of generality (i.e., as generic computer components for a set of instructions to be executed) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of one or more human-readable instructions and one or more machine instructions are recited at a high level of generality (i.e., as generic computer components for a set of instructions to be executed) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 16, it is directed to executing a set of instructions to perform matrix operations on the converted data format. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: a software program and a set of threads perform[ed]… on a graphics processing unit. However, the additional elements of a software program and a set of threads perform[ed]… on a graphics processing unit are recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed, and as a generic computer component for executing the set of instructions) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a software program and a set of threads perform[ed]… on a graphics processing unit are recited at a high level of generality (i.e., as a generic computer component for a set of instructions to be executed, and as a generic computer component for executing the set of instructions) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 1, claims 17-24 recite a system and are, therefore, directed to a machine.
Under the Alice Framework Step 2A prong 1, claim 17 recites:
A system, comprising:
one or more processors to in response to an application programming interface call to perform a matrix multiply-accumulate (MMA) operation:
cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers based, at least in part, on one or more rounding attributes indicated as parameters to the API; and
perform the MMA operation based, at least in part, on the one or more rounded TF32 numbers.
The above underlined limitations are related to converting one data format into another data format to be used in mathematical operations, which amounts to mathematical relationships and calculations falling under the “Mathematical Concepts” grouping of abstract ideas (see specification paragraphs 98 and 115 for the number format; 174-175 for the code and operation; and 56-59 and 75-80 for the instructions of the API and rounding modes). See Gottschalk v. Benson, 409 U.S. 63, 70, 175 USPQ 673, 676 (1972), in which the conversion of binary-coded-decimal (BCD) numerals into pure binary numbers was held to be directed to the “Mathematical Concepts” grouping of abstract ideas. As such, the concept of converting from one data format into another (i.e., one floating-point format into another floating-point format) is directed to the “Mathematical Concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: one or more processors and “in response to an application programming interface (API) call”. However, the additional elements of one or more processors and an application programming interface (API) call are recited at a high level of generality (i.e., as a generic computer component for converting data and performing matrix multiplication, and as a generic set of instructions to be executed) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. Additionally, the additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of one or more processors and an application programming interface (API) call are recited at a high level of generality (i.e., as a generic computer component for converting data and performing matrix multiplication, and as a generic set of instructions to be executed) such that they amount to no more than mere instructions to use a generic computer component, or merely tools to implement the abstract idea. The additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. Kirk et al. (NPL: “Programming Massively Parallel Processors: A Hands-on Approach, 2nd Ed.”) discloses that API calls are commands from the API’s collection of functions that allow programmers to manage parallelism and allow applications to run correctly on all processors that support the API. See pp. 15, 24, 25, 33, and 48. Gaster et al. (NPL: “Heterogeneous Computing with OpenCL”) discloses the OpenCL API, which can run on significantly different architectures to execute functions. See chapters 1-2. Cooksey (NPL: “An Introduction to APIs”) discloses that an API is a program to run functions/tasks on computers. See Chapter 1. Nvidia (NPL: “CUDA C++ Programming Guide”) discloses the CUDA API, CUDA API calls, and libraries. See Chapter 3, pp. 15, 46, and 63. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 2A prong 1, claims 18-24 recite further steps and details of converting one data format into another data format to be used in mathematical operations, which amount to mathematical relationships and calculations and fall within the “Mathematical Concepts” and/or “Mental Processes” groupings of abstract ideas.
For claim 18, it is directed to executing matrix multiply and accumulate operations using the converted data. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: one or more other processors. However, the additional element of one or more other processors is recited at a high level of generality (i.e., as a generic computer component for computing matrix multiply and accumulate operations) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of one or more other processors is recited at a high level of generality (i.e., as a generic computer component for computing matrix multiply and accumulate operations) such that it amounts to no more than mere instructions to use a generic computer component, or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 19, it is directed to converting another data format into the converted data format. In particular, claim 19 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claims 20 and 21, they are directed to converting the data based on the rounding type/attribute, which is part of the mathematical concept. In particular, claims 20 and 21 do not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claims recite an abstract idea.
For claim 22, it is directed to converting the input data. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional element: causing the data “to be converted automatically.” However, this additional element is recited at a high level of generality (i.e., as a generic computer function for converting the data in, for example, a microprocessor) such that it amounts to no more than mere instructions using a generic computer component or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, claim 22 does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of causing the data “to be converted automatically” is recited at a high level of generality (i.e., as a generic computer function for converting the data in, for example, a microprocessor) such that it amounts to no more than mere instructions using a generic computer component or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 23, it is directed to how the converting is performed on the data. In particular, claim 23 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 24, it is directed to using the converted data in matrix operations. In particular, claim 24 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 1, claims 25-32 recite a machine-readable medium and are, therefore, directed to an article of manufacture.
Under the Alice Framework Step 2A prong 1, claim 25 recites
A non-transitory machine-readable medium having stored thereon instructions that, if performed by one or more processors, cause the one or more processors to:
In response to an application programming interface (API) call to perform a matrix multiply-accumulate (MMA) operation:
cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers based, at least in part, on one or more rounding attributes indicated as parameters to the API; and
perform the MMA operation based, at least in part, on the one or more rounded TF32 numbers.
The above underlined limitations are related to converting one data format into another data format to be used in mathematical operations which amount to mathematical relationships and calculations which falls under the “Mathematical Concepts” of abstract ideas (see specification paragraphs: 98 and 115, for number format; 174-175, for the code and operation; and 56-59 and 75-80 for instructions of the API and rounding modes). See Gottschalk v. Benson, 409 U.S. 63, 70, 175 USPQ 673, 676 (1972), the conversion of binary-coded-decimal (BCD) numerals into pure binary numbers is directed to “Mathematical Concepts” of abstract ideas. As such, the concept of converting from one data format into another (i.e. one floating point format into another floating point format) is directed to “Mathematical Concepts” of abstract ideas. Accordingly, the claim recites an abstract idea.
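By way of illustration only, the recited conversion of an FP32 number to a TF32-style value under a selectable rounding attribute reduces to keeping the top 10 of FP32's 23 mantissa bits. The following software sketch models that mathematical operation; it is not the claimed circuitry, and the function name and mode strings are hypothetical:

```python
import struct

def _f32_to_bits(x: float) -> int:
    # Reinterpret a float as its 32-bit IEEE 754 single-precision pattern
    return struct.unpack('<I', struct.pack('<f', x))[0]

def _bits_to_f32(b: int) -> float:
    return struct.unpack('<f', struct.pack('<I', b & 0xFFFFFFFF))[0]

def fp32_to_tf32(x: float, mode: str = 'rn') -> float:
    """Keep 10 of FP32's 23 mantissa bits (TF32-style precision).

    mode 'rn' = round to nearest, ties to even; 'rz' = truncate toward zero.
    """
    b = _f32_to_bits(x)
    if mode == 'rn':
        # Add half an ulp of the 10-bit mantissa; the extra lsb term
        # implements ties-to-even on the 13 discarded bits.
        b += 0x0FFF + ((b >> 13) & 1)
    return _bits_to_f32(b & ~0x1FFF)  # zero the low 13 mantissa bits
```

The result remains a 32-bit value (1 sign bit, 8 exponent bits), consistent with the specification's description of TF32 as a 32-bit format with reduced mantissa precision.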
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: “non-transitory machine-readable medium having stored thereon instructions”, “one or more processors” and “in response to an application programming interface (API) call”. However, the additional elements of a non-transitory machine-readable medium having stored thereon instructions, one or more processors, and an application programming interface (API) call are recited at a high level of generality (i.e., as a generic computer component for storing instructions; as a generic computer component for converting data and performing matrix multiplication; and as a generic set of instructions to be executed) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. Additionally, the additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of a non-transitory machine-readable medium having stored thereon instructions, one or more processors, and an application programming interface (API) call are recited at a high level of generality (i.e., as a generic computer component for storing instructions; as a generic computer component for converting data and performing matrix multiplication; and as a generic set of instructions to be executed) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. The additional element of “in response to an application programming interface (API) call” merely adds insignificant extra-solution activity. Kirk et al. (NPL: “Programming Massively Parallel Processors: A Hands-on Approach 2nd Ed.”) discloses that API calls are commands from the API’s collection of functions that allow programmers to manage parallelism and allow applications to run correctly on all processors that support the API. See p.15, 24, 25, 33, and 48. Gaster et al. (NPL: “Heterogeneous Computing with OpenCL”) discloses the OpenCL API, which can run on significantly different architectures to execute functions. See chapters 1-2. Cooksey (NPL: “An Introduction to APIs”) discloses that an API is a program to run functions/tasks on computers. See Chapter 1. Nvidia (NPL: “CUDA C++ Programming Guide”) discloses the CUDA API, CUDA API calls, and libraries. See Chapter 3, p.15, 46, and 63. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
Under the Alice Framework Step 2A prong 1, claims 26-32 recite further steps and details of converting one data format into another data format to be used in mathematical operations, which amount to mathematical relationships and calculations and fall within the “Mathematical Concepts” and/or “Mental Processes” groupings of abstract ideas.
For claim 26, it is directed to executing matrix multiply and accumulate operations using the converted data on the processors of claim 25. In particular, claim 26 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 27, it is directed to executing matrix multiply and accumulate operations on converted data by executing instructions on matrix engines. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, the claim recites the following additional elements: multiple groups of threads and a matrix engine. However, the additional elements of multiple groups of threads and a matrix engine are recited at a high level of generality (i.e., as a generic computer component for instructions to be executed; and as a generic computer component for executing instructions) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. The additional elements do not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, the claim does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of multiple groups of threads and a matrix engine are recited at a high level of generality (i.e., as a generic computer component for instructions to be executed; and as a generic computer component for executing instructions) such that they amount to no more than mere instructions using a generic computer component or merely as tools to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claim 28, it is directed to executing instructions to perform a matrix multiply and accumulate operation on normal data. In particular, the claim does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 29, it is directed to executing instructions to perform one of the rounding methods. In particular, claim 29 does not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claim recites an abstract idea.
For claim 30, it is directed to compiling the instructions to convert the data. Accordingly, the claim recites an abstract idea.
Under the Alice Framework Step 2A prong 2, claim 30 recites the following additional element: compiling code to generate machine instructions. However, the additional element of compiling code to generate machine instructions is recited at a high level of generality (i.e., as a generic computer function for generating a set of instructions to be executed) such that it amounts to no more than mere instructions using a generic computer component or merely a tool to implement the abstract idea. The additional element does not, individually or in combination, integrate the exception into a practical application. Accordingly, the claim is not integrated into a practical application.
Under the Alice Framework Step 2B, claim 30 does not include additional elements that, individually or in combination, are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of compiling code to generate machine instructions is recited at a high level of generality (i.e., as a generic computer function for generating a set of instructions to be executed) such that it amounts to no more than mere instructions using a generic computer component or merely a tool to implement the abstract idea. The claim does not recite additional elements that alone or in combination amount to an inventive concept. Accordingly, the claim does not amount to significantly more than the abstract idea.
For claims 31 and 32, they are directed to converting one data format into the converted data format. In particular, claims 31 and 32 do not include additional elements that would require further analysis under Step 2A prong 2 and Step 2B. Accordingly, the claims recite an abstract idea.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8-9, 11-12, 14-17, 19-23, and 25-32 are rejected under 35 U.S.C. 103 as being unpatentable over Nvidia (NPL: “CUDA C++ Programming Guide”) in view of Ould-Ahmed-Vall et al. (US 2018/0315159 A1), hereinafter Ould, further in view of IEEE Computer Society (NPL: “IEEE Standard for Floating-Point Arithmetic,” in the IDS filed 11/10/2022), hereinafter IEEE2008, and further in view of Pasca et al. (US 2019/0042193 A1), hereinafter Pasca.
Regarding claim 1, Nvidia discloses
One or more processors comprising:
Circuitry [“The NVIDIA GPU architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs)”, p.104] to:
in response to an application programming interface (API) call [“CUDA driver API, which is also accessible by the application. The driver API provides … and CUDA modules - the analogue of dynamically loaded libraries for the device… Kernels can be written using the CUDA instruction set architecture, called PTX,” p.15; “The driver API must be initialized… before any function from the driver API is called” Appendix I. Driver API, p.297; See 3.2.12.1]
to perform a matrix multiply-accumulate (MMA) operation [“The core language extensions… allow programmers to define a kernel as a C++ function… the runtime is introduced in CUDA Runtime. It provides C and C++ functions… the runtime is built on top of… the CUDA driver API…” P.15; See Appendix B. C++ Language Extensions, Sec.B.17 Warp Matrix Functions, “C++ warp matrix operations leverage Tensor Cores to accelerate matrix problems of the form D=A*B+C… All following functions and types are defined in the namespace nvcuda::wmma. Sub-byte operations are considered preview, i.e. the data structures and APIs…”];
perform type conversion operations on operands [See Table on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
and IEEE-compliant rounding functions [“multiplication are IEEE-compliant” Sec. E.1. Standard Functions, p.214; Sec. E.2. Intrinsic Functions, p.222-224 shows functions with 4 types of rounding modes that are IEEE-compliant].
However, Nvidia does not explicitly disclose:
cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers based, at least in part, on one or more rounding attributes indicated as parameters to the API; and perform the MMA operation based, at least in part, on the one or more rounded TF32 numbers.
In the analogous art of matrix multiplication architectures, Ould teaches:
Performing MMA operations based on one or more application programming interface (API) parameters [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "driver software for the graphics processor translates API calls that are specific to a particular graphics or media library into commands that can be processed by the graphics processor" par. 290].
circuitry to cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more floating point numbers [“The pipeline manager 232 receives instructions from the scheduler 210 of FIG. 2A and manages execution of those instructions via a graphics multiprocessor 234” par. 67; “234 has…instruction cache 252, an instruction unit 254… one or more general purpose graphics processing unit (GPGPU) cores 262” par. 74; "The instruction unit 254 can dispatch instructions as thread groups (e.g., warps), with each thread of the thread group assigned to a different execution unit within GPGPU core 262." par. 75; GPGPU cores have Floating point units and can execute instructions for SIMD or SIMT, par. 77-78; "FIG. 14 illustrates components of a dynamic precision floating point unit 1400" par. 189; " extend this capability by providing support for instruction and associated logic to enable variable precision operations… a set of instructions and associated logic is provided in which throughput is increased by performing floating point operations at the lowest precision possible without significant loss" par.188; "The logic 1700 can then perform the numerical operation using a number of bits associated with a second precision that is lower than the first precision, as shown at block 1704." par. 205; "1400 can attempt to perform FP32 operations at FP16 precision, while power gating elements and components beyond those required to perform operations at FP16 precision." par. 195].
Using IEEE standard numbers and operations. [“The IEEE 754 single-precision binary floating point format specifies a 32-bit binary representation having a 1-bit sign, an 8-bit exponent, and a 24-bit significand, of which 23 bits are explicitly stored. The IEEE 754 half-precision binary floating point format specifies a 16-bit binary representation having a 1-bit sign, 5-bit exponent, and 11-bit significand, of which 10-bits are explicitly stored… Floating point units capable of performing arithmetic operations at single and half precision are known in the art” par. 187; “the FPUs can implement the IEEE 754-2008 standard for floating point arithmetic or enable variable precision floating point arithmetic” par. 77]
And performing multiplication using FP32 data with only 10-bit significand precision (comparable to TF32) [“Where FP32 operations can be performed at FP16 precision“ par. 195; “For a 32-bit floating point value or a 32-bit integer value, the multiplier 1506 can perform a multiplication operation for a 24-bit significand at 11-bits (e.g., FP16 precision)” par. 198]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia and Ould before him before the effective filing date of the claimed invention, to incorporate the matrix multiplication device as taught by Ould to perform the matrix multiplication operations disclosed by Nvidia, in order to implement an SIMT/SIMD-capable matrix multiplication device that improves parallelism, efficiently processes commands, increases throughput, and reduces power demand by performing FP32 operations at a lower precision [par. 42, 60, 78, 85, 195-199, and 219].
However, Nvidia and Ould do not explicitly teach one or more circuits to cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers based, at least in part, on one or more rounding attributes.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches rounding is based on one or more rounding attributes ["attribute:… the term attribute might refer to the parameter (as in “rounding-direction attribute”) or its value (as in “roundTowardZero attribute”)" 2. Definitions, p.3; "An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "Rounding takes a number regarded as infinitely precise and, if necessary, modifies it to fit in the destination’s format" 4.3 Rounding-direction attributes]
Furthermore, Ould teaches extending the capability of floating point operations as “embodiments described herein extend this capability by providing support for instruction and associated logic to enable variable precision operations” in paragraph 188.
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, and IEEE2008 before him before the effective filing date of the claimed invention, to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008 in the operations, instructions, and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
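The IEEE2008 rounding-direction attributes cited above can be illustrated with a short sketch that applies a user-selected attribute when discarding the low 13 FP32 mantissa bits. The attribute strings mirror the standard's names; the function itself is illustrative only and does not represent the claimed circuitry:

```python
import struct

def round_with_attribute(x: float, attr: str) -> float:
    """Discard the low 13 FP32 mantissa bits under an IEEE 754-2008
    directed rounding attribute: 'roundTowardZero',
    'roundTowardPositive', or 'roundTowardNegative'.
    """
    b = struct.unpack('<I', struct.pack('<f', x))[0]
    discarded = b & 0x1FFF
    b &= ~0x1FFF  # truncate the magnitude (sign-magnitude layout)
    if discarded:
        positive = not (b >> 31)
        # Step the magnitude up only when the attribute directs
        # rounding away from the truncated value.
        if ((attr == 'roundTowardPositive' and positive) or
                (attr == 'roundTowardNegative' and not positive)):
            b += 0x2000  # next representable value; carry into the
                         # exponent field is well-defined in IEEE 754
    return struct.unpack('<f', struct.pack('<I', b & 0xFFFFFFFF))[0]
```

Because IEEE 754 stores sign and magnitude separately, incrementing the magnitude bits moves a positive value toward +∞ and a negative value toward −∞, which is why the sign test above selects when to step.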
However, Nvidia, Ould, and IEEE2008 do not explicitly disclose one or more circuits to cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers.
In the analogous art of Floating-Point data format conversion and operations, Pasca teaches cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers. [“the inputs and outputs may take any suitable format” par. 23; FP16+++: “to account for potential overflow, the multiplication results may be formatted with a 1-bit sign field, an 8-bit exponent field, and a 10-bit mantissa field.” Par. 24; “process 140 includes scaling a set of original inputs to the DSP circuitry 60 from a first format to a second format (process block 142)” par. 47; “may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62” par. 43]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, and IEEE2008, to look for an implementation of the rounding methods disclosed by IEEE2008. As such, it would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, and Pasca before him before the effective filing date of the claimed invention, to incorporate the rounding method and the FP16+++ data format as taught by Pasca into the operations, instructions, and datatypes disclosed by the combination of Nvidia, Ould, and IEEE2008, in order to implement one of the rounding modes (i.e., truncation), to increase the range of the FP16 format to reduce overflow, and to provide more types of formats to convert from [Pasca: par. 23-25, 37-38, 43, 54, 57, and 59]. The combination of Nvidia, Ould, IEEE2008, and Pasca discloses one or more circuits to cause one or more thirty-two bit floating point (FP32) numbers to be converted to one or more rounded thirty-two bit TensorFloat (TF32) numbers based, at least in part, on one or more rounding attributes.
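The combined teaching applied to claim 1 (round FP32 operands to reduced TF32-style precision, then perform the MMA operation) can be summarized in the following reference sketch; the helper, its truncation mode, and the plain-list matrix representation are illustrative assumptions, not the claimed hardware:

```python
import struct

def _tf32_trunc(x: float) -> float:
    # Illustrative: zero the low 13 FP32 mantissa bits (TF32-style precision)
    b = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', b & ~0x1FFF))[0]

def mma(A, B, C):
    """D = A*B + C: operands rounded to TF32-style precision before the
    multiply; products accumulated at full floating-point precision,
    mirroring the D = A*B + C form of the cited warp matrix functions."""
    rows, inner, cols = len(A), len(B), len(B[0])
    Ar = [[_tf32_trunc(v) for v in row] for row in A]
    Br = [[_tf32_trunc(v) for v in row] for row in B]
    return [[C[i][j] + sum(Ar[i][p] * Br[p][j] for p in range(inner))
             for j in range(cols)] for i in range(rows)]
```

Rounding only the multiplicands while accumulating at higher precision reflects the reduced-precision multiply with wider accumulation described in Ould (par. 195, 198).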
Regarding claim 2, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 1 above.
Ould further teaches wherein the circuitry is further to cause the one or more rounded FP32 numbers to be used in one or more operands of one or more matrix multiply and accumulate (MMA) operations ["The logic 1700 can then perform the numerical operation using a number of bits associated with a second precision that is lower than the first precision, as shown at block 1704." par. 205, hereinafter Logic 1700; GPGPU cores 262 have Floating point units, par. 77; "The internal registers 1404 includes a set of operand registers 1414 that store input values for the dynamic precision floating point unit 1400... to support fused multiply-add, multiply-subtract, multiply-accumulate, or related operations." par. 192; "the floating-point operation is a two-dimensional matrix multiply and accumulate operation" par. 319] for the reasons as stated above in at least claim 1.
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
The combination of Nvidia, Ould, IEEE2008, and Pasca discloses the additional limitations of the claim in question.
Regarding claim 3, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 1 above.
Ould teaches wherein the one or more circuits are further to cause the one or more rounded FP32 numbers to be used in one or more operands of one or more matrix multiply and accumulate (MMA) operations to be performed by a group of threads all performing a same instruction [instruction unit 254 can dispatch instructions as warps for GPGPU cores 262, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319] for the reasons as stated above in at least claim 1.
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
The combination of Nvidia, Ould, IEEE2008, and Pasca discloses the additional limitations of the claim in question.
Regarding claim 4, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 1 above.
Nvidia discloses functions for the four IEEE rounding modes [See pages 222-223].
Ould teaches wherein the one or more circuits are to round the one or more FP32 numbers based, at least in part, on one or more instruction parameters [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319] for the reasons as stated above in at least claim 1.
However, Nvidia and Ould do not explicitly disclose wherein the one or more circuits are to select a method of rounding the one or more FP32 numbers based, at least in part, on one or more instruction parameters that indicate the one or more rounding attributes.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches selecting a method of rounding the one or more FP32 numbers based, at least in part, on one or more instruction parameters that indicate the one or more rounding attributes. ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, and IEEE2008 before him before the effective filing date of the claimed invention, to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008 in the operations, instructions, and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 5, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 1 above.
Nvidia discloses the use of API for functions operations [“CUDA driver API, which is also accessible by the application. The driver API provides … and CUDA modules - the analogue of dynamically loaded libraries for the device… Kernels can be written using the CUDA instruction set architecture, called PTX,” p.15; “The driver API must be initialized… before any function from the driver API is called” Appendix I. Driver API, p.297; See 3.2.12.1]
Ould teaches wherein the one or more circuits are to round the one or more FP32 numbers based, at least in part, on one or more application programming interface (API) parameters [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "driver software for the graphics processor translates API calls that are specific to a particular graphics or media library into commands that can be processed by the graphics processor" par. 290] for the reasons as stated above in at least claim 1.
However, Nvidia and Ould do not explicitly disclose wherein the one or more circuits are to select a method of rounding the one or more FP32 numbers based, at least in part, on one or more application programming interface (API) parameters that indicate the one or more rounding attributes.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches selecting a method of rounding the one or more FP32 numbers based, at least in part, on one or more parameters that indicate the one or more rounding attributes. ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 6, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 1 above.
Ould teaches wherein the one or more circuits are to round the one or more FP32 numbers based, at least in part, on one or more platform independent instruction parameters [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "instruction set 2309 may facilitate Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC), or computing via a Very Long Instruction Word (VLIW)" par. 232; par.290 for API] for the reasons as stated above in at least claim 1.
However, Nvidia and Ould does not explicitly disclose wherein the one or more circuits are select a method of rounding the one or more FP32 numbers based, at least in part, on one or more platform independent instruction parameters that indicate the one or more rounding attributes.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches select a method of rounding the one or more FP32 numbers based, at least in part, on one or more parameters that indicate the one or more rounding attributes. ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
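By way of illustration only (this sketch is not taken from any cited reference; the function and mode names are hypothetical), the kind of parameter-selected rounding at issue — reducing an FP32 value to TF32's 10 explicit mantissa bits under a caller-supplied rounding attribute — can be expressed as:

```python
import struct

def _f32_bits(x: float) -> int:
    # Reinterpret a float as its IEEE-754 binary32 bit pattern.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def _bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def tf32_round(x: float, mode: str = "rn") -> float:
    """Round an FP32 value to TF32 precision (10 explicit mantissa bits).

    mode "rz" truncates (round toward zero); mode "rn" rounds to
    nearest, ties to even. Infinities and NaNs are not specially
    handled in this sketch.
    """
    b = _f32_bits(x)
    if mode == "rz":
        b &= ~0x1FFF            # discard the low 13 mantissa bits
    elif mode == "rn":
        lsb = (b >> 13) & 1     # last kept bit, used for ties-to-even
        b = (b + 0x0FFF + lsb) & ~0x1FFF
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return _bits_f32(b)
```

The rounding attribute is supplied as an ordinary parameter, mirroring the claimed selection of a rounding method from a plurality of methods.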
Regarding claim 8, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 1 above.
Nvidia teaches matrix engines [“The NVIDIA GPU architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs)”, p.104] and matrix operations [Sec. B.17 Warp Matrix Functions].
Ould teaches wherein the one or more circuits are to further cause one or more hardware matrix engines to perform one or more matrix operations based, at least in part, on the one or more rounded TF32 numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319] for the reasons as stated above in at least claim 1.
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
The combination of Nvidia, Ould, IEEE2008, and Pasca discloses the additional limitations of the claim in question.
Claim 9 is directed to claim 1. A mere change in statutory class is obvious. As such, claim 9 is rejected for the same reasons stated in claim 1.
Regarding claim 11, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 9 above.
Nvidia discloses API function calls [“CUDA driver API, which is also accessible by the application. The driver API provides … and CUDA modules - the analogue of dynamically loaded libraries for the device… Kernels can be written using the CUDA instruction set architecture, called PTX,” p.15; “The driver API must be initialized… before any function from the driver API is called” Appendix I. Driver API, p.297; See 3.2.12.1]
Ould also teaches wherein an application programming interface (API) call to perform an operation [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "driver software for the graphics processor translates API calls that are specific to a particular graphics or media library into commands that can be processed by the graphics processor" par. 290].
However, Nvidia and Ould do not explicitly disclose wherein the one or more rounding attributes are specified as an input parameter to an application programming interface (API) call to perform an operation.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches wherein the one or more rounding attributes are specified as an input parameter to a program ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 12, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 9 above.
Ould teaches wherein a set of instructions to perform a matrix operation [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319].
However, Nvidia and Ould do not explicitly disclose wherein the one or more rounding attributes are specified in a set of instructions.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches wherein the one or more rounding attributes are specified in a set of instructions ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 14, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 9 above.
Nvidia, Ould, and Pasca do not explicitly disclose wherein the one or more rounding attributes indicate a selection of a method of performing truncation.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches wherein the one or more rounding attributes indicate a selection of a method of performing truncation ["user-selectable rounding-direction attributes are defined... roundTowardZero, the result shall be the format’s floating-point number closest to and no greater in magnitude than the infinitely precise result." 4.3.2 Directed rounding attributes].
Pasca further teaches truncation methods [“may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62” par. 43].
It would have been obvious to one of ordinary skill in the art, for the reasons stated in claim 1, to combine Nvidia, Ould, IEEE2008, and Pasca.
Regarding claim 15, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 9 above.
Nvidia further discloses perform type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould also teaches wherein causing the one or more FP32 numbers to be converted comprises generating, based at least in part on one or more human-readable instructions, one or more machine instructions to perform truncation of the one or more FP32 numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "The SIMD instructions for the GPGPU cores can be generated at compile time by a shader compiler or automatically generated when executing programs written and compiled for single program multiple data (SPMD) or SIMT architectures." par. 78].
Pasca also teaches wherein causing the one or more numbers to be converted ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43].
Regarding claim 16, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 9 above.
Nvidia further discloses Warp matrix operation and processing using threads [“The multiprocessor creates, manages, schedules, and executes threads in groups of 32 parallel threads called warps” p.104; See Sec. 4.1. SIMT Architecture, p.104-106].
Ould also teaches wherein causing the one or more FP32 numbers to be converted comprises generating a software program to cause a set of threads to perform one or more matrix operations on a graphics processing unit using the one or more rounded TF32 numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "The SIMD instructions for the GPGPU cores can be generated at compile time by a shader compiler or automatically generated when executing programs written and compiled for single program multiple data (SPMD) or SIMT architectures." par. 78].
Pasca also teaches wherein causing the one or more numbers to be converted ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43].
Claim 17 is directed to claim 1. A mere change in statutory class is obvious. As such, claim 17 is rejected for the same reasons as claim 1.
Regarding claim 19, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 17 above.
Nvidia discloses performing type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould teaches the use of higher precision datatypes to generate one or more other lower precision numbers [“performing operations for higher precision (e.g., FP32, INT32) operations at lower precision (e.g., FP16, INT16, INT8).”].
However, Nvidia, Ould, and IEEE2008 do not explicitly disclose wherein the one or more processors are further to change one or more numbers of one or more datatypes different from FP32 to generate one or more other TF32 numbers.
In the analogous art of Floating-Point data format conversion and operations, Pasca teaches converting any suitable number format to any other suitable number format (such as TF32) [“Accordingly, bfloat16 inputs may be scaled to half-precision floating-point, extended precision inputs may be scaled to single-precision floating-point, among other combinations” par. 27; “a result in another number format (e.g., half-precision floating point, FP16+++, an extended precision and/or the like), which may depend on the format of the set of inputs to the DSP circuitry 60, the format used to initially sum the multiplication results, and/or the like” par. 25; “to adjust (e.g., scale) a variable before and after processing such that operations performed on the variable in a first number format may be emulated by operations performed in another number format using circuitry elements of an integrated circuit” par. 1; “Integrated circuits may represent variables according to a number of different formats. For example, a variable may be represented in single-precision floating-point format, half-precision floating-point format, bfloat16 format, and/or the like.” par. 3].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, and Pasca before him before the effective filing date of the claimed invention to incorporate the rounding method for various datatypes and the FP16+++ data format as taught by Pasca into the operations, instructions and datatypes disclosed by the combination of Nvidia, Ould, and IEEE2008, in order to implement one of the rounding modes (i.e. truncation), to increase the range of the FP16 format to reduce overflow, and to support more types of formats to convert from [Pasca: par. 23-25, 37-38, 43, 54, 57, and 59].
Regarding claim 20, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 17 above.
Nvidia discloses performing type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould also teaches wherein the one or more processors are to cause the one or more FP32 values to be converted [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319]
Pasca also teaches truncation of values ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43]
However, Nvidia, Ould and Pasca do not explicitly disclose wherein the one or more processors are to cause the one or more FP32 values to be converted using a method indicated by the one or more rounding attributes.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches values to be rounded using a method indicated by the one or more rounding attributes ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, Pasca, and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, wherein the instructions includes the attributes to identify the rounding done in Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 21, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 17 above.
Nvidia discloses performing type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould teaches wherein one or more FP32 values to be converted [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319]
Pasca also teaches truncation of values ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43]
However, Nvidia, Ould and Pasca do not explicitly disclose wherein the one or more processors are to cause the one or more FP32 values to be converted using a method indicated by the one or more rounding attributes.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches values to be rounded using a method indicated by the one or more rounding attributes ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, Pasca, and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia, and Ould, wherein the instructions includes the attributes to identify the rounding done in Nvidia, and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 22, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 17 above.
Ould further teaches wherein the one or more processors are to cause the one or more FP32 numbers to be converted automatically [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "The SIMD instructions for the GPGPU cores can be generated at compile time by a shader compiler or automatically generated when executing programs written and compiled for single program multiple data (SPMD) or SIMT architectures." par. 78].
Pasca also teaches truncation of values ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43]
Regarding claim 23, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 17 above.
Ould teaches wherein the one or more processors are to cause the one or more FP32 numbers to be converted by discarding bits from each of the one or more FP32 numbers [instruction unit 254, par. 75; "The logic 1700 can then perform the numerical operation using a number of bits associated with a second precision that is lower than the first precision, as shown at block 1704." par. 205; "1400 can attempt to perform FP32 operations at FP16 precision, while power gating elements and components beyond those required to perform operations at FP16 precision." par. 195; GPGPU Cores 262, par. 77-78].
Pasca also teaches truncation of values ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43]
Regarding claim 25, Ould teaches a non-transitory machine-readable medium having stored thereon instructions [“The instruction cache 252 receives a stream of instructions to execute from the pipeline manager 232. The instructions are cached in the instruction cache 252 and dispatched for execution by the instruction unit 254” par. 75].
The remainder of claim 25 is directed to claim 1. A change in statutory class is obvious. As such, claim 25 is rejected for the same reasons given in claim 1.
Regarding claim 26, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 25 above.
Nvidia teaches performing a matrix multiply-accumulate (MMA) operation [“The core language extensions… allow programmers to define a kernel as a C++ function… the runtime is introduced in CUDA Runtime. It provides C and C++ functions… the runtime is built on top of… the CUDA driver API…” P.15; See Appendix B. C++ Language Extensions, Sec.B.17 Warp Matrix Functions, “C++ warp matrix operations leverage Tensor Cores to accelerate matrix problems of the form D=A*B+C… All following functions and types are defined in the namespace nvcuda::wmma. Sub-byte operations are considered preview, i.e. the data structures and APIs…”];
Ould teaches wherein the instructions, if performed by the one or more processors, further cause the one or more processors to cause a matrix multiply and accumulate operation to be performed using the one or more TF32 numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319].
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
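For context on the D=A*B+C pattern discussed above, the matrix multiply-accumulate with TF32-rounded operands and higher-precision accumulation can be sketched as follows (an illustrative emulation only; the helper names are hypothetical, and this is not code from Nvidia, Ould, IEEE2008, or Pasca):

```python
import struct

def tf32_trunc(x: float) -> float:
    # Keep TF32's 10 explicit mantissa bits by discarding FP32's low 13 bits.
    b = struct.unpack("<I", struct.pack("<f", x))[0] & ~0x1FFF
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def mma_tf32(A, B, C):
    """D = A*B + C: operands are rounded to TF32 before multiplication,
    and the products are accumulated at higher precision (Python floats),
    mirroring the claimed MMA pattern."""
    n, k, m = len(A), len(B), len(B[0])
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = C[i][j]
            for p in range(k):
                acc += tf32_trunc(A[i][p]) * tf32_trunc(B[p][j])
            D[i][j] = acc
    return D
```

The sketch separates the two claimed steps: the format conversion of the operands, followed by the MMA performed on the rounded values.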
Regarding claim 27, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 25 above.
Nvidia further discloses Warp matrix operation and processing using threads with a GPU consisting of Streaming multiprocessors [“The multiprocessor creates, manages, schedules, and executes threads in groups of 32 parallel threads called warps” 104; See Sec. 4.1. SIMT Architecture, p.104-106; See Figure 5 for multiple SMs on p.5]
However, Nvidia does not explicitly disclose causing the one or more processors to cause a matrix multiply and accumulate operation to be performed using the one or more TF32 numbers.
Ould teaches wherein the instructions, if performed by the one or more processors, further cause the one or more processors to cause a matrix multiply and accumulate operation to be performed using the one or more TF32 numbers and multiple groups of threads performing the same instruction, each group to be performed by a different matrix engine [The pipeline manager 232/instruction unit 254, par. 75; Logic 1700, par. 205; graphics multiprocessor 234/GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "The SIMD instructions for the GPGPU cores can be generated at compile time by a shader compiler or automatically generated when executing programs written and compiled for single program multiple data (SPMD) or SIMT architectures." par. 78; “One or more instances of the graphics multiprocessor 234 can be included within a processing cluster 214. “ par. 67; “Each thread within a thread group can be assigned to a different processing engine within a graphics multiprocessor 234.” Par. 69].
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
Regarding claim 28, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 25 above.
Nvidia discloses performing a matrix multiply-accumulate (MMA) operation with an API [“The core language extensions… allow programmers to define a kernel as a C++ function… the runtime is introduced in CUDA Runtime. It provides C and C++ functions… the runtime is built on top of… the CUDA driver API…” P.15; See Appendix B. C++ Language Extensions, Sec.B.17 Warp Matrix Functions, “C++ warp matrix operations leverage Tensor Cores to accelerate matrix problems of the form D=A*B+C… All following functions and types are defined in the namespace nvcuda::wmma. Sub-byte operations are considered preview, i.e. the data structures and APIs…”];
Ould also teaches wherein an application programming interface (API) call to perform a matrix multiply and accumulate operation, the set of parameters further indicating a matrix operand comprising the one or more FP32 numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "driver software for the graphics processor translates API calls that are specific to a particular graphics or media library into commands that can be processed by the graphics processor" par. 290].
Pasca also teaches truncation of values ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43]
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches the one or more rounding attributes are indicated in a set of instructions ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, Pasca, and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, wherein the instructions includes the attributes to identify the rounding done in Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 29, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 25 above.
Nvidia, Ould, and Pasca do not explicitly disclose wherein the instructions, if performed by the one or more processors, further cause the one or more processors to select, based, at least in part, on the one or more rounding attributes, a way of rounding from a plurality of different ways of rounding.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches the one or more rounding attributes are indicated in a set of instructions to determine a way of rounding from a plurality of different ways of rounding ["An attribute is logically associated with a program block to modify its numerical and exception semantics." 4. Attributes and rounding; "user-selectable rounding-direction attributes are defined" 4.3.2 Directed rounding attributes; also see 4.3 and 4.3.1 for additional attributes for rounding modes].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, Pasca, and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, wherein the instructions includes the attributes to identify the rounding done in Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
Regarding claim 30, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 25 above.
Nvidia discloses performing type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould teaches wherein the instructions, if performed by the one or more processors, further cause the one or more processors to compile code to generate machine instructions to perform truncation of the one or more FP32 numbers to generate the one or more rounded FP32 numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; floating point operation is a MMA operation, par. 319; "The SIMD instructions for the GPGPU cores can be generated at compile time by a shader compiler or automatically generated when executing programs written and compiled for single program multiple data (SPMD) or SIMT architectures." par. 78].
Pasca also teaches truncation of numbers ["pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43].
Regarding claim 31, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 25 above.
Nvidia discloses performing type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould teaches further comprising instructions that, if performed by the one or more processors, further cause the one or more processors to generate one or more TF32 numbers from one or more numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78].
However, Nvidia and Ould do not explicitly teach generating one or more TF32 numbers from one or more brain float (BF16) numbers.
In the analogous art of Floating-Point data format conversion and operations, Pasca teaches generating one or more TF32 numbers from one or more brain float (BF16) numbers ["the input may be received as single-precision, double-precision, or a custom number format, among other formats, and may be scaled to half-precision, bfloat16, another custom number format, and/or the like." par. 49; "a variable may be represented in single-precision floating-point format, half-precision floating-point format, bfloat16 format, and/or the like" par. 3; "For example, an input in a first number format (e.g., bfloat16) may be scaled to a second number format (e.g., half-precision floating-point) so that a digital signal processing (DSP) circuit implemented to receive inputs in the second number format may perform one or more arithmetic operations on the input" par. 15; “the multiplication results may be formatted with a 1-bit sign field, an 8-bit exponent field, and a 10-bit mantissa field.” par. 24; "pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, and Pasca before him before the effective filing date of the claimed invention to incorporate the rounding method, the data format conversion methodology and the FP16+++ data format as a custom format as taught by Pasca into the operations, instructions and datatypes disclosed by the combination of Nvidia, Ould, and IEEE2008, in order to implement one of the rounding modes (i.e. truncation), to increase the range of the FP16 format to reduce overflow, and to support more types of formats to convert from [Pasca: par. 23-25, 37-38, 43, 54, 57, and 59].
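For context on the BF16-to-TF32 conversion at issue, note the arithmetic relationship: BF16 and TF32 share the 8-bit exponent, and BF16's 7 explicit mantissa bits fit within TF32's 10, so a BF16 value converts to TF32 without loss. An illustrative sketch (hypothetical helper names, not code from any cited reference):

```python
import struct

def _bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def _f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def bf16_trunc(x: float) -> float:
    # BF16: 1 sign, 8 exponent, 7 mantissa bits -- drop FP32's low 16 bits.
    return _f32(_bits(x) & ~0xFFFF)

def bf16_to_tf32(x: float) -> float:
    """Convert a (truncated) BF16 value to TF32. Because TF32's 10
    mantissa bits cover BF16's 7, the value is already exact in TF32."""
    v = bf16_trunc(x)
    assert _bits(v) & 0x1FFF == 0  # low 13 bits clear: a valid TF32 pattern
    return v
```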
Regarding claim 32, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 25 above.
Nvidia discloses performing type conversion operations on operands, including FP64 [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould teaches further comprising instructions that, if performed by the one or more processors, further cause the one or more processors to: generate one or more TF32 numbers from one or more numbers [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78].
Ould teaches the use of double precision 64 bit floating point (FP64) [“GPGPU cores 262 include a single precision FPU and an integer ALU while a second portion of the GPGPU cores include a double precision FPU” par. 77]
However, Ould does not explicitly disclose generate one or more TF32 numbers from one or more double precision 64 bit floating point (FP64) numbers.
In the analogous art of Floating-Point data format conversion and operations, Pasca teaches generate one or more TF32 numbers from double precision (i.e. FP64) numbers. ["the input may be received as single-precision, double-precision, or a custom number format, among other formats, and may be scaled to half-precision, bfloat16, another custom number format, and/or the like." par. 49; “the multiplication results may be formatted with a 1-bit sign field, an 8-bit exponent field, and a 10-bit mantissa field.” par. 24; "pre-scaling circuitry 102 and/or additional circuitry and/or logic may zero pad the fraction with a suitable number of bits or truncate a suitable number of bits from the fraction before the scaled input is received at the input circuitry 62" par. 43]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, and Pasca before him before the effective filing date of the claimed invention, to incorporate the rounding method, the data format conversion methodology, and the FP16+++ data format as a custom format, as taught by Pasca, into the operations, instructions, and datatypes disclosed by the combination of Nvidia, Ould, and IEEE2008, in order to implement one of the rounding modes (i.e., truncation), to increase the range of the FP16 format to reduce overflow, and to provide more types of formats to convert from [Pasca: par. 23-25, 37-38, 43, 54, 57, and 59].
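For illustration only (not code from any reference of record), the truncation-style narrowing described in the Pasca citations above — discarding low-order fraction bits when converting a double-precision value to a 10-bit-mantissa format such as TF32 — can be sketched as follows. The function name is hypothetical, and exponent-range narrowing (FP64's 11-bit exponent versus TF32's 8-bit exponent) is deliberately omitted from this sketch.

```python
import struct

def fp64_to_tf32_truncate(x: float) -> float:
    """Hypothetical illustration; not from any reference of record.

    Narrow an FP64 value to TF32 fraction precision (10 explicit
    mantissa bits) by truncation, i.e. zeroing the discarded
    low-order fraction bits. Exponent-range narrowing is ignored.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # FP64 carries 52 fraction bits; TF32 keeps 10, so 42 are discarded.
    bits &= ~((1 << 42) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Because the format is sign-magnitude, masking only the fraction bits truncates toward zero for both positive and negative inputs.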
Claims 7 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Nvidia, Ould, IEEE2008, and Pasca, and further in view of Lee et al. (US 8,266,198 B2), hereinafter Lee.
Regarding claim 7, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 1 above.
Nvidia discloses performing type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould also teaches wherein the one or more circuits are to cause one or more FP32 numbers to be converted by at least rounding an FP32 number to a TF32 number [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78].
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
However, Nvidia, Ould, and Pasca do not explicitly teach wherein the one or more circuits are to cause one or more FP32 numbers to be converted by at least rounding an FP32 number to a TF32 number using a round ties to away from zero rounding mode.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches a round ties to away from zero rounding mode. ["roundTiesToAway" 4.3.1 Rounding-direction attributes to nearest]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
However, Nvidia, Ould, IEEE2008, and Pasca do not explicitly disclose wherein the one or more circuits are to cause one or more FP32 numbers to be converted by at least rounding an FP32 number to a TF32 number using a round ties to away from zero rounding mode.
In the analogous art of Floating-Point Rounding implementations, Lee discloses an implementation of rounding a floating-point number to be converted to a different precision using a round ties to away from zero rounding mode ["The calculation of round-to-nearest is trivial as it involves simply adding one-half (i.e., adding 1 in the next most significant position after the rounding position) and then truncating the result (i.e., replacing all values after the rounding position with zeroes)" Col. 10, ll. 14-18].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, Pasca, and Lee before him before the effective filing date of the claimed invention, to modify the circuits, instruction sets, and operations disclosed by the combination of Nvidia, Ould, and IEEE2008, to implement the rounding method and circuitry disclosed by Lee, in order to implement the round ties to away from zero rounding mode, increase flexibility for rounding, and reduce timing costs [Lee: Col. 2-3].
Regarding claim 13, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 9 above.
Nvidia discloses performing type conversion operations on operands [See Table and details on p.121; “Sometimes, the compiler must insert conversion instructions, introducing additional execution cycles… Functions operating on variables of type char or short whose operands generally need to be converted to int,” p.123];
Ould teaches wherein the one or more circuits are to cause one or more FP32 numbers to be converted by at least rounding an FP32 number [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78].
However, Nvidia, Ould, and Pasca do not explicitly teach wherein the one or more circuits are to cause one or more FP32 numbers to be converted by at least rounding an FP32 number to a TF32 number using a round ties to away from zero rounding mode.
In the analogous art of Floating-Point Arithmetic, IEEE2008 teaches a round ties to away from zero rounding mode. ["roundTiesToAway" 4.3.1 Rounding-direction attributes to nearest]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, and IEEE2008 before him before the effective filing date of the claimed invention to maintain the IEEE standard (such as the rounding attributes for various rounding types) as taught by IEEE2008, in the operations, instructions, and datatypes disclosed by Nvidia and Ould, in order to follow the standards for floating point operations, to allow more control over the rounding modes for various operations, and to incorporate various handling methods [IEEE2008: 4. Attributes and rounding and 7. Default exception handling].
However, Nvidia, Ould, IEEE2008, and Pasca do not explicitly disclose wherein causing the one or more FP32 numbers to be converted comprises adding a value to each of the one or more FP32 numbers and discarding a set of bits from each of the one or more FP32 numbers.
In the analogous art of Floating-Point Rounding implementations, Lee discloses an implementation of rounding a floating-point number to be converted to a different precision using a round ties to away from zero rounding mode including adding a value ["The calculation of round-to-nearest is trivial as it involves simply adding one-half (i.e., adding 1 in the next most significant position after the rounding position) and then truncating the result (i.e., replacing all values after the rounding position with zeroes)" Col. 10, ll. 14-18].
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, Pasca, and Lee before him before the effective filing date of the claimed invention, to modify the circuits, instruction sets, and operations disclosed by the combination of Nvidia, Ould, and IEEE2008, to implement the rounding method and circuitry disclosed by Lee, in order to implement the round ties to away from zero rounding mode, increase flexibility for rounding, and reduce timing costs [Lee: Col. 2-3].
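For illustration only (not code from any reference of record), the "add one-half, then truncate" scheme quoted from Lee above can be sketched as applied to an FP32-to-TF32 narrowing: a half-LSB value is added to the raw bit pattern at the rounding position, and the 13 discarded fraction bits are then zeroed. The function name is hypothetical, and the mapping of Lee's general scheme onto the TF32 bit widths is an assumption of this sketch.

```python
import struct

def fp32_to_tf32_ties_away(x: float) -> float:
    """Hypothetical illustration; not from any reference of record.

    Round an FP32 value to TF32 fraction precision (10 mantissa bits)
    by adding one-half of the discarded range to the raw bit pattern
    and then zeroing the 13 discarded bits. On sign-magnitude floats
    this yields round-to-nearest, ties away from zero; a carry out of
    the mantissa correctly increments the exponent field.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    DISCARDED = 13                     # FP32 has 23 fraction bits; TF32 keeps 10
    bits += 1 << (DISCARDED - 1)       # add "one-half" at the rounding position
    bits &= ~((1 << DISCARDED) - 1)    # truncate: zero everything below it
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Because the addition operates on the magnitude bits, a tie rounds to the larger magnitude for both positive and negative inputs, i.e. away from zero.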
Claims 10, 18, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Nvidia, Ould, IEEE2008, and Pasca, and further in view of Migacz et al. (US 8,266,198 B2), hereinafter Migacz.
Regarding claim 10, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 9 above.
Nvidia discloses performing the MMA operation and type conversion; see claim 1.
Ould teaches further comprising, performing one or more matrix operations based, at least in part, on one or more matrices ["the floating-point operation is a two-dimensional matrix multiply and accumulate operation" par. 319].
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
However, Nvidia, Ould, IEEE2008, and Pasca do not explicitly disclose one or more matrices comprising the one or more rounded TF32 numbers.
In the analogous art of Precision conversions for data formats, Migacz teaches generating and storing reduced data precision formats for operations [“a computer-implemented process is provided for automating conversion from higher precision data formats to lower precision data formats” par. 10,24,27; “Using reduced data precision formats for inferencing of neural (specifically convolutional) networks offers several advantages over traditional single-precision float formats…” par. 6]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, Pasca, and Migacz before him before the effective filing date of the claimed invention, to modify the matrix data disclosed by the combination of Nvidia, Ould, IEEE2008, and Pasca, using a reduced precision format, such as FP16+++ disclosed by Pasca, in order to reduce memory requirements and/or improve performance as taught by Migacz [Migacz: par. 6].
Regarding claim 18, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 17 above.
Nvidia discloses performing the MMA operation and type conversion; see claim 1.
Ould teaches wherein the one or more processors are further to cause one or more other processors to perform one or more matrix multiply and accumulate operations using one or more matrices [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; “the floating-point operation is a two-dimensional matrix multiply and accumulate operation" par. 319].
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
However, Nvidia, Ould, IEEE2008, and Pasca do not explicitly disclose one or more matrices comprising the one or more rounded TF32 numbers.
In the analogous art of Precision conversions for data formats, Migacz teaches generating and storing reduced data precision formats for operations [“a computer-implemented process is provided for automating conversion from higher precision data formats to lower precision data formats” par. 10,24,27; “Using reduced data precision formats for inferencing of neural (specifically convolutional) networks offers several advantages over traditional single-precision float formats…” par. 6]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, Pasca, and Migacz before him before the effective filing date of the claimed invention, to modify the matrix data disclosed by the combination of Nvidia, Ould, IEEE2008, and Pasca, using a reduced precision format, such as FP16+++ disclosed by Pasca, in order to reduce memory requirements and/or improve performance as taught by Migacz [Migacz: par. 6].
Regarding claim 24, Nvidia, Ould, IEEE2008, and Pasca disclose the invention substantially as claimed. See the discussion of claim 17 above.
Nvidia discloses performing the MMA operation and type conversion; see claim 1.
Ould teaches wherein the one or more circuits are further to cause one or more matrix operations to be performed using one or more matrices [instruction unit 254, par. 75; Logic 1700, par. 205; GPGPU Cores 262, par. 77-78; “the floating-point operation is a two-dimensional matrix multiply and accumulate operation" par. 319].
Pasca further teaches the rounded TF32 format [FP16+++, par.24] for the reasons as stated above in at least claim 1.
However, Nvidia, Ould, IEEE2008, and Pasca do not explicitly disclose one or more matrices comprising the one or more rounded TF32 numbers.
In the analogous art of Precision conversions for data formats, Migacz teaches generating and storing reduced data precision formats for operations [“a computer-implemented process is provided for automating conversion from higher precision data formats to lower precision data formats” par. 10,24,27; “Using reduced data precision formats for inferencing of neural (specifically convolutional) networks offers several advantages over traditional single-precision float formats…” par. 6]
It would have been obvious to one of ordinary skill in the art, having the teachings of Nvidia, Ould, IEEE2008, Pasca, and Migacz before him before the effective filing date of the claimed invention, to modify the matrix data disclosed by the combination of Nvidia, Ould, IEEE2008, and Pasca, using a reduced precision format, such as FP16+++ disclosed by Pasca, in order to reduce memory requirements and/or improve performance as taught by Migacz [Migacz: par. 6].
Response to Arguments
Applicant’s arguments, see page 14, filed 12/08/2025, with respect to the Drawings and Specification Objections have been fully considered and are persuasive. The Drawings and Specification Objections of the Prior Office Action mailed 09/08/2025 (hereinafter Prior Office Action) have been withdrawn.
Applicant’s arguments, see page 14, filed 12/08/2025, with respect to the Rejection under 35 U.S.C. 112(b) have been fully considered and are persuasive. The Rejection under 35 U.S.C. 112(b) of the Prior Office Action has been withdrawn.
Applicant’s arguments, see page 14, filed 12/08/2025, with respect to the Rejection under 35 U.S.C. 101 for non-statutory subject matter, have been fully considered and are persuasive. The Rejection under 35 U.S.C. 101 for non-statutory subject matter of the Prior Office Action has been withdrawn.
Applicant's arguments, see pages 14-17, filed 12/08/2025, with respect to the Rejections under 35 U.S.C. 101 have been fully considered but they are not persuasive.
For Step 2A, Prong one on page 15, applicant argues the claim requires a specific machine implementation. However, the argument is not part of the analysis of whether or not the claim recites an abstract idea under Step 2A, Prong one. See MPEP 2106.04. Examiner notes that the claim recites at least matrix multiply-accumulate and converting from one format to another. See Gottschalk v. Benson, 409 U.S. 63, 70, 175 USPQ 673, 676 (1972), the conversion of binary-coded-decimal (BCD) numerals into pure binary numbers is directed to “Mathematical Concepts” of abstract ideas. As such, the concept of converting from one data format into another (i.e. one floating point format into another floating point format) is directed to “Mathematical Concepts” of abstract ideas.
For Step 2A, Prong one on page 15, applicant argues the claims cannot be performed in the human mind. However, applicant’s argument is not directed to the rejection as made. As stated in the Prior Office Action, the claims fall within the "mathematical concepts" grouping of abstract ideas. See MPEP 2106.04(a)(2) for the abstract idea groupings, 2106.04(a)(2)(I) for mathematical concepts, and 2106.04(a)(2)(III) for mental processes.
For Step 2A, Prong two on page 16, applicant argues the conversion of FP32 to TF32 and performing MMA operation is an improvement to the function or technical field. However, the improvement cannot come from the abstract idea. See MPEP 2106.05(a). As stated in the Prior Office Action and above, the conversion and MMA operation are directed to "mathematical concepts" groupings of abstract ideas.
For Step 2B on pages 16-17, the applicant argues that “an API call to convert FP32 numbers to TF32 numbers… based on rounding attributes…” is recited in the claim and is significantly more because it improves the functioning of a computer, technology, or technical field. However, Applicant’s arguments are directed to at least a combination of the additional element of an API call and the abstract idea of converting one number format to another format. The analysis of Step 2B is directed to the additional element and, as stated in the rejection under 35 U.S.C. 101 above, APIs are well-understood, routine, and conventional for their use in a collection of operations and functions. The examiner respectfully disagrees with the applicant's assertion to the contrary for at least the reasons above.
Applicant's arguments, see pages 14-17, filed 12/08/2025, with respect to the Rejections under 35 U.S.C. 103 have been fully considered but they are not persuasive.
Applicant asserts that the references do not teach an application programming interface (API) call to perform a matrix multiply-accumulate (MMA) operation. However, the applicant did not provide an argument for why Ould does not disclose the limitation in question for the related rejection of claim 11. Ould discloses API calls that are commands to be processed by the graphics processors. See par. 290 of Ould. Furthermore, the arguments have been considered but are moot because the new ground of rejection relies on a new reference.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kenny K. Bui whose telephone number is (571)270-0604. The examiner can normally be reached 8:00 am to 3:00 pm on Monday, 8:00 am to 4:00 pm on Tuesday to Friday ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew T Caldwell can be reached at (571)272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KENNY K. BUI/Patent Examiner, Art Unit 2182 (571)270-0604
/ANDREW CALDWELL/Supervisory Patent Examiner, Art Unit 2182