Prosecution Insights
Last updated: April 19, 2026
Application No. 17/482,166

EMULATION OF FLOATING POINT CALCULATION

Non-Final OA: §101, §103
Filed: Sep 22, 2021
Examiner: GUDAS, JAKOB OSCAR
Art Unit: 2151
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Intel Corporation
OA Round: 5 (Non-Final)
Grant Probability: 44% (Moderate)
OA Rounds: 5-6
To Grant: 4y 2m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 44% (grants 44% of resolved cases; 4 granted / 9 resolved; -10.6% vs TC avg)
Interview Lift: +71.1% (strong lift in resolved cases with interview vs. without)
Typical Timeline: 4y 2m average prosecution (28 currently pending)
Career History: 37 total applications across all art units

Statute-Specific Performance

§101: 33.2% (-6.8% vs TC avg)
§103: 37.0% (-3.0% vs TC avg)
§102: 8.0% (-32.0% vs TC avg)
§112: 19.9% (-20.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 9 resolved cases.

Office Action

§101, §103
Detailed Action

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is non-final and is in response to claims filed on 02/17/2026 via amendment. Claims 1-3, 5, 7, 10, and 21-32 are pending for examination. Claims 1, 21, and 27 are currently amended. Claims 2-3, 5, 7, 10, 22-26, and 28-32 are as previously filed.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/17/2026 has been entered.

Response to Arguments

Rejections under 35 U.S.C. 101

Applicant's arguments regarding the 35 U.S.C. 101 rejections have been fully considered. Regarding the rejection under 35 U.S.C. 101, Applicant argues that "emulation is performed in dedicated systolic hardware that cannot be 'practically performed in the human mind'. Hence, claim 1 squarely fits the 'particular machine' safe harbor and clearly improves the operation of that machine". See Remarks 7 filed 02/17/2026. Examiner respectfully disagrees with Applicant's arguments. The recitation of the systolic hardware is at a high level of generality and does no more than generally link the use of the judicial exception to a particular field of use. See MPEP 2106.05(h). Further, it is important to note that the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981) in subsection II, below.
"In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception... However, it is important to keep in mind that an improvement in the abstract idea itself (e.g. a recited fundamental economic concept) is not an improvement in technology...". See MPEP 2106.05(a).

Rejections under 35 U.S.C. 102

Applicant's arguments regarding the 35 U.S.C. 102 rejections have been fully considered. Regarding the rejection under 35 U.S.C. 102, Applicant argues "Henry fails to anticipate any approach involving emulation or decomposition such as emulating precision calculations by decomposing first precision format values associated with the first precision format into second precision format values associated with the second precision format, and wherein the systolic hardware is facilitated to execute the floating point multiplication by multiplying second precision operands at the elements and accumulating partial sums in a floating point accumulator within the hardware". See Remarks 9. Examiner respectfully disagrees. 37 C.F.R. 1.111(b) states "A general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references does not comply with the requirements of this section." Examiner notes that the teachings of Henry do anticipate this claim language for at least the reasons cited below. Regarding the added limitation of "including systolic hardware having a two-dimensional array of elements", Applicant's arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Rejections under 35 U.S.C. 103

Applicant's arguments regarding the previously cited prior art have been fully considered. Regarding the rejection under 35 U.S.C.
103, Applicant argues that since claims 7 and 10 depend from claim 1, the rejections should be withdrawn. See Remarks 10. Examiner respectfully disagrees with Applicant's arguments. See Examiner's response to arguments above.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-3, 5, 7, 10, and 21-32 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more.

Regarding claim 1, at Step 1, the claim is directed to a machine, which is a statutory category of invention. At Step 2A Prong 1, Examiner notes that the claim is directed towards mental processes and mathematical calculations. The claim language has been reproduced below: An apparatus comprising: (mental process, evaluation) processing circuitry including (mental process, evaluation) systolic hardware having a two-dimensional array of elements, (mental process, evaluation) the processing circuitry to: (mental process, evaluation) receive data associated with a matrix multiplication operation in the first precision format; (mental process, evaluation) enable a floating point multiplication operation using values in a second precision format, the second precision format having a lower precision than the first precision format (mathematical concepts and/or relationships), wherein the floating point multiplication operation includes (mental process, evaluation) emulating precision calculations by decomposing first precision format values associated with the first precision format into second precision format values associated with the second precision format (mathematical calculation), and wherein the systolic hardware is facilitated (mental process, evaluation) to execute the floating point
multiplication by multiplying second precision operands at the elements and accumulating partial sums (mathematical calculation) in a floating point accumulator within the hardware; and generate results associated with the matrix multiplication operation (mathematical calculation).

Each of the non-bolded limitations is a mental process or mathematical calculation. The "An apparatus comprising" limitation is an evaluation mental process that can be performed by choosing what the apparatus comprises. The "processing circuitry including" limitation is an evaluation mental process that can be performed by choosing what the circuitry includes. The "systolic hardware having a two-dimensional array of" limitation is an evaluation mental process that can be performed by choosing what the systolic hardware includes. The "processing circuitry to:" limitation is an evaluation mental process that can be performed by choosing what the processing circuitry does. The "data associated with a matrix multiplication operation" limitation is an evaluation mental process that can be performed by choosing what the first data is. The "enable an emulated floating point multiplication operation" limitation is a mathematical concept and/or relationship that can be applied by someone merely changing how many places a number is rounded to. The "wherein the floating point multiplication operation includes" limitation is an evaluation mental process that can be performed by choosing what the operation includes. The "emulating precision calculations by decomposing first precision format values" limitation is a mathematical calculation that can be performed by decomposing the values by hand using pen and paper. The "wherein the systolic hardware is facilitated" limitation is an evaluation mental process that can be performed by choosing what the hardware is facilitated to do.
The "execute the floating point multiplication by multiplying second precision operands at the elements and accumulating partial sums" limitation is a mathematical calculation that can be performed by multiplying the operands and accumulating the partial sums by hand using pen and paper. The "generate results associated with the matrix multiplication operation" limitation is a mathematical operation. One could perform the operation using a pen and paper and the formula in paragraph [0402] of the specification.

At Step 2A Prong 2, the additional elements are bolded above. The "systolic hardware" is generally linking the use of the judicial exception to a particular field of use. See MPEP 2106.05(h). The 'receiving' limitation, as claimed and under BRI, is an additional element that is insignificant extra-solution activity. For example, 'receiving' in the context of this claim encompasses mere data gathering based on generic testing to lead to a response used for the claimed calculating step. See MPEP 2106.05(g). The remaining bolded limitations are generic computer components that amount to no more than components comprising mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f).

At Step 2B, the claim recites "receive data for performance of a matrix multiplication operation in the first precision format" and, per MPEP 2106.05(d)(II), the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity: i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir.
2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network); and iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.

Regarding claims 21 and 27, they recite similar language as claim 1 and are rejected for at least the same reasons therein. Here, claims 21 and 27 are directed towards the statutory categories of a method and article of manufacture, respectively, thus also satisfying Step 1. Moreover, none of the additional elements regarding the generic computer components (i.e. at least one non-transitory computer-readable storage mediums, etc.) are more than high-level generic computer components that amount to no more than components comprising mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Regarding claims 2, 22, and 28, they are directed to a mathematical calculation that can be calculated using a pen and paper and the formula in paragraph [0368] of the specification. Under Steps 2A Prong 2 and 2B, the claims do not recite any additional elements that integrate the abstract idea into a practical application, nor do they amount to significantly more than the judicial exception.

Regarding claims 3, 23, and 29, they are directed to an evaluation mental process; one can choose a precision to perform calculations by hand. Under Steps 2A Prong 2 and 2B, the claims do not recite any additional elements that integrate the abstract idea into a practical application, nor do they amount to significantly more than the judicial exception.
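For context on the arithmetic that the analysis above characterizes as pen-and-paper calculation, the decompose-and-accumulate scheme recited in the claims (splitting each FP32 value into bfloat16 pieces and accumulating the low-precision products in FP32) can be sketched numerically. The following is an illustrative sketch only, not part of the record; it assumes NumPy, models bfloat16 by truncating the FP32 mantissa, and all function names are hypothetical:

```python
import numpy as np

def to_bf16(x):
    # Model bfloat16 by truncating an FP32 value's low 16 bits
    # (round toward zero); the result stays stored in FP32 so that
    # ordinary NumPy arithmetic can be used on it.
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

def decompose3(x):
    # Split FP32 values into a triplet of bfloat16-representable terms
    # whose sum approximates the original value. Each subtraction below
    # is exact in FP32, since it only removes bits already captured.
    hi = to_bf16(x)
    mid = to_bf16(x - hi)
    lo = to_bf16(x - hi - mid)
    return hi, mid, lo

def emulated_gemm(a, b):
    # Emulate an FP32 GEMM with BF16 operands: decompose each input into
    # three BF16 matrices and accumulate the partial products in an FP32
    # accumulator. (Henry's six-product variant drops the lowest-order
    # terms; all nine products are kept here for simplicity.)
    a_parts = decompose3(np.asarray(a, dtype=np.float32))
    b_parts = decompose3(np.asarray(b, dtype=np.float32))
    acc = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    for ap in a_parts:
        for bp in b_parts:
            acc += ap @ bp  # BF16-valued operands, FP32 accumulation
    return acc
```

With all nine partial products retained, the triplet decomposition recovers roughly the full FP32 mantissa width (three chunks of 8 significant bits), which is why the emulated result closely tracks a native FP32 matrix multiply.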
Regarding claims 5, 24, and 30, they are directed to an evaluation mental process; one can choose an amount of a type of number to perform calculations by hand. Under Steps 2A Prong 2 and 2B, the claims do not recite any additional elements that integrate the abstract idea into a practical application, nor do they amount to significantly more than the judicial exception.

Regarding claims 7, 25, and 31, they are directed to an evaluation mental process; one can select between different multiplication operations in their mind depending on requirements that they come up with or are given. Under Steps 2A Prong 2 and 2B, the claims do not recite any additional elements that integrate the abstract idea into a practical application, nor do they amount to significantly more than the judicial exception.

Regarding claim 10, at Step 1, the claim is directed to a machine, which is a statutory category of invention. At Step 2A Prong 1, Examiner notes that the claim is directed towards mental processes and mathematical calculations. The claim language has been reproduced below: The apparatus of claim 1, wherein the systolic array hardware includes systolic DPAS (Dot-Product, Accumulate, Systolic) hardware having a plurality of DPAS elements, wherein the processing circuitry comprises one or more of graphics processing circuitry or application processing circuitry.

At Step 2A Prong 2, the additional elements are bolded above. The systolic array hardware including DPAS hardware, which includes DPAS elements, is merely generally linking the mathematical calculations of claim 1 with general systolic DPAS computing. See MPEP 2106.05(h). The remaining bolded limitations are generic computer components that amount to no more than components comprising mere instructions to apply the exception and do not integrate the judicial exception into a practical application. See MPEP 2106.05(f).
Under Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application, nor does it amount to significantly more than the judicial exception.

Regarding claims 26 and 32, they recite similar language as claim 10 and are rejected for at least the same reasons therein. Here, claims 26 and 32 are directed towards the statutory categories of a method and article of manufacture, respectively, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 10, 21-24, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. ("Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations"), as included in the IDS filed 12/13/2022, hereinafter Henry, in view of Maiyuran et al.
(US Patent Application No. US 20190324746 A1), hereinafter Maiyuran.

With regards to claim 1, Henry teaches An apparatus comprising: processing circuitry including systolic hardware having [a two-dimensional array of elements], (Henry Page 69 Fig. 1: Fig. 1 shows a BF16 FMA that is fully compatible with FP32; Henry Page 69 Section I: the community has settled on mixed-precision fused-multiply-add (FMA) hardware units; Henry Page 70 Section I: This is due to much smaller multiplier and offering the FLOPS only in form of matrix multiplication by implementing a systolic array in hardware) the processing circuitry to: receive data associated with a matrix multiplication operation (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data) wherein the data is received in a first precision format; (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data) enable a floating point multiplication operation using values in a second precision format, (Henry Page 73 Section IV A: starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) the second precision format having a lower precision than the first precision format, (Henry Page 73 Section IV A: starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) wherein the floating point multiplication operation includes emulating precision calculations by decomposing first precision format values associated with the first precision format into second precision format values associated with the second precision format, (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices
and then does one to nine products with these matrices; Henry Page 70 Section II: This section covers how we decompose FP32 numbers into multiple BF16 numbers) and wherein the systolic hardware is facilitated to execute the floating point multiplication by multiplying second precision operands at the elements (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) and accumulating partial sums in a floating point accumulator within the hardware; (Henry Page 74 Section IV A: FP32 general matrix-matrix multiply using a triplet of BF16s and six products and adding those results together in FP32) and generate results associated with the matrix multiplication operation (Henry Page 74 Section IV A: FP32 general matrix-matrix multiply using a triplet of BF16s and six products and adding those results together in FP32; Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data).

While Henry teaches systolic hardware, they fail to teach [systolic hardware having] a two-dimensional array of elements. However, Maiyuran teaches [systolic hardware having] a two-dimensional array of elements (Maiyuran [0156]: FIG. 17A-17B illustrate details of hardware-based dot product logic 1608, according to some embodiments. FIG. 17A illustrates a grid of multiple functional units that are configurable to perform multiple dot product operations within a single clock cycle. FIG. 17B illustrates a single exemplary functional unit; Maiyuran Fig. 17A: shows a two-dimensional array of elements). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry with the two-dimensional array of elements as taught by Maiyuran.
One of ordinary skill in the art would have been motivated to make this combination because Henry and Maiyuran both teach systolic array hardware, and Maiyuran enhances the efficiency of Henry by maximizing throughput; as Maiyuran teaches, graphics processors may perform some operations using specialized, fixed-function logic units (Maiyuran paragraph [0002]).

With regards to claim 2, Henry in view of Maiyuran teaches all of the limitations of claim 1 above. Henry further teaches wherein the matrix multiplication operation includes a Single precision floating General Matrix Multiply (SGEMM) operation (Henry Page 73 Section IV A: SGEMM and SGETRF using combined BF16 Datatypes).

With regards to claim 3, Henry in view of Maiyuran teaches all of the limitations of claim 1 above. Henry further teaches wherein the first precision format comprises a 32-bit floating point (FP32) format, and wherein the second precision format comprises a bfloat 16-bit (BF16) format (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices).

With regards to claim 5, Henry in view of Maiyuran teaches all of the limitations of claim 1 above.
Henry further teaches wherein the floating point multiplication operation includes one or more of multiple second precision format values (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) or multiple Fused Multiply Add (FMA) operations (Henry Page 69 Abstract: We demonstrate how a decomposition into multiple smaller datatypes can be used to assemble a high-precision result, leveraging the higher precision accumulation of the FMA unit; Henry Page 69 Section I: the community has settled on mixed-precision fused-multiply-add (FMA) hardware units).

With regards to claim 10, Henry in view of Maiyuran teaches all of the limitations of claim 1 above. Henry further teaches wherein the processing circuitry comprises one or more of graphics processing circuitry or application processing circuitry (Henry Page 69 Section I: NVIDIA announced their FP16 input with FP32 output Tensorcores support in Volta and Turing GPUs). Henry fails to teach wherein the systolic array hardware includes a systolic DPAS (Dot-Product, Accumulate, Systolic) hardware having a plurality of DPAS elements. However, Maiyuran teaches wherein the systolic array hardware includes a systolic DPAS (Dot-Product, Accumulate, Systolic) hardware having a plurality of DPAS elements (Maiyuran Abstract: a systolic dot product unit, a systolic dot product with accumulate; Maiyuran [0156]: FIG. 17A-17B illustrate details of hardware-based dot product logic 1608, according to some embodiments. FIG. 17A illustrates a grid of multiple functional units that are configurable to perform multiple dot product operations within a single clock cycle. FIG. 17B illustrates a single exemplary functional unit).
Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry in view of Maiyuran with the DPAS elements as taught by Maiyuran. One of ordinary skill in the art would have been motivated to make this combination because Henry and Maiyuran both teach systolic array hardware, and Maiyuran enhances the efficiency of Henry by maximizing throughput; as Maiyuran teaches, graphics processors may perform some operations using specialized, fixed-function logic units (Maiyuran paragraph [0002]).

With regards to claim 21, Henry teaches A method comprising: receiving, by processing circuitry of a computing device, data associated with a matrix multiplication operation, (Henry Page 69 Fig. 1: Fig. 1 shows a BF16 FMA that is fully compatible with FP32; Henry Page 69 Section I: the community has settled on mixed-precision fused-multiply-add (FMA) hardware units; Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data) wherein the data is received in a first precision format, (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data) wherein the processing circuitry includes systolic hardware having [a two-dimensional array of elements;] (Henry Page 70 Section I: This is due to much smaller multiplier and offering the FLOPS only in form of matrix multiplication by implementing a systolic array in hardware) enabling a floating point multiplication operation using values in a second precision format, (Henry Page 73 Section IV A: starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) the second precision format having a lower precision than the first precision format, (Henry Page 73 Section IV A: starts with FP32 data and behind the scenes converts
it into one to three bfloat16 matrices and then does one to nine products with these matrices) wherein the floating point multiplication operation includes emulating precision calculations by decomposing first precision format values associated with the first precision format into second precision format values associated with the second precision format, (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices; Henry Page 70 Section II: This section covers how we decompose FP32 numbers into multiple BF16 numbers) and wherein the systolic hardware is facilitated to execute the floating point multiplication by multiplying second precision operands at the elements (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) and accumulating partial sums in a floating point accumulator within the hardware; (Henry Page 74 Section IV A: FP32 general matrix-matrix multiply using a triplet of BF16s and six products and adding those results together in FP32) and generating results associated with the matrix multiplication operation (Henry Page 74 Section IV A: FP32 general matrix-matrix multiply using a triplet of BF16s and six products and adding those results together in FP32; Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data). While Henry teaches systolic hardware, they fail to teach [systolic hardware having] a two-dimensional array of elements. However, Maiyuran teaches [systolic hardware having] a two-dimensional array of elements (Maiyuran [0156]: FIG.
17A-17B illustrate details of hardware-based dot product logic 1608, according to some embodiments. FIG. 17A illustrates a grid of multiple functional units that are configurable to perform multiple dot product operations within a single clock cycle. FIG. 17B illustrates a single exemplary functional unit; Maiyuran Fig. 17A: shows a two-dimensional array of elements). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry with the two-dimensional array of elements as taught by Maiyuran. One of ordinary skill in the art would have been motivated to make this combination because Henry and Maiyuran both teach systolic array hardware, and Maiyuran enhances the efficiency of Henry by maximizing throughput; as Maiyuran teaches, graphics processors may perform some operations using specialized, fixed-function logic units (Maiyuran paragraph [0002]).

With regards to claim 22, Henry in view of Maiyuran teaches all of the limitations of claim 21 above. Henry further teaches wherein the matrix multiplication operation includes a Single precision floating General Matrix Multiply (SGEMM) operation (Henry Page 73 Section IV A: SGEMM and SGETRF using combined BF16 Datatypes).

With regards to claim 23, Henry in view of Maiyuran teaches all of the limitations of claim 21 above. Henry further teaches wherein the first precision format comprises a 32-bit floating point (FP32) format, and wherein the second precision format comprises a bfloat 16-bit (BF16) format (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices).

With regards to claim 24, Henry in view of Maiyuran teaches all of the limitations of claim 21 above.
Henry further teaches wherein the floating point multiplication operation includes one or more of multiple second precision format values (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) or multiple Fused Multiply Add (FMA) operations (Henry Page 69 Abstract: We demonstrate how a decomposition into multiple smaller datatypes can be used to assemble a high-precision result, leveraging the higher precision accumulation of the FMA unit; Henry Page 69 Section I: the community has settled on mixed-precision fused-multiply-add (FMA) hardware units).

With regards to claim 26, Henry in view of Maiyuran teaches all of the limitations of claim 21 above. Henry further teaches wherein the processing circuitry comprises one or more of graphics processing circuitry or application processing circuitry (Henry Page 69 Section I: NVIDIA announced their FP16 input with FP32 output Tensorcores support in Volta and Turing GPUs). Henry fails to teach wherein the systolic array hardware includes a systolic DPAS (Dot-Product, Accumulate, Systolic) hardware having a plurality of DPAS elements. However, Maiyuran teaches wherein the systolic array hardware includes a systolic DPAS (Dot-Product, Accumulate, Systolic) hardware having a plurality of DPAS elements (Maiyuran Abstract: a systolic dot product unit, a systolic dot product with accumulate; Maiyuran [0156]: FIG. 17A-17B illustrate details of hardware-based dot product logic 1608, according to some embodiments. FIG. 17A illustrates a grid of multiple functional units that are configurable to perform multiple dot product operations within a single clock cycle. FIG. 17B illustrates a single exemplary functional unit).
Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry in view of Maiyuran with the DPAS elements as taught by Maiyuran. One of ordinary skill in the art would have been motivated to make this combination because Henry and Maiyuran both teach systolic array hardware, and Maiyuran enhances the efficiency of Henry by maximizing throughput; as Maiyuran teaches, graphics processors may perform some operations using specialized, fixed-function logic units (Maiyuran paragraph [0002]).

Claims 7 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Henry in view of Maiyuran, further in view of Anders et al. (US Patent Application No. US 20190042250 A1), hereinafter Anders.

With regards to claim 7, Henry in view of Maiyuran teaches all of the limitations of claim 1 above. Henry fails to teach wherein the processing circuitry is further configured to select the floating point multiplication operation for an application based on one or more of performance or precision requirements associated with the application. However, Anders does teach wherein the processing circuitry is further configured to select the floating point multiplication operation for an application based on one or more of performance or precision requirements associated with the application (Anders [0098]: In operation, FP16/INT16/INT8 multiplier 701 is to be reconfigured between 8b/16b integers and 16b floating-point inputs; to support differing performance, numerical range and precision requirements). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry in view of Maiyuran with choosing the operations as taught by Anders.
One of ordinary skill in the art would have been motivated to make this combination to increase flexibility: Anders teaches support for a variety of data formats (signed and unsigned 8b/16b integer, 16b floating-point) with wide accumulators, for both dense and sparse matrices, and for differing performance, numerical range and precision requirements (Anders [0003] and Anders [0098]).

With regards to claim 25, Henry in view of Maiyuran teaches all of the limitations of claim 21 above. Henry fails to teach further comprising selecting the floating point multiplication operation for an application based on one or more of performance or precision requirements associated with the application. However, Anders does teach further comprising selecting the floating point multiplication operation for an application based on one or more of performance or precision requirements associated with the application (Anders [0098]: In operation, FP16/INT16/INT8 multiplier 701 is to be reconfigured between 8b/16b integers and 16b floating-point inputs; to support differing performance, numerical range and precision requirements). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry in view of Maiyuran with choosing the operations as taught by Anders. One of ordinary skill in the art would have been motivated to make this combination to increase flexibility: Anders teaches support for a variety of data formats (signed and unsigned 8b/16b integer, 16b floating-point) with wide accumulators, for both dense and sparse matrices, and for differing performance, numerical range and precision requirements (Anders [0003] and Anders [0098]).

Claims 27-30 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Henry in view of Maiyuran, further in view of Ould-Ahmed-Vall et al. (US Patent Application Publication No.
US 20180315159 A1), hereinafter Ould-Ahmed-Vall.

With regards to claim 27, Henry teaches receiving, by processing circuitry of the computing device, data associated with a matrix multiplication operation (Henry Page 69 Fig. 1: Fig. 1 shows a BF16 FMA that is fully compatible with FP32; Henry Page 69 Section I: the community has settled on mixed-precision fused-multiply-add (FMA) hardware units; Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data), wherein the data is received in a first precision format (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data), wherein the processing circuitry includes systolic hardware having [a two-dimensional array of elements;] (Henry Page 70 Section I: This is due to much smaller multiplier and offering the FLOPS only in form of matrix multiplication by implementing a systolic array in hardware), enabling a floating point multiplication operation using values in a second precision format (Henry Page 73 Section IV A: starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices), the second precision format having a lower precision than the first precision format (Henry Page 73 Section IV A: starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices), wherein the floating point multiplication operation includes emulating precision calculations by decomposing first precision format values associated with the first precision format into second precision format values associated with the second precision format (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then
does one to nine products with these matrices; Henry Page 70 Section II: This section covers how we decompose FP32 numbers into multiple BF16 numbers), and wherein the systolic hardware is facilitated to execute the floating point multiplication by multiplying second precision operands at the elements (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) and accumulating partial sums in a floating point accumulator within the hardware (Henry Page 74 Section IV A: FP32 general matrix-matrix multiply using a triplet of BF16s and six products and adding those results together in FP32); and generating results associated with the matrix multiplication operation (Henry Page 74 Section IV A: FP32 general matrix-matrix multiply using a triplet of BF16s and six products and adding those results together in FP32; Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data).

While Henry teaches systolic hardware, they fail to teach [systolic hardware having] a two-dimensional array of elements. However, Maiyuran teaches [systolic hardware having] a two-dimensional array of elements (Maiyuran [0156]: FIG. 17A-17B illustrate details of hardware-based dot product logic 1608, according to some embodiments. FIG. 17A illustrates a grid of multiple functional units that are configurable to perform multiple dot product operations within a single clock cycle. FIG. 17B illustrates a single exemplary functional unit; Maiyuran Fig. 17A: shows a two-dimensional array of elements). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry with the two-dimensional array of elements as taught by Maiyuran.
One of ordinary skill in the art would have been motivated to make this combination because Henry and Maiyuran both teach systolic array hardware, and Maiyuran enhances the efficiency of Henry by maximizing throughput; as Maiyuran teaches, graphics processors may perform some operations using specialized, fixed-function logic units (Maiyuran paragraph [0002]).

Henry in view of Maiyuran fails to teach at least one computer-readable medium having stored thereon instructions which, when executed, cause a computing device to perform operations comprising. However, Ould-Ahmed-Vall teaches at least one computer-readable medium having stored thereon instructions which, when executed, cause a computing device to perform operations comprising (Ould-Ahmed-Vall paragraph [0309]: machine-readable medium which represents and/or defines logic within an integrated circuit such as a processor). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry in view of Maiyuran with the computer-readable medium as taught by Ould-Ahmed-Vall. One of ordinary skill in the art would have been motivated to make this combination because, as taught by Ould-Ahmed-Vall, the hardware model may be supplied to various customers or manufacturing facilities; non-transitory computer readable mediums are very common, and they allow for easier access to the software taught by Henry (Ould-Ahmed-Vall paragraph [0309]).

With regards to claim 28, Henry in view of Maiyuran further in view of Ould-Ahmed-Vall teaches all of the limitations of claim 27 above. Henry further teaches wherein the matrix multiplication operation includes a Single precision floating General Matrix Multiply (SGEMM) operation (Henry Page 73 Section IV A: SGEMM and SGETRF using combined BF16 Datatypes).
With regards to claim 29, Henry in view of Maiyuran further in view of Ould-Ahmed-Vall teaches all of the limitations of claim 27 above. Henry further teaches wherein the first precision format comprises a 32-bit floating point (FP32) format, and wherein the second precision format comprises a bfloat 16-bit (BF16) format (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices).

With regards to claim 30, Henry in view of Maiyuran further in view of Ould-Ahmed-Vall teaches all of the limitations of claim 27 above. Henry further teaches wherein the floating point multiplication operation includes one or more of multiple second precision format values (Henry Page 73 Section IV A: We did a complete GEMM (GEneral Matrix-matrix Multiply) implementation which starts with FP32 data and behind the scenes converts it into one to three bfloat16 matrices and then does one to nine products with these matrices) or multiple Fused Multiply Add (FMA) operations (Henry Page 69 Abstract: We demonstrate how a decomposition into multiple smaller datatypes can be used to assemble a high-precision result, leveraging the higher precision accumulation of the FMA unit; Henry Page 69 Section I: the community has settled on mixed-precision fused-multiply-add (FMA) hardware units).

With regards to claim 32, Henry in view of Maiyuran further in view of Ould-Ahmed-Vall teaches all of the limitations of claim 27 above. Henry further teaches wherein the processing circuitry comprises one or more of graphics processing circuitry or application processing circuitry (Henry Page 69 Section I: NVIDIA announced their FP16 input with FP32 output Tensorcores support in Volta and Turing GPUs).
Henry fails to teach wherein the systolic array hardware includes a systolic DPAS (Dot-Product, Accumulate, Systolic) hardware having a plurality of DPAS elements. However, Maiyuran teaches wherein the systolic array hardware includes a systolic DPAS (Dot-Product, Accumulate, Systolic) hardware having a plurality of DPAS elements (Maiyuran abstract: a systolic dot product unit; a systolic dot product with accumulate; Maiyuran [0156]: FIG. 17A-17B illustrate details of hardware-based dot product logic 1608, according to some embodiments. FIG. 17A illustrates a grid of multiple functional units that are configurable to perform multiple dot product operations within a single clock cycle. FIG. 17B illustrates a single exemplary functional unit). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry in view of Maiyuran further in view of Ould-Ahmed-Vall with the DPAS elements as taught by Maiyuran. One of ordinary skill in the art would have been motivated to make this combination because Henry and Maiyuran both teach systolic array hardware, and Maiyuran enhances the efficiency of Henry by maximizing throughput; as Maiyuran teaches, graphics processors may perform some operations using specialized, fixed-function logic units (Maiyuran paragraph [0002]).

Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Henry in view of Maiyuran, further in view of Ould-Ahmed-Vall, further in view of Anders.

With regards to claim 31, Henry in view of Maiyuran further in view of Ould-Ahmed-Vall teaches all of the limitations of claim 27 above. Henry fails to teach wherein the operations further comprise selecting the floating point multiplication operation for an application based on one or more of performance or precision requirements associated with the application.
However, Anders does teach wherein the operations further comprise selecting the floating point multiplication operation for an application based on one or more of performance or precision requirements associated with the application (Anders [0098]: In operation, FP16/INT16/INT8 multiplier 701 is to be reconfigured between 8b/16b integers and 16b floating-point inputs; to support differing performance, numerical range and precision requirements). Therefore it would have been obvious before the effective filing date of the claimed invention for one of ordinary skill in the art to combine the teachings of Henry in view of Maiyuran further in view of Ould-Ahmed-Vall with choosing the operations as taught by Anders. One of ordinary skill in the art would have been motivated to make this combination to increase flexibility: Anders teaches support for a variety of data formats (signed and unsigned 8b/16b integer, 16b floating-point) with wide accumulators, for both dense and sparse matrices, and for differing performance, numerical range and precision requirements (Anders [0003] and Anders [0098]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jakob O. Gudas, whose telephone number is (571) 272-0695. The examiner can normally be reached Monday-Thursday 7:30AM-5:00PM and Friday 7:30AM-4:00PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, James Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.O.G./
Examiner, Art Unit 2151

/James Trujillo/
Supervisory Patent Examiner, Art Unit 2151

Prosecution Timeline

Sep 22, 2021: Application Filed
Jan 24, 2022: Response after Non-Final Action
Jan 30, 2025: Non-Final Rejection — §101, §103
Mar 24, 2025: Response Filed
May 27, 2025: Final Rejection — §101, §103
Jul 17, 2025: Response after Non-Final Action
Aug 08, 2025: Request for Continued Examination
Aug 16, 2025: Response after Non-Final Action
Aug 20, 2025: Non-Final Rejection — §101, §103
Sep 24, 2025: Response Filed
Oct 16, 2025: Final Rejection — §101, §103
Nov 21, 2025: Response after Non-Final Action
Feb 17, 2026: Request for Continued Examination
Feb 24, 2026: Response after Non-Final Action
Mar 04, 2026: Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602200: ANALOG MULTIPLY-ACCUMULATE UNIT FOR MULTIBIT IN-MEMORY CELL COMPUTING (granted Apr 14, 2026; 2y 5m to grant)
Patent 12566586: HIGH-SPEED QUANTUM RANDOM NUMBER GENERATOR BASED ON VACUUM STATE FLUCTUATION TECHNOLOGY (granted Mar 03, 2026; 2y 5m to grant)


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 44%
Grant Probability With Interview: 99% (+71.1%)
Median Time to Grant: 4y 2m
PTA Risk: High
Based on 9 resolved cases by this examiner. Grant probability derived from career allow rate.
