DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The listing of references in the specification is not a proper information disclosure statement. 37 CFR 1.98(b) requires a list of all patents, publications, or other information submitted for consideration by the Office, and MPEP § 609.04(a) states, "the list may not be incorporated into the specification but must be submitted in a separate paper." Therefore, unless the references have been cited by the examiner on form PTO-892, they have not been considered. See specification [0002] with respect to Vaswani et al.

Drawings

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign mentioned in the description: figure 4, module 700, as described in [0050]. Corrected drawing sheets in compliance with 37 CFR 1.121(d), or an amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b), are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination.
– An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C.
112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “compute converter” as in claims 1, 3, 11, 14, 16, and 20; “alignment device” as in claims 2, 10, 16, and 20; “partial products reduction (PPR) device” as in claims 3, 11, 16, and 20; and “compute device” as in claim 20.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Specification

The specification is objected to because claim elements “compute converter”, “alignment device”, “partial products reduction (PPR) device”, and “compute device” invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the specification fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. See the rejection under 35 U.S.C. 112(b) below as to the specific elements of the claimed means that are lacking and thus not described in the specification. No new matter should be entered.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C.
112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claim 3, claim 16, and claim 20 recite “the PPR device being configured to determine a first reduced matrix output using the first rounded matrix output”. Claim 11 recites “the PPR device being configured to reduce the first rounded matrix output to determine a first reduced matrix output”. However, the specification provides no description of the determination of the reduced matrix output beyond merely restating the function claimed. The device is called a partial products reduction device, but no description is provided related to partial products. Nor is description provided as to any other mechanism for reducing the size or character of the matrix output. See specification [0009], [0097-0098], [0104]. Claims 4-5 inherit the same deficiency as claim 3 based on dependence.

Claim 1, claim 3, claim 11, claim 14, claim 16, and claim 20 limitations “compute converter”; claim 2, claim 10, claim 16, and claim 20 limitations “alignment device”; claim 3, claim 11, claim 16, and claim 20 limitations “partial products reduction (PPR) device”; and claim 20 limitation “compute device” invoke 35 U.S.C.
112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. See the rejection under 35 U.S.C. 112(b) below as to the specific reasons these elements lack structure, material, or acts for performing the entire claimed function, which results in this associated rejection for lack of written description of these required elements. Dependent claims inherit the same deficiency as the claim upon which they depend.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 3, claim 16, and claim 20 recite “the PPR device being configured to determine a first reduced matrix output using the first rounded matrix output”. Claim 11 recites “the PPR device being configured to reduce the first rounded matrix output to determine a first reduced matrix output”. It is unclear what is meant by determining a first reduced matrix output or reducing the first rounded matrix output. It is unclear whether this relates to a partial products reduction, a reduction related to rounding or format conversion, or something else. Claims 4-5 inherit the same deficiency as claim 3 based on dependence.

Claim 12 recites “wherein the MSB portion is characterized by a signed integer and the LSB portion is characterized by an unsigned integer.” The “MSB portion” and the “LSB portion” lack antecedent basis. It is unclear to what portions these refer, whether the “first input portion” and the “second input portion” or other portions.
Claim 1 and claim 14 recite a “compute converter configured to determine a first converted matrix output and a converted output format using the first combined matrix output”. Claim 3, claim 11, and claim 16 further recite “the compute converter is configured to determine the first converted matrix output using the first reduced matrix output”. Claim 20 recites a “compute converter configured to determine a first converted matrix output in a converted output format using the first reduced matrix output”. The specification merely describes the compute converter in terms of the functions performed, without reciting structure, material, or acts to perform the claimed function. See specification [0009], [0098], [0104]. Furthermore, the drawings merely depict the compute converter as a black box. See figure 10A/10B-1028. Dependent claims inherit the same deficiency as the claim upon which they depend.

Claim 2 and claim 20 recite an “alignment device being (is) configured to determine a first rounded matrix output in a third format using the first combined matrix output”. Claim 10 and claim 16 recite the “alignment device being configured to round the first combined matrix output to determine a first rounded matrix output in a third format”. The specification merely describes the alignment device in terms of the functions performed, without reciting structure, material, or acts to perform the claimed function. See specification [0009], [0096], [0104]. Furthermore, the drawings merely depict the alignment device as a black box. See figure 10/10B-1024. Dependent claims inherit the same deficiency as the claim upon which they depend.

Claim 3, claim 16, and claim 20 recite the “PPR device being configured to determine a first reduced matrix output using the first rounded matrix output”. Claim 11 recites the “PPR device being configured to reduce the first rounded matrix output to determine a first reduced matrix output”.
The specification merely describes the PPR device in terms of the functions performed, without reciting structure, material, or acts to perform the claimed function. See specification [0009], [0096-0097], [0104]. Furthermore, the drawings merely depict the PPR device as a black box. See figure 10/10B-1026. Dependent claims inherit the same deficiency as the claim upon which they depend.

Claim 20 recites the “compute device is configured to shift the first matrix output and to add the shifted first matrix output to the second matrix output to determine a first combined matrix output in a second format”. The specification describes the compute device comprising a plurality of compute units. However, the specification provides no discussion as to how the compute unit is configured to perform the shift and add function. Furthermore, the specification describes using the compute device to perform the shift and add function merely in functional terms. See specification [0009], [0092], [0094-0095], [0105].

Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may:

(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;

(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or

(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:

(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or

(b) Stating on the record what corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function.

For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 7, and 13-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by X.
Lian et al., High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 27, No. 8, 2019 (hereinafter “Lian”).

Regarding claim 1, Lian teaches the following: an input buffer (IB) device configured to receive a first matrix input, the first matrix input being characterized by a first format and having at least a first input portion and a second input portion (fig 3 Conv input buffer; section II.A, BFP comprises N mantissas and one shared exponent, the first and second of the N BFP mantissas for the first input portion and second input portion respectively; figure 4 inputs showing two portions i1(x, y) and i1(x, y+7)); a compute device coupled to the IB device, the compute device comprising a plurality of compute units having at least a first compute unit and a second compute unit, the first compute unit being configured to determine a first matrix output using at least the first input portion, and the second compute unit being configured to determine a second matrix output using at least the second input portion, and the compute device being configured to determine a first combined matrix output in a second format using the first matrix output and the second matrix output (section II.A, BFP comprises N mantissas and one shared exponent, the first and second of the N BFP mantissas for the first input portion and second input portion respectively; figure 4 details of the convolution PEA, inputs showing two portions i1(x, y) and i1(x, y+7); fig 2 data flow of the convolution, 16-bit converted input to two 8-bit BFP inputs to the first and second multipliers for the first and second compute units configured to determine the first and second matrix outputs using the first and second input portions respectively; combined 32-bit BFP output at the output of the ACC for the combined matrix output in a second format); wherein the compute device comprises a compute converter configured to determine a
first converted matrix output in a converted output format using the first combined matrix output (fig 2 BFP2FP converter converting 32-bit BFP to 16-bit FP); and an output buffer (OB) device coupled to the compute device, the OB device being configured to store the first converted matrix output (fig 3 Conv output buffer).

Regarding claim 7, in addition to the teachings addressed in the claim 1 analysis, Lian teaches the following: wherein the compute device is configured to shift the first matrix output and to add the shifted first matrix output to the second matrix output to determine the first combined matrix output (fig 2 shifter, adder, ACC; section II.C, p. 1877, left column, shifting by the difference between the bias exponent and output exponent, and the mantissas added in the accumulator; combined 32-bit output).

Regarding claim 13, in addition to the teachings addressed in the claim 1 analysis, Lian teaches the following: further comprising an input converter device coupled to the IB device, the input converter device being configured to convert the first matrix input from a floating point format to the first format (fig 2 FP2BFP).
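For context only, and not part of the examined record: the block floating point (BFP) representation and FP2BFP conversion cited in the claim 1 and claim 13 analyses (a block of N values sharing a single exponent, with each value reduced to a short mantissa) can be modeled roughly as follows. The function names, bit widths, and truncation behavior below are illustrative assumptions, not details taken from Lian's converter.

```python
# Illustrative sketch only: a minimal model of converting a block of
# floating-point values to block-floating-point form (N mantissas sharing
# one exponent). Bit widths and truncation toward zero are assumptions.
import math

def fp_to_bfp(values, mantissa_bits=8):
    """Convert a list of floats to (shared_exponent, integer mantissas)."""
    # The shared exponent is taken from the largest magnitude in the block.
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        return 0, [0] * len(values)
    shared_exp = math.frexp(max_mag)[1]  # max_mag == m * 2**shared_exp, 0.5 <= m < 1
    scale = 2 ** (mantissa_bits - 1)
    # Each mantissa is the value scaled by the shared exponent, truncated
    # to a signed fixed-point integer of mantissa_bits bits.
    mantissas = [int((v / 2 ** shared_exp) * scale) for v in values]
    return shared_exp, mantissas

def bfp_to_fp(shared_exp, mantissas, mantissa_bits=8):
    """Approximate inverse conversion back to floating point."""
    scale = 2 ** (mantissa_bits - 1)
    return [(m / scale) * 2 ** shared_exp for m in mantissas]
```

For exactly representable inputs the round trip is lossless; in general, the shorter mantissa introduces truncation error, which is the trade-off BFP accepts for cheaper fixed-point arithmetic.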
Regarding claim 14, Lian teaches the following: a plurality of tiles (fig 2, column of boxes of PUs, each PU box for a tile), each of the tiles comprising: a plurality of slices, and a central processing unit (CPU) coupled to the plurality of slices (fig 2, each PU for a slice; fig 3 CPU coupled to the PEA, wherein each PU is within the PEA); an input buffer (IB) device configured to receive a first matrix input, the first matrix input being characterized by a first format and having at least a first input portion and a second input portion (fig 3 Conv input buffer; section II.A, BFP comprises N mantissas and one shared exponent, the first and second of the N BFP mantissas for the first input portion and second input portion respectively; figure 4 inputs showing two portions i1(x, y) and i1(x, y+7)); a compute device coupled to the IB device, the compute device comprising a plurality of compute units having at least a first compute unit and a second compute unit, the first compute unit being configured to determine a first matrix output using at least the first input portion, and the second compute unit being configured to determine a second matrix output using at least the second input portion, and the compute device being configured to determine a first combined matrix output in a second format using the first matrix output and the second matrix output (section II.A, BFP comprises N mantissas and one shared exponent, the first and second of the N BFP mantissas for the first input portion and second input portion respectively; figure 4 details of the convolution PEA, inputs showing two portions i1(x, y) and i1(x, y+7); fig 2 data flow of the convolution, 16-bit converted input to two 8-bit BFP inputs to the first and second multipliers for the first and second compute units configured to determine the first and second matrix outputs using the first and second input portions respectively; combined 32-bit BFP output at the output of the ACC for the combined matrix output in a second format);
wherein the compute device comprises a compute converter configured to determine a first converted matrix output in a converted output format using the first combined matrix output (fig 2 BFP2FP converter converting 32-bit BFP to 16-bit FP); and an output buffer (OB) device coupled to the compute device, the OB device being configured to store the first converted matrix output (fig 3 Conv output buffer).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of M. Drumond et al., Training DNNs with Hybrid Block Floating Point, 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2018 (hereinafter “Drumond”).

Regarding claim 2, in addition to the teachings addressed in the claim 1 analysis, Lian teaches the following: wherein the compute device comprises an alignment device coupled to the plurality of compute units (fig 2 shifter, adder, ACC coupled to the PU compute units; section II.C, p. 1877, left column, shifting by the difference between the bias exponent and output exponent, and the mantissa and exponent collected at the ACC for the alignment device coupled to the plurality of compute units; section II.D, p. 1877, for rounding). Lian further discloses rounding generally, including truncation (Section II.D, p.
1877), but does not explicitly disclose wherein the alignment device is configured to determine a first rounded matrix output in a third format using the first combined matrix output. However, in the same field of endeavor, Drumond discloses an apparatus similar to Lian's for performing matrix/convolution operations using block floating point numbers, and converting the block floating point output to floating point (Abstract, fig 1, fig 2). Drumond further discloses truncation of the mantissa output of the convolution (Section 5.3, second paragraph). It would have been obvious to one of ordinary skill in the art before the effective filing date to perform truncated rounding as in Drumond in the ACC of Lian. It would have been obvious to use one of the common methods to handle shifted bits (Lian, section II.D, p. 1877, first paragraph) to achieve the benefit of ensuring there is no overflow or saturation (Drumond, section 5.3, second paragraph). The truncated bits therefore also result in a third format.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of US 20200302284 A1, Garcia Garcia (hereinafter “Garcia”).

Regarding claim 6, in addition to the teachings addressed in the claim 1 analysis, Lian teaches the following: wherein the IB device, the compute device, and the OB device are configured within a first compute path as a first IB device, a first compute device, and a first OB device, respectively (fig 3, fig 4, top horizontal path for the first compute device); and further comprising one or more second compute paths, each of the additional compute paths having a second IB device, a second compute device coupled to the second IB device, a second OB device coupled to the second compute device, and a second SIMD device coupled to the second OB device (fig 3, fig 4, the plurality of horizontal paths below the top horizontal path for the one or more second compute paths).
Lian discloses a parallel convolution engine (abstract), but does not explicitly disclose a Single Instruction, Multiple Data (SIMD) device coupled to the OB device of each compute path. However, in the same field of endeavor, Garcia discloses an apparatus similar to Lian's for performing neural network operations using block floating point format (fig 1, fig 4). Garcia further discloses a programmable streaming processor to execute a particular group of threads concurrently in a SIMD architecture where each thread is configured ([0068], fig 8, fig 11, fig 12). It would have been obvious to one of ordinary skill in the art before the effective filing date to execute Lian's parallel convolution engine using a SIMD device coupled to the OB device in each compute path. It would have been obvious to achieve the benefit of executing a plurality of threads concurrently using the same instruction ([0068]).

Claims 8-9 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of US 20210390382 A1, Kwon et al. (hereinafter “Kwon”).

Regarding claim 8, in addition to the teachings addressed in the claim 1 analysis, Lian teaches the following: wherein the first matrix input comprises a first matrix weight input and a first matrix activation input (fig 2 pixels for the first matrix activation input, weights for the first matrix weight input; section II.B, wherein both are matrices); wherein the first matrix weight input comprises a first matrix weight exponent and a first matrix weight mantissa, and wherein the first matrix activation input comprises a first matrix activation exponent and a first matrix activation mantissa (fig 2B weights, and weight exponent);
wherein the first compute unit is configured to store the first weight mantissa (fig 2, fig 3 Conv input buffer); and wherein the second compute unit is configured to store the first weight mantissa (fig 2, fig 3 Conv input buffer). Lian discloses a most significant byte (MSB) portion and a least significant byte (LSB) portion with respect to the activation input mantissa, but does not explicitly disclose the MSB portion and LSB portion with respect to the weight mantissa. However, in the same field of endeavor, Kwon discloses an apparatus similar to Lian's for performing multiply-accumulate (MAC) operations in a neural network with respect to activations and weights in block floating point format (abstract, fig 2). Kwon further discloses: the first matrix weight mantissa having a most significant byte (MSB) portion and a least significant byte (LSB) portion (fig 4, fig 5, [0081-0083]); wherein the first compute unit is configured to determine the first matrix output using the MSB portion of the first matrix weight mantissa and the first matrix activation mantissa (fig 6A, [0095], one of each group of the weights for the MSB portion); and wherein the second compute unit is configured to determine the second matrix output using the LSB portion of the first matrix weight mantissa and the first matrix activation mantissa (fig 6A, [0095], another of each group of the weights for the LSB portion). It would have been obvious to one of ordinary skill in the art before the effective filing date to convert the weights of Lian to block floating point format using the same mechanism Lian uses for the activations as in figure 2, and to separate the MSB portion and LSB portion of the weight into groups as in Kwon to distribute to the multipliers of Lian, wherein the MSB portion is sent to the first compute unit and the LSB portion is sent to the second compute unit.
It would have been obvious to achieve the benefit of supporting a large dynamic range without loss of accuracy for both the activations and the weights (Kwon [0076]).

Regarding claim 9, in addition to the teachings addressed in the claim 8 analysis, Lian teaches the following: wherein the compute device is configured to shift the first matrix output and to add the shifted first matrix output to the second matrix output to determine the first combined matrix output (fig 2 shifter, adder, ACC; section II.C, p. 1877, left column, shifting by the difference between the bias exponent and output exponent, and the mantissas added in the accumulator; combined 32-bit output).

Regarding claim 15, in addition to the teachings addressed in the claim 14 analysis, Lian teaches the following: wherein the first matrix input comprises a first matrix weight input and a first matrix activation input (fig 2 pixels for the first matrix activation input, weights for the first matrix weight input; section II.B, wherein both are matrices); wherein the first matrix weight input comprises a first matrix weight exponent and a first matrix weight mantissa, and wherein the first matrix activation input comprises a first matrix activation exponent and a first matrix activation mantissa (fig 2B weights, and weight exponent); wherein the first compute unit is configured to store the first weight mantissa (fig 2, fig 3 Conv input buffer); wherein the second compute unit is configured to store the first weight mantissa (fig 2, fig 3 Conv input buffer); and wherein the compute device is configured to shift the first matrix output and to add the shifted first matrix output to the second matrix output to determine the first combined matrix output (fig 2 shifter, adder, ACC; section II.C, p. 1877, left column, shifting by the difference between the bias exponent and output exponent, and the mantissas added in the accumulator; combined 32-bit output).
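For context only, and not part of the examined record: the shift-and-add accumulation cited in the claim 7, 9, and 15 analyses (aligning two partial outputs by right-shifting the mantissa of the smaller-exponent operand by the exponent difference before adding) can be sketched as follows. The function name and the integer (mantissa, exponent) model are illustrative assumptions, not Lian's actual datapath.

```python
# Illustrative sketch only: aligning two partial outputs, each modeled as
# an integer mantissa with an exponent, by shifting the smaller-exponent
# mantissa by the exponent difference and then adding.

def align_and_add(mant_a, exp_a, mant_b, exp_b):
    """Combine two (mantissa, exponent) partial outputs after alignment."""
    if exp_a >= exp_b:
        # Express both mantissas at the larger exponent, discarding the
        # shifted-out low bits, then accumulate.
        return mant_a + (mant_b >> (exp_a - exp_b)), exp_a
    return (mant_a >> (exp_b - exp_a)) + mant_b, exp_b
```

In hardware this corresponds to the shifter feeding the adder/accumulator: the bits shifted out are what the subsequent rounding step (e.g., truncation) disposes of.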
Lian discloses a most significant byte (MSB) portion and a least significant byte (LSB) portion with respect to the activation input mantissa, but does not explicitly disclose the MSB portion and LSB portion with respect to the weight mantissa. However, in the same field of endeavor, Kwon discloses an apparatus similar to Lian for performing multiply-accumulate (MAC) operations in a neural network with respect to activations and weights in block floating point format (abstract, fig 2). Kwon further discloses: the first matrix weight mantissa having a most significant byte (MSB) portion and a least significant byte (LSB) portion (fig 4, fig 5, [0081-0083]); wherein the first compute unit is configured to determine the first matrix output using the MSB portion of the first matrix weight mantissa and the first matrix activation mantissa (fig 6A, [0095] one of each group of the weights for the MSB portion); and wherein the second compute unit is configured to determine the second matrix output using the LSB portion of the first matrix weight mantissa and the first matrix activation mantissa (fig 6A, [0095] another of each group of the weights for the LSB portion). It would have been obvious to one of ordinary skill in the art before the effective filing date to convert the weights of Lian to block floating point format using the same mechanism Lian uses for the activations as in figure 2, and to separate the MSB portion of the weight and the LSB portion of the weight into groups as in Kwon for distribution to the multipliers of Lian, wherein the MSB portion is sent to the first compute unit and the LSB portion is sent to the second compute unit. One of ordinary skill would have been motivated to make this combination to achieve the benefit of supporting a large dynamic range without loss of accuracy for both the activations and the weights (Kwon [0076]). Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Kwon, and further in view of Drumond.
Regarding claim 10, in addition to the teachings addressed in the claim 9 analysis, Lian teaches the following: wherein the compute device comprises an alignment device coupled to the plurality of compute units, the alignment device being configured to round the first combined matrix output to determine a first rounded matrix output in a third format (fig 2, shifter, adder for the alignment device). Lian further discloses rounding generally, including truncation (Section II.D, p. 1877), but does not explicitly disclose wherein the alignment device is configured to determine a first rounded matrix output in a third format using the first combined matrix output. However, in the same field of endeavor, Drumond discloses an apparatus similar to Lian for performing matrix/convolution operations using block floating point numbers and converting the block floating point output to floating point (Abstract, fig 1, fig 2). Drumond further discloses truncation of the mantissa output of the convolution (Section 5.3, second paragraph). It would have been obvious to one of ordinary skill in the art before the effective filing date to perform truncated rounding as in Drumond in the ACC of Lian. One of ordinary skill would have been motivated to use one of the common methods of handling shifted bits (Lian section II.D, p. 1877 first paragraph) to achieve the benefit of ensuring there is no overflow or saturation (Drumond section 5.3, second paragraph). The truncated bits therefore also result in a third format. Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lian. Regarding claim 18, in addition to the teachings addressed in the claim 14 analysis, Lian teaches the following: convert the first matrix input from a floating point format to the first format (fig 2 FP2BFP).
Lian discloses a module to convert the first matrix input from a floating point format to the first floating point format, and discloses a CPU (fig 3), but does not explicitly disclose the converting module as being within the CPU. However, it would have been obvious to one of ordinary skill in the art before the effective filing date to perform the processing functions of the converter within the CPU. It is obvious to use a known technique to improve similar devices in the same way. MPEP 2141.III.(C). Regarding claim 19, in addition to the teachings addressed in the claim 14 analysis, Lian teaches the following: an input converter device coupled to the IB device, the input converter device being configured to convert the first matrix input from a floating point format to the first format (fig 2 FP2BFP). Lian discloses a module to convert the first matrix input from a floating point format to the first floating point format, but does not explicitly disclose wherein each of the plurality of slices comprises an input converter. However, Lian discloses the data flow including the converter (Fig 2), and each slice operates on block floating point inputs (fig 4). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date for each of the plurality of slices to comprise an input converter. It is obvious to use a known technique to improve similar devices in the same way. MPEP 2141.III.(C). Allowable Subject Matter Claims 3-5, 11-12, and 16-17 would be allowable if rewritten to overcome the rejections under 35 USC 112(a) and 35 USC 112(b), and rewritten in independent form including all of the limitations of the base claim and any intervening claims. Claim 20 would be allowable if rewritten to overcome the rejections under 35 USC 112(a) and 35 USC 112(b). The following is a statement of reasons for the indication of allowable subject matter.
Applicant claims a matrix compute apparatus, chiplet devices, and an AI accelerator apparatus, wherein the apparatus as in claim 20 comprises: a plurality of chiplets, each of the chiplets comprising a plurality of tiles, and each of the tiles comprising a plurality of slices, a central processing unit (CPU) coupled to the plurality of slices; wherein each of the plurality of slices comprises: an input buffer (IB) device configured to receive a first matrix input from the CPU in a first format, the first matrix input having a first matrix weight input and a first matrix activation input, wherein the first matrix weight input comprises a first matrix weight exponent and a first matrix weight mantissa, the first matrix weight mantissa having a most significant byte (MSB) portion and a least significant byte (LSB) portion; and wherein the first matrix activation input comprises a first matrix activation exponent and a first matrix activation mantissa; a digital in-memory compute (DIMC) device coupled to the IB device, the DIMC device comprising a plurality of compute units having at least a first compute unit and a second compute unit, an alignment device coupled to the plurality of compute units, and a partial products reduction (PPR) device coupled to the alignment device; wherein the first compute unit is configured to store the MSB portion of the first matrix weight mantissa and to determine the first matrix output using the MSB portion of the first matrix weight mantissa and the first matrix activation mantissa; wherein the second compute unit is configured to store the LSB portion of the first matrix weight mantissa and to determine the second matrix output using the LSB portion of the first matrix weight mantissa and the first matrix activation mantissa; wherein each of the plurality of compute units is configured for an integer numerical format; and wherein the MSB portion is characterized by a signed integer and the LSB portion is characterized by an
unsigned integer; wherein the compute device is configured to shift the first matrix output and to add the shifted first matrix output to the second matrix output to determine a first combined matrix output in a second format; wherein the alignment device is configured to determine a first rounded matrix output in a third format using the first combined matrix output; wherein the PPR device is configured to determine a first reduced matrix output using the first rounded matrix output; and wherein the compute device comprises a compute converter configured to determine a first converted matrix output in a converted output format using the first reduced matrix output; and an output buffer (OB) device coupled to the DIMC device, the OB device being configured to store the first converted matrix output. Lian is the closest prior art found. Lian discloses a CNN accelerator with block-floating-point matrix arithmetic operations (abstract, Introduction). Lian further discloses the claimed invention according to the above claim mappings. Lian further discloses weights computed with an 8-bit integer in the multiplier (p. 1877 left column) and a sign bit in the weight mantissa (table II), but does not teach or suggest wherein the MSB portion is characterized by a signed integer and the LSB portion is characterized by an unsigned integer. Lian is further silent, not teaching or suggesting wherein a partial products reduction (PPR) device is configured to determine a first reduced matrix output using the first rounded matrix output, and wherein the first rounded matrix output is used by a compute converter. Drumond discloses a deep neural network that uses a block floating point format (abstract, fig 1, fig 2). Drumond further discloses the claimed invention according to the above claim mappings.
Drumond is, however, silent, not teaching or suggesting wherein each of the plurality of compute units is configured for an integer numerical format; wherein the MSB portion is characterized by a signed integer and the LSB portion is characterized by an unsigned integer; wherein a partial products reduction (PPR) device is configured to determine a first reduced matrix output using the first rounded matrix output; and wherein the first rounded matrix output is used by a compute converter. Garcia discloses an apparatus that performs matrix operations using block floating point encoding (fig 1, fig 3, fig 4, [0039-0040]). Garcia further discloses a sign bit included in the block floating point data (fig 4, [0028]), but does not teach or suggest wherein the MSB portion is characterized by a signed integer and the LSB portion is characterized by an unsigned integer. Garcia is further silent, not teaching or suggesting wherein a partial products reduction (PPR) device is configured to determine a first reduced matrix output using the first rounded matrix output, and wherein the first rounded matrix output is used by a compute converter. Kwon discloses a neural network apparatus that performs multiply-accumulate and convolution operations using a block floating point data format (abstract, fig 1, fig 2). Kwon further discloses a sign bit included in the block floating point data (fig 4), but does not teach or suggest wherein the MSB portion is characterized by a signed integer and the LSB portion is characterized by an unsigned integer. Kwon is further silent, not teaching or suggesting wherein a partial products reduction (PPR) device is configured to determine a first reduced matrix output using the first rounded matrix output, and wherein the first rounded matrix output is used by a compute converter.
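For illustration only, the signed-MSB/unsigned-LSB decomposition that the cited art does not teach or suggest can be sketched as follows. This sketch is not code from any reference of record; the 8-bit mantissa width, the 4-bit split point, and the function name are assumptions chosen solely to make the distinguishing arithmetic concrete.

```python
# Illustrative sketch (hypothetical parameters, not from any cited reference):
# a signed 8-bit weight mantissa is decomposed into a signed 4-bit MSB portion
# and an unsigned 4-bit LSB portion, such that w == (msb << 4) + lsb.

def signed_unsigned_split(w, lsb_bits=4):
    """Split signed integer w into (signed MSB portion, unsigned LSB portion)."""
    lsb = w & ((1 << lsb_bits) - 1)    # unsigned LSB portion: 0..15
    msb = (w - lsb) >> lsb_bits        # signed MSB portion: -8..7
    return msb, lsb

# Exhaustive check over all signed 8-bit values: the shifted signed MSB
# portion plus the unsigned LSB portion reconstructs the original mantissa.
for w in range(-128, 128):
    msb, lsb = signed_unsigned_split(w)
    assert (msb << 4) + lsb == w
    assert -8 <= msb <= 7 and 0 <= lsb <= 15
```

Under this decomposition only the compute unit holding the MSB portion must handle signed operands, while the LSB unit can remain unsigned, which is the arrangement the claim recites and on which the references above are silent.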
US 20240095302 A1 to Tu et al. (hereinafter “Tu”) discloses an apparatus for matrix multiply-accumulate (MMA) operations on multiple floating point data types, including converting between formats (abstract, fig 1A, fig 1B, fig 2A). Tu further discloses decomposing input data into high and low portions (fig 4). Tu is, however, silent, not teaching or suggesting wherein each of the plurality of compute units is configured for an integer numerical format; wherein the MSB portion is characterized by a signed integer and the LSB portion is characterized by an unsigned integer; wherein a partial products reduction (PPR) device is configured to determine a first reduced matrix output using the first rounded matrix output; and wherein the first rounded matrix output is used by a compute converter. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. d-Matrix, d-Matrix Corsair Redefines Performance and Efficiency for AI Inference at Scale, White Paper, 2024. White Paper by Applicant describing aspects of the disclosed invention. S. Srivastava et al., Corsair: An In-Memory Computing Chiplet Architecture for Inference-Time Compute Acceleration, Theme Article: Contemporary Industry Products 2025, IEEE Micro, IEEE Computer Society, 2025. White Paper by Applicant describing aspects of the disclosed invention. Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE, whose telephone number is (469) 295-9289. The examiner can normally be reached 10:00am - 12:00pm and 2:00pm - 8:00pm ET, M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell, can be reached at 571-272-3701. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /EMILY E LAROCQUE/ Examiner, Art Unit 2182