DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01 December 2025 has been entered.
Election/Restrictions
Claims 10-13 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected invention, there being no allowable generic or linking claim. Election was made without traverse in the reply filed on 18 December 2024. Applicant is reminded that a complete response to the Election/Restriction Requirement includes submitting a new claim set in which the nonelected claims are annotated as withdrawn.
Information Disclosure Statement
The listing of references in the Remarks, Pg. 12, filed 07/17/2025 is not a proper information disclosure statement. 37 CFR 1.98(b) requires a list of all patents, publications, or other information submitted for consideration by the Office, and MPEP § 609.04(a) states, "the list may not be incorporated into the specification but must be submitted in a separate paper." Therefore, unless the references have been cited by the examiner on form PTO-892, they have not been considered.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “memory system” in claim 1.
The term “system” has been interpreted as a generic placeholder. See MPEP 2181.I.A. Furthermore, these generic placeholders are modified by functional language, not modified by structure or acts for performing the claimed function.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
The corresponding structure as described in the specification is identified as follows:
the “memory system” is interpreted as “405” in Fig. 4A-B and as “505” in Fig. 5A, and to comprise an activation matrix and a coefficient matrix as in Fig. 4A-B and to comprise a compressed matrix as in Fig. 5A, and includes input/output connections and equivalents as further disclosed in ([0034-0037], [0041-0043]).
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Construction
Regarding claim 1, the preamble is given patentable weight. Claim 1 contains the limitation “the computation” in the body, which refers to conducting a computation as recited in the preamble of claim 1. A person skilled in the art reading the claims would consider the claim in view of both the body and the preamble, and would understand them to be limited to the computation conducted. The body of the claim depends on the preamble for completeness, and the preamble gives life, meaning, and vitality to this claim. Therefore, the preamble of claim 1 should be afforded patentable weight.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over US 20180046898 A1 Lo (hereinafter “Lo”) in view of US 20180046900 A1 Dally et al. (hereinafter “Dally”) in view of US 20030158879 A1 Kwon et al. (hereinafter “Kwon”) in view of Neil Weste and David Harris. 2010. CMOS VLSI Design: A Circuits and Systems Perspective (4th. ed.). Addison-Wesley Publishing Company, USA. (hereinafter “Weste”).
Regarding claim 1, Lo teaches:
a system (Fig. 4, 400; [0065]) configured to conduct a computation ([0017-0019]) comprising:
a memory system (Fig. 4, 404; [0057]) configured to provide operands (Fig. 1, 100; [0021], [0027]; Fig. 3B, 302, [0041]) for the computation and store results (Fig. 1, 106; [0022-0023], [0054]);
a sequencer (Fig. 2, 204, 206, and 208; [0025], [0029], [0032]) configured to:
load a set of the operands (Fig. 2, inputs from 202, [0028]) from the memory system;
shift ([0031]) the loaded set of operands to form shifted operands (Fig. 2, outputs from 208, [0031-0032]);
provide each operand (Fig. 2, output of 204 to shifter 208) of the shifted operands (Fig. 2, outputs from 208, [0031-0032]) to a multiplier accumulator (MAC) (Fig. 2, 210, 212; [0033]) as an operand while skipping ones of the shifted operands that are zero ([0029]);
a MAC (Fig. 2, 210, 212; [0033]) comprising:
a multiplier (Fig. 2, 210; [0033], [0037]) configured to multiply the provided operands ([0037]);
an accumulator (Fig. 2, 214; [0038-0039]) configured to store a temporary result ([0038] corresponding results, [0040]); and
an adder circuit (Fig. 2, 212; [0033], [0038]) configured to add ([0038]) to calculate a final output ([0054]).
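For illustration only, the dataflow recited in the limitations mapped above (load, shift, skip zero operands, multiply, accumulate) can be sketched in software as follows. This sketch is not taken from Lo or from the application and is not a construction of the claims. The function names, the modeling of "shift" as a left rotation, and the sample values are all assumptions made for the example.

```python
def sequence_operands(operands, shift):
    """Model of a sequencer (hypothetical): shift the loaded operands,
    then forward only the non-zero ones together with their positions.

    'shift' is modeled here as a left rotation, which is an assumption.
    """
    if not operands:
        return []
    k = shift % len(operands)
    shifted = operands[k:] + operands[:k]
    # Zero-valued shifted operands are skipped rather than forwarded.
    return [(i, v) for i, v in enumerate(shifted) if v != 0]


def mac(sequenced, weights):
    """Model of a multiply-accumulate unit (hypothetical): multiply each
    forwarded operand by the weight at the same position and accumulate."""
    acc = 0  # accumulator holding the temporary result
    for i, v in sequenced:
        acc += v * weights[i]
    return acc


loaded = [0, 2, 0, 3]                      # operands loaded from memory
forwarded = sequence_operands(loaded, 1)   # shifted to [2, 0, 3, 0], zeros skipped
total = mac(forwarded, [5, 7, 11, 13])     # 2*5 + 3*11
```

In this sketch only two of the four shifted operands reach the multiplier, which is the effect the zero-skipping limitation describes.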
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lo with the alternative embodiment of loading from memory. Lo generally teaches storing the output tiles to memory after computation ([0054]) through the bus ([0064]). Although Lo also generally teaches receiving inputs through an I/O device ([0060]), Lo does not explicitly disclose providing operands from the memory system. It would have been obvious to modify Lo to use the load techniques (Fig. 3B, 302, [0041]), since pulling operands from memory is a known technique in the art. Thus, it would have been obvious to use a known technique in the art in the same way and for the same purpose.
Further, Lo discloses the claimed invention except for the sequencer containing all of the functional limitations disclosed in claim 1. It would have been obvious to one having ordinary skill in the art at the time the invention was made to treat Lo’s zero skipping sequencer 204, image buffer 206, and shifter 208 as a singular sequencer, since it has been held that forming in one piece an article, which has formerly been formed in two pieces and put together, involves only routine skill in the art. Howard v. Detroit Stove Works, 150 U.S. 164 (1893). The term “integral” is sufficiently broad to embrace constructions united by such means as fastening and welding. In re Hotte, 177 USPQ 326, 328 (CCPA 1973).
Lo generally teaches a MAC and the capability of the MAC to receive shifted operands. Lo also generally teaches processing using additional arrays ([0024]), but does not explicitly teach using MAC arrays; thus Lo is silent as to an array of MACs; a plurality of registers configured to receive an input of provided operands and shift the provided operands between adjacent MACs in the MAC array or within the each MAC. Further, while Lo discloses an adder circuit, Lo does not explicitly disclose the adder circuit as a carry propagate adder configured to add redundant sum and carry outputs.
Dally teaches an array of MACs (Fig. 2A, PE 210, [0044], [0072]); a plurality of registers configured to receive an input of provided operands and shift the provided operands between adjacent MACs in the MAC array or within the each MAC (Fig. 3A, 345, [0082], [0085]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lo’s system with Dally’s MAC array and registers because they are in the claimed invention’s same field of endeavor of accelerating computations for use in neural networks ([0035]). It would have been obvious to one of ordinary skill in the art to implement the MAC array and registers as it allows the device to perform computations in parallel by allowing communication between separate arrays ([0044], [0057]). Making this modification would be beneficial, as this allows Lo’s system to perform operations in parallel with the choice of dataflow, thereby freeing space up for other components and providing more energy-efficient architecture ([0057]).
Dally, and Lo in view of Dally, are silent as to the adder circuit being a carry propagate adder configured to add redundant sum and carry outputs.
Kwon discloses the adder circuit as a carry propagate adder (Fig. 2 “26” [0036]) configured to add redundant sum (Fig. 2 “S_A” [0036]) and carry outputs (Fig. 2 “C_A” [0036]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lo in view of Dally’s system with Kwon’s carry propagate adder because they are in the claimed invention’s same field of endeavor of multiply accumulator circuits ([0004]). Lo discloses the claimed adder except for it being a carry propagate adder that adds redundant sum and carry outputs. Kwon discloses, in a multiply accumulate architecture, the use of the carry propagate adder to add inputs. Carry propagate adders are a class of adders well known in the art (see Weste, pg. 430 ⁋ 1, pg. 434 sec. 11.2.2 bridging to pg. 435, and pg. 436, which describes some examples of carry propagate adders such as sec. 11.2.2.1 carry-ripple adder, sec. 11.2.2.2 carry generation and propagation, and pg. 438 sec. 11.2.2.3 PG carry-ripple addition). It would have been obvious to one of ordinary skill in the art to substitute one type of adder for another to achieve the predictable result of adding.
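For illustration only, the general technique of accumulating in redundant sum and carry form and resolving the pair with a final carry propagate addition can be sketched as follows. This is a generic textbook-style model and is not taken from Kwon, Weste, or the application. The function names, the 32-bit width, and the restriction to non-negative integers are assumptions made for the example.

```python
def carry_save_add(a, b, c):
    """3:2 compression: reduce three non-negative integers to a redundant
    (sum, carry) pair without propagating carries across bit positions."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def accumulate_redundant(pairs):
    """Accumulate products while keeping the running total in redundant
    (sum, carry) form (hypothetical helper)."""
    s, cy = 0, 0
    for x, w in pairs:
        s, cy = carry_save_add(s, cy, x * w)
    return s, cy


def ripple_carry_add(a, b, width=32):
    """Carry propagate addition modeled as a carry-ripple adder:
    the carry moves from bit 0 upward, one position at a time."""
    result, carry = 0, 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        result |= (abit ^ bbit ^ carry) << i
        carry = (abit & bbit) | (carry & (abit ^ bbit))
    return result


s, cy = accumulate_redundant([(3, 4), (0, 7), (5, 2)])
final_output = ripple_carry_add(s, cy)  # resolves the redundant pair
```

The redundant pair defers carry propagation to a single final addition, which is why the two adder stages are commonly paired in multiply accumulate datapaths.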
Regarding claim 2, in addition to the teachings addressed in the claim 1 analysis,
the rejection of claim 1 is incorporated and Lo teaches the memory system, operands, and the computation (see claim 1 mapping).
Lo is silent as to disclosing fetch or prefetch and provides it.
Dally teaches fetch or prefetch and provides it ([0066] fetch, the logical OR satisfies the first condition).
The motivation to combine provided with respect to claim 1, equally applies to claim 2.
Regarding claim 3, in addition to the teachings addressed in the claim 1 analysis,
the rejection of claim 1 is incorporated and Lo teaches the memory system, operands, the computation (see claim 1 mapping), and receiving ([0027]).
Lo generally teaches streaming and buffering ([0020]), but teaches it as the accumulation buffer performing those functional limitations.
Dally teaches buffer streaming input and provides the streaming input ([0059], [0154]).
The motivation to combine provided with respect to claim 1, equally applies to claim 3.
Regarding claim 4, in addition to the teachings addressed in the claim 1 analysis,
the rejection of claim 1 is incorporated and Lo teaches wherein the computation (see claim 1 mapping):
is matrix multiplication ([0003], [0017]) between a first matrix and a second matrix ([0021-0022]).
Regarding claim 7, in addition to the teachings addressed in the claim 4 analysis,
the rejection of claim 4 is incorporated and Lo teaches wherein the MAC (see claim 1 mapping):
produce a result for a corresponding column ([0033], [0035], [0039], [0046], column) of a result matrix (Fig. 1, 106; [0022-0023], [0054]) of the matrix multiplication (see claim 4 mapping).
Lo generally teaches a MAC and processing using additional arrays ([0024]), but does not explicitly teach using MAC arrays; thus Lo is silent as to an array of MACs.
Dally teaches an array of MACs (Fig. 2A, PE 210, [0044], [0072]).
The motivation to combine provided with respect to claim 1, equally applies to claim 7.
Regarding claim 8, in addition to the teachings addressed in the claim 1 analysis,
the rejection of claim 1 is incorporated and Lo teaches wherein the computation (see claim 1 mapping):
is matrix convolution between a coefficient matrix, that produces a matrix as a result of the matrix convolution ([Abstract], [0022]).
Lo generally teaches coefficients ([0022]), but is silent as to an activation matrix that produces a feature matrix.
Dally teaches an activation matrix that produces a feature matrix ([0037], [0046], activation matrix – input activation, feature matrix – output activation).
The motivation to combine provided with respect to claim 1, equally applies to claim 8.
Regarding claim 9, in addition to the teachings addressed in the claim 8 analysis,
the rejection of claim 8 is incorporated and Lo teaches wherein:
a row of the coefficient matrix is loaded ([0029-0030]) from the memory system into the said sequencer,
wherein the said sequencer shifts the loaded row of the coefficient matrix to form the coefficient operands ([0031-0032], [0044]) and forward the coefficient operands as a first operand to the MAC ([0037]) and a multiply accumulation operation is performed in the MAC to achieve convolution computation ([0033], [0035], [0039], [0046]).
Lo generally teaches a MAC and processing using additional arrays ([0024]), but does not explicitly teach using MAC arrays; thus Lo is silent as to an array of MACs. Lo generally teaches coefficients ([0022]), but is silent as to the row of the activation matrix being loaded in the MACs of the MAC array or a loaded row of the activation matrix being shifted in the MACs of the MAC Array to form a second operand.
Dally teaches an array of MACs (Fig. 2A, PE 210, [0044], [0072]); the row of the activation matrix is loaded (Fig. 5A, 505, [0141], [0146], [0196]) in the MACs of the MAC array (Fig. 3A, 310, [0072]) or a loaded row of the activation matrix is shifted in the MACs of MAC Array to form a second operand (Fig. 3A, I, [0074]).
The motivation to combine provided with respect to claim 1, equally applies to claim 9.
Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Lo in view of Dally in view of Kwon in view of Weste in further view of US 11170289 B1 Duong et al. (hereinafter “Duong”).
Regarding claim 5, in addition to the teachings addressed in the claim 4 analysis,
the rejection of claim 4 is incorporated and Lo teaches wherein the sequencer (see claim 1 mapping):
is loaded with a row ([0029-0030]) of the first matrix and is configured to:
for each element of the loaded row from the first matrix, perform a shift left operation ([0031-0032], [0044]) MAC is loaded with a corresponding row of a second matrix ([0037]);
wherein a multiply and accumulate operation is performed in the MAC ([0033]);
wherein results of the multiply and accumulate operation are accumulated in the accumulator of the MAC ([0038]);
wherein the final output of the adder circuit in the MAC is a row ([0033], [0035], [0039], [0046], row) of a result matrix (Fig. 1, 106; [0022-0023], [0054]).
Lo generally teaches a MAC and processing using additional arrays ([0024]), but does not explicitly teach to produce an operand common to all MACs of said MAC Array, the all MACs of the MAC Array.
Dally teaches an array of MACs (Fig. 2A, PE 210, [0044], [0072]).
The motivation to combine provided with respect to claim 1, equally applies to claim 5.
Lo in view of Dally in view of Kwon in view of Weste in view of Duong disclose to produce an operand common to all.
Duong teaches an operand common to all (Col. 7, lines 39-49, constant value c_i).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Lo in view of Dally in view of Kwon in view of Weste’s system with Duong’s constant because they are in the claimed invention’s same field of endeavor of accelerating computations for use in neural networks (Col. 8, lines 53-61). It would have been obvious to one of ordinary skill in the art to implement the constant as it allows the device to manipulate the weight vectors so as to normalize them or simply adjust them for certain training purposes, such as configuring to solve a particular problem (image recognition, voice analysis, depth analysis) (Col. 7, lines 47-54). Making this modification would be beneficial, as this allows Lo in view of Dally in view of Kwon in view of Weste’s system the ability to customize its training objectives, thereby giving the device the capability to support various training purposes.
Regarding claim 6, in addition to the teachings addressed in the claim 5 analysis,
the rejection of claim 5 is incorporated and Lo teaches wherein the sequencer (see claim 1 mapping):
skips operation for the each element of the loaded row of the first matrix having a zero value ([0029]).
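For illustration only, the row-wise matrix multiplication addressed in the claim 5 and claim 6 analyses can be sketched as follows: each element of a loaded row of the first matrix is applied as an operand common to a group of multiply accumulate units, each holding one element of the corresponding row of the second matrix, and zero-valued elements are skipped. This sketch is not taken from Lo, Dally, Duong, or the application and is not a construction of the claims. The function name and the sample matrices are assumptions made for the example.

```python
def matmul_row_broadcast(a, b):
    """Compute a @ b one result row at a time (hypothetical model).

    For each element a[i][k] of the loaded row, the value is used as a
    common operand against every element of row k of b. Elements equal
    to zero are skipped, so they cost no multiply-accumulate work.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            common = a[i][k]          # operand shared by all column MACs
            if common == 0:
                continue              # zero skipping
            for j in range(cols):     # one accumulator per result column
                result[i][j] += common * b[k][j]
    return result


product = matmul_row_broadcast([[1, 0], [0, 2]], [[3, 4], [5, 6]])
```

With these sample inputs half of the first-matrix elements are zero, so half of the inner iterations are skipped while the result is unchanged.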
Response to Arguments
35 USC 112(a). The rejections have been withdrawn based on the amendment to the claims.
35 USC 112(b). The rejections have been withdrawn based on the amendment to the claims.
35 USC 101. The rejections have been withdrawn based on the amendment to the claims.
35 USC 103. Applicant’s amendments to claim 1, with respect to the rejection(s) of claim(s) 1-9 under 35 USC 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Kwon and in view of Weste, as necessitated by the amendment.
Applicant argues the following in substance with respect to the previously cited prior art:
1) Applicant asserts that Lo does not teach or suggest "shift the loaded set of operands to form shifted operands" and "provide each operand of the shifted operands to a multiplier accumulator (MAC) from an array of MACs as an operand while skipping ones of the shifted operands that are zero" as recited by claim 1 (Remarks p. 11 ⁋ 7).
The claimed sequence of operations requires first shifting a loaded set of operands and then selectively providing operands while skipping zeros among the shifted operands, which is fundamentally different from Lo's approach of pre-filtering zeros during decompression (Remarks p. 12 ⁋ 2).
This confirms that Dally pre-processes data to remove zeros rather than performing the claimed shifting followed by zero-skipping operation (Remarks p. 12 ⁋ 3).
Examiner respectfully disagrees. Under broadest reasonable interpretation, the words of a claim must be given their plain meaning unless such meaning is inconsistent with the specification, and it is improper to import claim limitations from the specification into the claim. See MPEP 2111.01(I).
The limitations recited in claim 1 do not specify a particular order in which the operations are to be executed. The provide limitation recites “provide each operand of the shifted operands” where “each operand” is interpreted to refer back to “the loaded set of operands” from the “shift the loaded set of operands”, or “a set of the operands”. Thus, Lo discloses the limitation as claimed.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARKUS A VILLANUEVA whose telephone number is (703)756-1603. The examiner can normally be reached M - F 8:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARKUS ANTHONY VILLANUEVA/
Examiner, Art Unit 2151
/James Trujillo/Supervisory Patent Examiner, Art Unit 2151