Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.
Claims 1-20 remain pending in the application under prosecution and have been re-examined.
In response to this Office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
Response to Arguments
Applicant's arguments filed 11/20/2025 have been fully considered but they are not deemed to be persuasive for at least the following reasons.
Applicant's remarks regarding the double patenting rejection have been noted. The nonstatutory double patenting rejection is maintained until submission of a terminal disclaimer that would overcome the rejection.
Applicant argues that, with respect to the rejection under 35 USC § 103, the claimed invention relates to actual execution of instructions of matrix computations, whereas in the applied reference, US 20210048991 by TANNER, neither the source code nor the executable code is actually being executed. However, TANNER from the start suggests: a system and technique to optimize execution of matrix operations, executing instructions to cause the one or more data fetch circuits to fetch the data or operands before the one or more matrix operations; a compiler to generate machine-readable executable code that loops through all instructions in a source code or portion thereof, the generated machine-readable executable code being for execution by one or more processors processing tasks represented by a number of threads, executing multi-threaded operations implementing executable instructions to fetch portions of the data and the executable instructions of the sub-operations without increasing the data storage required of the processor [Par. 0054-0055; Par. 0060-0061].
The machine-readable executable code is provided (e.g., as part of a software application) to a processor comprising logic circuits to perform instructions including: performing (executing) instructions to accelerate machine learning or deep learning algorithms, training, or inferencing [Par. 0073-0074]; and performing operations associated with machine learning computation, matrix operation, and data load instructions occurring after a preceding MAD operation, the MAD operation including structural information of a matrix operation that includes storing a list of operands of MAD operations [Par. 0069-0071].
[0060] FIG. 1 illustrates a computing environment 100 to detect matrix operations and optimize generation of executable code, according to at least one embodiment. In at least one embodiment, source code 102 is provided to a compiler 104 which analyzes 106 it to detect optimizable matrix operations. In at least one embodiment, structural information 108 of a matrix operation is extracted from source code. In at least one embodiment, structural information 108 is utilized to interleave 110 executable instructions for data loads and sub-operations of a matrix operation. In an embodiment, source code provided to a compiler is used to generate executable code 112. In at least one embodiment, executable code is provided to a processor 114 which, if executed, causes a matrix operation to be computed.
HAHN (US 20160246726 A1) teaches input and output (I/O) operation caching of FTL data using hints derived from accesses to a storage device and from file system metadata. For caching the FTL data, the I/O command operation is determined based on whether data in an I/O command sequence received by the storage device matches a known data pattern, with the location specified by an MFT (master file table) pattern entry defining the data type, storing all the data used by the file system to identify and access files along with the derived hint information, wherein the hints may be file types, which provide an indication of how the files and their associated FTL table entries will subsequently be accessed by the host system.
In view of the above remarks, the rejection under 35 USC § 103 is maintained and repeated below.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 12,094,531. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-20 of US Patent 12,094,531 anticipate claims 1-20 of the instant application.
As an example:
Claim 1 (Application):
A device, comprising: a processing circuitry configured to execute instructions of matrix computations;
a local memory coupled to the processing circuitry to store operands of the instructions; and
a circuit configured to: receive a request to fetch an item from a memory address into the local memory at a local address, the request configured with a hint; and
determine, in response to the request, whether to load the item through a buffer based at least in part on the hint and a data type of the item.

Claim 1 (US Patent 12,094,531):
A device, comprising: a plurality of processing units configured to execute instructions and perform at least matrix computations of an artificial neural network via execution of the instructions;
a local memory coupled to the processing units and configured to store at least operands of the instructions during operations of the processing units in execution of the instructions; a memory configured as a buffer; a random access memory; and
a logic circuit coupled to the buffer, the local memory, and the random access memory; wherein the instructions include a first instruction to fetch an item from the random access memory to the local memory;
the first instruction includes a field related to caching the item in the buffer; and
during execution of the first instruction the logic circuit is configured to determine whether to load the item through the buffer based at least in part on the field specified in the first instruction.
Claim 11 (Application):
A method, comprising:
executing, by a processing circuitry in a device, instructions of matrix computations;
storing, in a local memory coupled to the processing circuitry in the device, operands of the instructions;
receiving, in the device, a request to fetch an item from a memory address into the local memory at a local address, the request configured with a hint; and
determining, by the device in response to the request, whether to load the item through a buffer based at least in part on the hint and a data type of the item.

Claim 15 (US Patent 12,094,531):
A method, comprising:
executing, by a plurality of processing units of a device, instructions to perform at least matrix computations of an artificial neural network;
storing, in a local memory coupled to the processing units in the device, at least operands of the instructions during operations of the processing units in execution of the instructions;
receiving a first instruction having a memory address and a local address to request an item at the memory address in a random access memory of the device to be fetched into the local memory at the local address, the first instruction having a field identifying a hint for caching the item in a system buffer of the device; and
determining, during execution of the first instruction, whether to load the item through the system buffer based at least in part on the hint specified in the first instruction and a data type of the item.
Claims 1, 11, and 15 of the patent recite elements that cover the elements of corresponding claims 1, 11, and 19 of the application and, as such, anticipate claims 1, 11, and 19.
With respect to claims 2-10, 12-18, and 20 of the instant application and corresponding claims 2-10, 12-14, and 16-20 of U.S. Patent 12,094,531, the claims of the application recite all the elements of the corresponding claims of the patent. Therefore, claims 2-10, 12-14, and 16-20 of U.S. Patent 12,094,531 anticipate the corresponding claims of the instant application.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20210048991 A1 (TANNER) in view of US 20160246726 A1 (HAHN).
With respect to claims 1, 11, and 19, TANNER teaches a device (matrix compression accelerator system), comprising: a processing circuitry configured to execute instructions of matrix computations (a data transfer processor executing matrix transformation operations) [Par. 0062-0065]; a local memory coupled to the processing circuitry to store operands of the instructions (structural information of a matrix operation includes storing a list of operands of operations, the list of operands indicating which registers are used by such operands) [Par. 0067]; and a circuit configured to: receive a request to fetch an item from a memory address into the local memory at a local address (a data fetch circuit being a logic circuit to fetch data that is utilized as an operand of an opcode in data access load operations, indicating when and how data is fetched in executable instructions for data loads and sub-operations of a matrix operation, the structural information being used to schedule instructions or micro-instructions that load data and perform computations using such data) [Par. 0074-0076; Par. 0067-0069].
TANNER teaches a machine learning operation with hints provided to a compiler to generate an output [Par. 0061; Par. 0143-0145], with I/O inference and/or training logic used in a system for inferencing or predicting pattern operations based, at least in part, on weight parameters calculated using neural network training operations, neural network functions and/or architectures, or neural network use cases [Par. 0290-0292]; but TANNER fails to specifically teach the device wherein the request is configured with a hint to determine, in response to the request, whether to load the item through a buffer based at least in part on the hint and a data type of the item. However, HAHN teaches input and output (I/O) operation caching of FTL data using hints derived from accesses to a storage device and from file system metadata for caching the FTL data, the I/O command operation being determined based on whether data in an I/O command sequence received by the storage device matches a known data pattern, with the location specified by an MFT (master file table) pattern entry (the pattern defining the data type) storing all the data used by the file system to identify and access files along with the derived hint information, wherein the hints may be file types, which provide an indication of how the files and their associated FTL table entries will subsequently be accessed by the host system [Par. 0015; Par. 0026-0027; Par. 0030-0034; Par. 0048-0051; Par. 0037-0040].
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the instant application to combine the system performing matrix operations in a neural network, as taught by TANNER, with the adaptive memory buffer caching using pattern recognition with unassisted hints, as taught by HAHN, in order to efficiently allocate registers while also optimizing for memory latency to prevent memory stalls and reduce memory usage, causing one or more processors to execute more efficiently and improving parallelization of computer programs, as taught by TANNER [Par. 0052; Par. 0064].
The combination is proper because HAHN teaches:
a data pattern, with the adaptive HMB caching module utilizing the hints to determine how to cache FTL data in the HMB and on the storage device to reduce latency in future accesses; the data pattern identifying the file data type, and the master table hint, applied with respect to I/O operations, reducing latency in future accesses based on the file types that are likely to require multiple accesses to FTL data [Par. 0015; Par. 0038-0039];
a hint derivation module that automatically detects patterns in data that is written to a storage device and derives hints from the patterns regarding how data will likely be accessed by a host; the hint derivation module may also utilize frequency of accesses to memory locations and file system metadata to derive hints, with the adaptive HMB caching module utilizing the hints to determine how to cache FTL data in the HMB and on the storage device to reduce latency in future accesses [Par. 0015].
With respect to claim 2, TANNER and HAHN, combined, teach the device further comprising: the buffer [HAHN's Par. 0036-0038].
With respect to claim 3, TANNER and HAHN, combined, teach the device comprising: a random access memory, wherein the request is configured to load the item from a location at the memory address in the random access memory [(TANNER's Par. 0188-0190; Par. 0193-0194); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 4, TANNER and HAHN, combined, teach the device, wherein the hint is configured in a first predetermined field in the request [(TANNER's Par. 0188-0190; Par. 0193-0194); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 5, TANNER and HAHN, combined, teach the device, wherein a size of the item is configured in a second predetermined field of the request; the memory address is configured in a third predetermined field of the request; and the local address is configured in a fourth predetermined field of the request [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 6, TANNER and HAHN, combined, teach the device, wherein the data type of the item is configured to indicate whether the item is weights of artificial neurons, inputs to the artificial neurons, or an instruction of matrix computations [TANNER's Par. 0218-0221; Par. 0188-0190; Par. 0193-0194].
With respect to claim 7, TANNER and HAHN, combined, teach the device, wherein the circuit is configured to determine the data type of the item based on the local address [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 8, TANNER and HAHN, combined, teach the device, wherein the data type of the item is configured in a predetermined field in the request [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 9, TANNER and HAHN, combined, teach the device, wherein the circuit is configured to load the item to the local memory without going through the buffer when the data type and the hint is any one of a first plurality of combinations of data type and hint [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 10, TANNER and HAHN, combined, teach the device, wherein the circuit is configured to load the item to the local memory through the buffer when the data type and the hint is any one of a second plurality of combinations of data type and hint [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 12, TANNER and HAHN, combined, teach the device-implemented method comprising: extracting, by the device, the hint from a first predetermined field in the request [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 13, TANNER and HAHN, combined, teach the device-implemented method comprising: extracting, by the device, a size of the item from a second predetermined field of the request [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 14, TANNER and HAHN, combined, teach the device-implemented method, wherein the data type of the item is: a weight at an artificial neuron; an input to an artificial neuron; or an instruction of matrix computations [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 15, TANNER and HAHN, combined, teach the device-implemented method, further comprising: determining, by the device, the data type of the item based on the local address [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 16, TANNER and HAHN, combined, teach the device-implemented method, further comprising: extracting the data type of the item from a predetermined field in the request [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 17, TANNER and HAHN, combined, teach the device-implemented method, further comprising: loading the item to the local memory without going through the buffer when the data type and the hint is any one of a first plurality of combinations of data type and hint [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 18, TANNER and HAHN, combined, teach the device-implemented method, further comprising: loading the item to the local memory through the buffer when the data type and the hint is any one of a second plurality of combinations of data type and hint [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
With respect to claim 20, TANNER and HAHN, combined, teach the device-implemented method, wherein the data type is one of: weight, input, or instruction; and the hint is one of: weight stationary, output stationary, input stationary, or row stationary [(TANNER's Par. 0188-0190; Par. 0193-0194; Par. 0218-0221); (HAHN's Par. 0015; Par. 0026-0027; Par. 0033-0038)].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
WO 2023073008 A1 (ABAIL et al.) teaching a plurality of operation modes being selectively enabled or disabled, by a cache directory, based on a computation phase, data type, and data pattern for caching data in a cache having a plurality of address tags in the cache directory greater than a number of data lines in a cache array.
US 20210182077 A1 (CHEN et al.) teaching a method comprising: acquiring first information, wherein the first information is information to be processed by a terminal device; calling an operation instruction in a calculation apparatus to calculate the first information so as to obtain second information; and outputting the second information.
US 20220223201 A1 (ZAIDY et al.) teaching systems, devices, and methods related to a Deep Learning Accelerator and memory, the accelerator having processing units to perform at least matrix computations of an artificial neural network via execution of instructions, the processing units having a local memory to store operands of the instructions, wherein the accelerator can access a random access memory via a system buffer, or without going through the system buffer.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PIERRE MICHEL BATAILLE whose telephone number is (571)272-4178. The examiner can normally be reached Monday - Thursday 7-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TIM VO can be reached on (571) 272-3642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PIERRE MICHEL BATAILLE/Primary Examiner, Art Unit 2136