Detailed Action
Response to Amendment
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.
Claims 1, 10 and 19 are amended.
Claims 2-9, 11-18 and 20 are as originally presented.
Claims 1-20 are rejected.
This Action is Final.
Response to Arguments
Applicant's arguments filed 09/25/2025 have been fully considered but they are not persuasive.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fais et al. (US Patent Application Pub. No: 20190370631 A1) in view of Xu et al. (US Patent No: 11,500,802 B1), and further in view of Parikh et al. (US Patent Application Pub. No: 20210158889 A1).
As per claim 1, Fais teaches a method, comprising:
loading pointers into direct memory access (DMA) circuitry [FIG. 1, direct memory access (DMA) controller 126.], in multiple tiles in a hardware accelerator array [Fig.1, the hardware accelerator 104.], wherein the pointers indicate storage locations of DMA operations [Paragraphs 0032-0033, The pointer of the tile of the input tensor 106 specifies an address location of the local memory 110 (FIG. 1) at which data of the input tile has been stored by the DMA controller 126 (e.g., copied from the system memory 114 to the local memory 110). As such, the pointer of the tile is used by the acceleration manager 102 to identify where to access the data of the input tile in the local memory 110.];
fetching, by the DMA circuitry in the multiple tiles, the DMA operations using the pointers [Paragraphs 0021-0022, the DMA controller 126 autonomously fetches tiles from the input tensor 106 stored in the system memory 114, loads the tile data into the local memory 110, and signals the acceleration manager 102 (e.g., firmware) that the input tile data is available in the local memory 110.]; and
configuring in parallel, by the DMA circuitry in the multiple tiles, DMA circuitry, of the hardware accelerator array to perform the DMA operations [Fig.1; Paragraphs 0017; 0020, …the hardware accelerator 104 may be provided with multiple convolution engines 118 to perform convolution operations on multiple tiles/micro-tiles in parallel.].
Fais does not explicitly disclose multiple columns.
Xu discloses circuitry in multiple columns [Fig. 1, col. 2, l. 57 to col. 3, l. 8, Processing engine array 110 may include an array of processing engines arranged in rows and columns.].
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate Xu's computing system for performing computing tasks for an application into Fais’s apparatus to perform convolution on input tensor based on hybrid firmware-hardware tile walking, for the benefit of performing computing tasks for an application (Xu, col. 1, ll. 5-11), to obtain the invention as specified in claim 1.
Fais and Xu do not explicitly disclose multiple tiles arranged in multiple columns.
Parikh discloses multiple tiles arranged in multiple columns [Paragraphs 0013-0014, …the array of computing tiles being defined by the plurality of distinct computing tiles being arranged in a plurality of rows and a plurality of columns along an integrated circuit,…].
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate Parikh’s method for virtually addressing an array of accelerator tiles of a mixed-signal integrated circuit into Xu's computing system for performing computing tasks for an application and Fais’s apparatus to perform convolution on input tensor based on hybrid firmware-hardware tile walking, for the benefit of performing computing tasks for an application (Xu, col. 1, ll. 5-11) and of allowing asynchronous processing of data along a data processing pipeline, thus enabling multiple segments of data to be processed at a same time and possibly in different stages along the pipeline (Parikh, [0035]), to obtain the invention as specified in claim 1.
As per claim 2, Fais, Xu and Parikh teach all the limitations of claim 1 above, wherein Xu teaches a method wherein the multiple tiles include at least one tile in each of the columns in the hardware accelerator array [Xu, col. 5, ll. 58-65, the number of columns in the processing engine array 210 determines the computational capacity of the processing engine array 210, and the number of rows determines the required memory bandwidth for achieving maximum utilization of the processing engine array 210. The processing engine array 210 can have, for example, 64 columns and 428 rows, or some other number of columns and rows.].
As per claim 3, Fais, Xu and Parikh teach all the limitations of claim 1 above, wherein Fais teaches a method wherein the multiple tiles are interface tiles that are in a row of the hardware accelerator array that connect other tiles in the hardware accelerator array with other hardware components on a same integrated circuit as the hardware accelerator array [Fais, Paragraph 0059,…, the acceleration manager 102 informs the hardware accelerator 104 how to walk the tiles/micro-tiles of the input tensor 106 based on the one or more hardware execution parameter(s). The hardware accelerator 104 can execute the one or more command(s) from the message bus 124 one-by-one in a stateless fashion to program the one or more horizontal hardware execution parameter(s), the one or more vertical hardware execution parameter(s), and/or the one or more depth hardware execution parameter(s) in the parameters configuration registers 122.].
As per claim 4, Fais, Xu and Parikh teach all the limitations of claim 3 above, wherein Fais and Xu teach a method wherein configuring in parallel the DMA circuitry in the multiple columns comprises: configuring both (i) the DMA circuitry in the interface tiles to perform the DMA operations [Fais, Fig. 1; Paragraphs 0017; 0020, …the hardware accelerator 104 may be provided with multiple convolution engines 118 to perform convolution operations on multiple tiles/micro-tiles in parallel.], and (ii) DMA circuitry in memory tiles in each of the columns to perform the DMA operations, wherein the memory tiles are disposed in a row that neighbors the row containing the interface tiles [Xu, col. 5, ll. 58-65, the number of columns in the processing engine array 210 determines the computational capacity of the processing engine array 210, and the number of rows determines the required memory bandwidth for achieving maximum utilization of the processing engine array 210. The processing engine array 210 can have, for example, 64 columns and 428 rows, or some other number of columns and rows.].
As per claim 5, Fais, Xu and Parikh teach all the limitations of claim 4 above, wherein Fais teaches a method further comprising: performing the DMA operations to enable data processing engine (DPE) tiles in the hardware accelerator array to perform one or more functions, wherein the memory tiles are disposed between the DPE tiles and the interface tiles [Fais, Paragraph 0020,…, the multiple hardware accelerators 104 may be used to perform convolution operations on multiple tiles/micro-tiles in parallel. To implement convolution operations, the hardware accelerator 104 is provided with an example convolution engine 118. In addition, the hardware accelerator 104 is provided with example parameters configuration registers 122 in which parameters from the acceleration manager 102 are stored to configure the convolution engine 118 to perform convolution operations on tiles from the input tensor 106.].
As per claim 6, Fais, Xu and Parikh teach all the limitations of claim 5 above, wherein Fais teaches, a method, wherein the one or more functions are part of a machine learning model, wherein the hardware accelerator array is an artificial intelligence engine array [Fais, Paragraphs 0020-0021,… hardware accelerator 104 is circuitry (e.g., hardware accelerator circuitry) implementing an accelerator for deep-learning convolution operations. For example, the hardware accelerator 104 may be implemented using logic circuitry (e.g., an integrated circuit (IC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic device (PLD), etc.). Although a single hardware accelerator 104 is shown in FIG. 1, the example deep learning accelerator 101 may be provided with multiple hard.].
As per claim 7, Fais, Xu and Parikh teach all the limitations of claim 1 above, wherein Fais teaches a method further comprising: loading the pointers into a controller that controls the hardware accelerator array, wherein the controller loads the pointers into the DMA circuitry in the multiple tiles [Fais, Paragraphs 0032-0033, The pointer of the tile of the input tensor 106 specifies an address location of the local memory 110 (FIG. 1) at which data of the input tile has been stored by the DMA controller 126 (e.g., copied from the system memory 114 to the local memory 110). As such, the pointer of the tile is used by the acceleration manager 102 to identify where to access the data of the input tile in the local memory 110.].
As per claim 8, Fais, Xu and Parikh teach all the limitations of claim 1 above, wherein Fais teaches a method further comprising: loading the pointers into DPE tiles in the hardware accelerator array, wherein the DPE tiles load the pointers into the DMA circuitry in the multiple tiles [Fais, Paragraphs 0032-0033, The pointer of the tile of the input tensor 106 specifies an address location of the local memory 110 (FIG. 1) at which data of the input tile has been stored by the DMA controller 126 (e.g., copied from the system memory 114 to the local memory 110). As such, the pointer of the tile is used by the acceleration manager 102 to identify where to access the data of the input tile in the local memory 110.].
As per claim 9, Fais, Xu and Parikh teach all the limitations of claim 8 above, wherein Fais teaches a method wherein each of the DPE tiles comprises a core, a memory module, and an interconnect, wherein the interconnects in the DPE tiles are interconnected so that the DPE tiles are able to transmit data between each other [Fais, Paragraphs 0032-0033, The pointer of the tile of the input tensor 106 specifies an address location of the local memory 110 (FIG. 1) at which data of the input tile has been stored by the DMA controller 126 (e.g., copied from the system memory 114 to the local memory 110). As such, the pointer of the tile is used by the acceleration manager 102 to identify where to access the data of the input tile in the local memory 110.].
As per claims 10-18, claims 10-18 are rejected in accordance with the same rationale and reasoning as claims 1-9 above, wherein claims 10-18 are the device claims corresponding to the method of claims 1-9.
As per claims 19-20, claims 19-20 are rejected in accordance with the same rationale and reasoning as claims 1 and 5 above, wherein claims 19-20 are the system claims corresponding to the method of claims 1 and 5.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
RELEVANT ART CITED BY THE EXAMINER
The following prior art made of record and not relied upon is cited to establish the level of skill in the applicant’s art and those arts considered reasonably pertinent to applicant’s disclosure. See MPEP 707.05(c).
References Considered Pertinent but not relied upon
Gately et al. (US Patent Application Pub. No: 20220414050 A1) teaches an apparatus that includes an array processor to process at least one array, and a memory coupled to the array processor. Gately discloses that the at least one array is stored in memory with programmable per-dimension size and stride values.
CHEN et al. (US Patent Application Pub. No: 20200374534 A1) teaches an AI-assisted programmable hardware video codec. CHEN discloses that, according to certain embodiments, a video processing apparatus includes a programmable hardware encoder configured to execute an encoding process on a plurality of input video frames. CHEN suggests that the video processing apparatus further includes a controller coupled with the programmable hardware encoder, and that the controller is configured to execute a set of instructions to cause the video processing apparatus to: determine first information of the plurality of input video frames, and adjust the encoding process based on the first information.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GETENTE A YIMER whose telephone number is (571)270-7106. The examiner can normally be reached on Monday-Friday 6:30-3:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, IDRISS N ALROBAYE can be reached on 571-270-1023. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GETENTE A YIMER/
Primary Examiner, Art Unit 2181