DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Response to Amendment
The amendment filed on 10/29/2025 has been entered. Claims 1-20 remain pending in this application.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-8, 11-15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Brewer1 (US Pub. No. 2019/0171604 A1) in view of Sankaralingam et al. (US Pub. No. 2017/0083313 A1 hereinafter Sankaralingam) in view of Kotler et al. (US Patent No. 11,734,605 B2 hereinafter Kotler) in view of Brewer2 (US Pub. No. 2019/0042214 A1).
As per claim 1, Brewer1 teaches a method, comprising: receiving an assembly language program identifying data flows through memory locations represented by memory variables (¶ [0355], “For any given or selected program to be executed, the code or instructions for that program, written or generated in any appropriate or selected programming language, are compiled for and loaded into the system 100, including instructions for the HTP 300 and HTF circuits 200, and any which may be applicable to the host processor 110, to provide the selected configuration to the system 100.” ¶ [0605], “Source code further may be compiled into some form of instructions or object code (including assembly language instructions or configuration information).”) and identifying instructions configured to transform data in the data flows (¶ [0286], “…the HTF circuit 200 is a coarse grained reconfigurable compute fabric comprised of interconnected compute tiles 210. The tiles 210 are interconnected with a synchronous fabric referred to as the synchronous mesh communication network 275…This synchronous mesh communication network 275 allows many tiles 210 to be pipelined together to produce a continuous data flow through arithmetic operations…”), and identifying memories in the tiles configured to be used to implement the memory locations represented by the memory variables (¶ [0300], “The work descriptor packet will have one or more arguments, which the HTF dispatch interface 225 will then provide or distribute to the various tiles, as a packet or message (AF message) transmitted through the AF switches 260, to the selected, addressed tiles 210, and further, will typically include an identification of a region in tile memory 325 to store the data (argument(s)), and a thread identifier (“ID”) utilized to track and identify the associated computations and their completion.”).
Brewer1 also teaches a coarse grained reconfigurable array having a plurality of tiles operable in parallel (¶ [0008], “Also as discussed in greater detail below, the representative apparatus, system and method provide for a processor architecture capable of self-scheduling, significant parallel processing and further interacting with and controlling a configurable computing architecture…” ¶ [0286], “…the HTF circuit 200 is a coarse grained reconfigurable compute fabric comprised of interconnected compute tiles 210.” Brewer1’s disclosure concerns reconfigurable computing architecture with improved parallel processing capabilities. The HTF circuit, which is a coarse grained reconfigurable compute fabric, is a main component of the reconfigurable computing architecture. Therefore, Brewer1 teaches a coarse grained reconfigurable computing fabric that contains tiles capable of parallel processing.).
Brewer1 fails to teach a hardware profile identifying details of the coarse grained reconfigurable array as well as generating an execution configuration to be applied to the coarse grained reconfigurable array.
However, Sankaralingam teaches receiving a hardware profile identifying details of a coarse grained reconfigurable array having a plurality of tiles operable in parallel (¶ [0025]-[0030], “In this regard, FIG. 2 illustrates a CGRA configuration circuit 200 that is provided alongside the block-based dataflow computer processor core 100…As seen in FIG. 2, the CGRA 202 of the CGRA configuration circuit 200 is made up of four (4) tiles 208(0)-208(3) that provide corresponding functional units 210(0)-210(3) and switches 212(0)-212(3). It is to be understood that the CGRA 202 is shown as having four (4) tiles 208(0)-208(3) for illustrative purposes only, and that in some aspects the CGRA 202 may include more tiles 208 than illustrated herein.” ¶ [0039]-[0043], “Based on its analysis, the CGRA configuration circuit 200 identifies the destination tiles 208(1) and 208(2) (i.e., the tiles 208(0)-208(3) to which the output of the functional unit 210(0) should be sent) to which the consumer instructions 204(1) and 204(2), respectively, are mapped. The CGRA configuration circuit 200 then determines one or more tiles 208(0)-208(3) (referred to herein as “path tiles”) that comprise a path from the mapped tile 208(0) to each of the destination tiles 208(1) and 208(2). The “path tiles” represent each tile 208(0)-208(3) of the CGRA 202 for which a switch 212(0)-212(3) must be configured in order to route the output of the functional unit 210(0) to the destination tiles 208(1) and 208(2)…In the example of FIG. 4A, the destination tiles 208(1) and 208(2) are located immediately adjacent to the mapped tile 208(0), so the mapped tile 208(0) and the destination tiles 208(1) and 208(2) are the only path tiles for which switch configuration is necessary. The instruction decoding circuit 234 of the CGRA configuration circuit 200 thus generates the switch control configuration 224(0) of the switch 212(0) of the mapped tile 208(0) to route an output 408 to the switch 212(1) of the destination tile 208(1), and generates the switch control configuration 224(1) of the switch 212(1) to receive the output 408 as input…As seen in FIG. 4B, the destination tile 208(2) is not immediately adjacent to the mapped tile 208(1). Accordingly, the CGRA configuration circuit 200 determines a path from the mapped tile 208(1) to the destination tile 208(2) through an intermediate tile 208(3). The path thus includes the mapped tile 208(1), the intermediate tile 208(3), and the destination tile 208(2) as path tiles 208(1), 208(3), and 208(2), respectively.” See also para. 0031-0033.) and generating an execution configuration identifying operation controls to be applied in the coarse grained reconfigurable array during execution of the instructions of the assembly language program (¶ [0030], “The CGRA configuration generated by the CGRA configuration circuit 200 to configure the CGRA 202 to provide the functionality of the dataflow instruction block 206 includes the function control configurations 214(0)-214(3) and the switch control configurations 224(0)-224(3) of the tiles 208(0)-208(3) of the CGRA 202.” ¶ [0044]-[0048], “The instruction decoding circuit 234 then performs the following series of operations on each of the dataflow instructions 204(0)-204(2). The instruction decoding circuit 234 maps the dataflow instruction 204(0) to a tile 208(0) of the plurality of tiles 208(0)-208(3) of the CGRA 202, with the tile 208(0) comprising a functional unit 210(0) and a switch 212(0) (block 502)…The instruction decoding circuit 234 next generates a switch control configuration (e.g., 224(0), 224(1)) of a switch (e.g., 212(0), 212(1)) of each of the one or more path tiles (e.g., 208(0), 208(1)) to route an output (e.g., 408) of the functional unit (e.g., 210(0)) of the mapped tile (e.g., 208(0)) to the destination tile (e.g., 208(1)) (block 514).”).
Brewer1 and Sankaralingam are considered to be analogous to the claimed invention because they are in the same field of configuring reconfigurable computing elements. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Brewer1 with the teachings of Sankaralingam to realize a method for configuring a coarse grained reconfigurable array that involves receiving the hardware details of the reconfigurable array and generating an execution configuration for said reconfigurable array. The motivation to modify Brewer1 with the teachings of Sankaralingam is that knowing the hardware details prior to configuring the array allows for efficient and proper configuration of the reconfigurable array which in turn results in high processing performance coupled with low power consumption.
Although Brewer1 and Sankaralingam teach a general hardware profile for a coarse grained reconfigurable array, they fail to teach the receiving details that indicate the number of tiles available for use.
Accordingly, Kotler teaches wherein the details comprise at least data indicative of a number of the plurality of tiles available for use in the reconfigurable array (Col. 2, lines 43-62, “A compiler receives a description of a machine learning network and generates a computer program that implements the machine learning network. The compiler allocates instructions of the computer program to different groups of processing elements (Tiles) for execution. For example, different groups of Tiles may implement different layers of the machine learning network, or in some cases, a single Tile may implement multiple layers of the machine learning network…For example, the compiler may size the different groups such that the processing times for each group to implement computations of a respective layer on an input sample are substantially similar (e.g., within a threshold time). Furthermore, the compiler may assign specific Tiles to each group based on a set of predefined layout constraints. The compiler may statically schedule at least a portion of the instructions into one or more deterministic phases for execution by the groups of Tiles.” Col. 13, lines 18-55, “FIG. 7 illustrates an example embodiment of a process for allocating 606 a group of Tiles Gi to execute the instructions implementing the layer i…The compiler furthermore determines a total number Ttotal of available Tiles for implementing the MLN. The total number of available Tiles may be manually assigned or may be determined automatically. For example, in one embodiment, the total number of available Tiles include all Tiles within a Mosaic. In another embodiment, the total number of available Tiles includes all Tiles in the MLA, which may span multiple Mosaics. In yet another embodiment, the total number of available Tiles includes all Tiles not otherwise assigned to a different MLN. In yet a further embodiment, the total number of available Tiles may be determined based on the overall computation metric Ctotal for the MLN…In yet further embodiment, the total number of available Tiles may be based on configurable optimization criteria.”).
Brewer1, Sankaralingam, and Kotler are all considered to be analogous to the claimed invention because they are all in the same field of configuring reconfigurable computing elements. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Brewer1 and Sankaralingam with Kotler to arrive at the claimed invention. The motivation to modify Brewer1 and Sankaralingam with the teachings of Kotler is that knowing the amount of tiles available for use prior to configuring the reconfigurable array allows for optimal allocation and configuration of the tiles. For example, with such information, allocation and configuration of the tiles may be conducted in a manner that minimizes power, results in the fewest data transfers between tiles, etc. See Kotler Col. 12, lines 43-59.
Brewer1, Sankaralingam, and Kotler fail to teach an instruction execution schedule that includes timing of execution of the instructions on the reconfigurable array.
However, Brewer2 teaches receiving an instruction execution schedule identifying timing of execution of the instructions of the assembly language program in the tiles (¶ [0032]-[0033], “Accordingly, timing manager 120 of embodiments accepts input of custom instructions description file 112 and framework capabilities description file 113, analyzes the custom instructions with respect to hardware timing, and provides output of custom instructions description file 122 and framework capabilities description file 123 adapted for implementing appropriate timing in the reconfigurable processor…In order to facilitate the custom instruction timing adaptation operation of timing manager 120 of embodiments, timing constraints 121 (e.g., as may be stored in a database of, or accessible to, timing manager 120) is provided setting forth one or more constraint with respect to the hardware timing of the reconfigurable processor. Timing constraints 121 of embodiments may establish timing constraints based upon physical timing of reconfigurable processor capabilities.”).
Brewer1, Sankaralingam, Kotler, and Brewer2 are all considered to be analogous to the claimed invention because they are all in the same field of configuring reconfigurable computing elements. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Brewer1, Sankaralingam, and Kotler with the teachings of Brewer2 to realize a method for configuring a coarse grained reconfigurable array that involves an execution schedule identifying the timing of execution of instructions on the reconfigurable array. The motivation to modify Brewer1, Sankaralingam, and Kotler with the teachings of Brewer2 is that identifying the timing of execution of the instructions prior to configuring the reconfigurable array allows for an easier configuration process as well as eliminates the burden of the compiler having to identify and resolve hardware timing issues.
As per claim 2, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 1. Brewer1 also teaches wherein the coarse grained reconfigurable array includes a plurality of memory interfaces (¶ [0262], “In a representative embodiment, each HTF circuit cluster 205 also comprises a memory interface 215…The various memory interfaces 215…may be implemented using any appropriate circuitry, such as one or more state machine circuits, to perform the functionality specified in greater detail below.” ¶ [0302]-[0303], “FIG. 13 is a block diagram of a representative embodiment of a memory interface 215…The memory interface 215 allows the tiles 210 within a HTF circuit cluster 205 to make requests to the system memory 125, such as DRAM memory.”), one of the memory interfaces configured as a dispatch interface (¶ [0304], “FIG. 14 is a block diagram of a representative embodiment of a HTF dispatch interface 225. Referring to FIG. 14, a HTF dispatch interface 225 comprises a state machine (and other logic circuitry) 470, one or more registers 475, and one or more dispatch queues 472.” Memory and dispatch interfaces, according to the disclosure of Brewer, have the same components but serve different purposes. Therefore, a memory interface could be configured as a dispatch interface and vice versa.); and the coarse grained reconfigurable array further includes a plurality of tiles interconnected via synchronous connections and asynchronous connections (¶ [0286]-[0287], “The tiles 210 are interconnected with a synchronous fabric referred to as the synchronous mesh communication network 275, allowing data to traverse from one tile 210 to another tile 210 without queuing…The tiles 210 are also interconnected with an asynchronous fabric referred to as an asynchronous packet network 265 that allows synchronous domains of compute to be bridged by asynchronous operations, with all packets on the asynchronous packet network 265 capable of being communicated in a single clock cycle in representative embodiments.”), each of the tiles having tile memories and reconfigurable computing logic (¶ [0290], “Referring to FIG. 8, a tile 210 comprises one or more configurable computation circuits 155, control circuitry 145, one or more memories 325…”).
As per claim 3, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 2. Brewer1 also teaches wherein the assembly language program further includes dispatch interface information representing operations to be performed to store inputs into first memory locations represented by first memory variables (¶ [0300], “The HTF dispatch interface 225 receives a data packet containing work to be performed by one or more of the tiles 210…The work descriptor packet will have one or more arguments, which the HTF dispatch interface 225 will then provide or distribute to the various tiles, as a packet or message (AF message) transmitted through the AF switches 260, to the selected, addressed tiles 210, and further, will typically include an identification of a region in tile memory 325 to store the data (argument(s)), and a thread identifier (“ID”)…”); and the method further comprises: identifying, based on dispatch interface information, operating controls of the dispatch interface of the coarse grained reconfigurable array to store inputs to tile memories identified to implement the first memory locations (¶ [0305], “The HTF dispatch interface 225 will create various AF data messages for transmission over the asynchronous packet network 265 to the tiles 210, including to write data into memories 325, which tile 210 will be the base tile 210 (a base tile ID, for transmission of an AF completion message), a thread ID (thread identifier or “TID”), and will send a continuation message to the base tile 210 (e.g., with completion and other counts for each TID), so that the base tile 210 can commence execution once it has received sufficient completion messages. The HTF dispatch interface 225 maintains various tables in registers 475 to track what has been transmitted to which tile 210, per thread ID and XID.”).
As per claim 4, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 3. Brewer1 also teaches wherein the assembly language program further includes memory interface information representing operations to be performed to store or retrieve data at or from second memory locations represented by second memory variables (¶ [0303], “The memory interface 215 allows the tiles 210 within a HTF circuit cluster 205 to make requests to the system memory 125, such as DRAM memory. The memory request types supported by the memory interface 215 are loads, stores and atomics.”); and the method further comprises: identifying, based on the memory interface information, operating controls of the memory interfaces of the coarse grained reconfigurable array to store or retrieve data at or from tile memories identified to implement the second memory locations (¶ [0374], “The memory interface 215 sends an AF message (556) to a tile 210G within the second synchronous domain 538, step 558. The AF message contains the value for variable A and the value is written to tile memory using a parameter from the request table stored in registers 485.”).
As per claim 5, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 4. Brewer1 teaches the assembly language program (¶ [0605], “Source code further may be compiled into some form of instructions or object code (including assembly language instructions or configuration information). Sankaralingam teaches wherein the assembly language program further includes a flow description specifying the data flows (¶ [0021], “As used herein, a “block-based dataflow ISA” is an ISA in which a computer program is divided into dataflow instruction blocks, each of which comprises multiple dataflow instructions that are executed atomically. Each dataflow instruction explicitly encodes information regarding producer/consumer relationships between itself and other dataflow instructions within the dataflow instruction block. The dataflow instructions are executed in an order determined by the availability of input operands (i.e., a dataflow instruction is allowed to execute as soon as all of its input operands are available, regardless of the program order of the dataflow instruction).”). Brewer2 also teaches tracing the data flows with identification of timing of execution of the instructions to identify timing of controls applied in the tiles (¶ [0032]-[0035], “Accordingly, timing manager 120 of embodiments accepts input of custom instructions description file 112 and framework capabilities description file 113, analyzes the custom instructions with respect to hardware timing, and provides output of custom instructions description file 122 and framework capabilities description file 123 adapted for implementing appropriate timing in the reconfigurable processor. For example, timing manager 120 may analyze each of the individual case statements of the custom instructions to determine how many physical clock cycles to spread those instructions over in order to make the target timing requirements…Using the timing constraint analysis, timing manager 120 of embodiments operates to adapt the custom instructions for reconfigurable processor hardware timing.”).
As per claim 6, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 5. Sankaralingam teaches detecting, during the tracing, data flowing into a first tile of the coarse grained reconfigurable array (¶ [0038], “As seen in FIG. 4A, the CGRA configuration circuit 200 first maps the dataflow instruction I0 204(0) to the tile 208(0) (also referred to herein as the “mapped tile 208(0)”) of the CGRA 202. The CGRA configuration circuit 200 configures the CGRA 202 to provide values a 400 and b 402 as inputs 404 and 406, respectively, to the mapped tile 208(0).”); and identifying, based on the detecting of data flowing into the first tile, incoming controls to be applied to the first tile (¶ [0038], “The instruction decoding circuit 234 of the CGRA configuration circuit 200 decodes the dataflow instruction I0 204(0), and then generates the function control configuration 214(0) to correspond to the ADD functionality of the dataflow instruction I0 204(0).”). Brewer2 also teach identifying timing of the incoming control during execution of the assembly language program in the coarse grained reconfigurable array (¶ [0033], “For example, timing constraints 121 may provide information regarding the latency for implementing each of a plurality of capabilities, such as the latency for performing all operations (e.g., add, subtract, logical, etc.), the latency for performing variable access, etc. Using the timing constraint analysis, timing manager 120 of embodiments operates to adapt the custom instructions for reconfigurable processor hardware timing.”).
As per claim 7, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 5. Sankaralingam teaches detecting, during the tracing, data flowing out of a first tile of the coarse grained reconfigurable array (¶ [0039], “In this example, the dataflow instruction I0 204(0) provides its output to both the dataflow instruction I1 204(1) and the dataflow instruction I2 204(2) (also referred to as “consumer instructions 204(1) and 204(2)”). Based on its analysis, the CGRA configuration circuit 200 identifies the destination tiles 208(1) and 208(2) (i.e., the tiles 208(0)-208(3) to which the output of the functional unit 210(0) should be sent) to which the consumer instructions 204(1) and 204(2), respectively, are mapped.); and identifying, based on the detecting of data flowing out of the first tile, outgoing controls to be applied to the first tile (¶ [0039], “The CGRA configuration circuit 200 then determines one or more tiles 208(0)-208(3) (referred to herein as “path tiles”) that comprise a path from the mapped tile 208(0) to each of the destination tiles 208(1) and 208(2). The “path tiles” represent each tile 208(0)-208(3) of the CGRA 202 for which a switch 212(0)-212(3) must be configured in order to route the output of the functional unit 210(0) to the destination tiles 208(1) and 208(2).”). Brewer2 also teaches timing of the outgoing controls (¶ [0032], “For example, timing manager 120 may analyze each of the individual case statements of the custom instructions to determine how many physical clock cycles to spread those instructions over in order to make the target timing requirements. Timing manager 120 of embodiments references framework capabilities description file 113 when analyzing the custom instructions of custom instructions description file 112, such as information regarding the type of variable, the capabilities implicated, etc., to determine hardware timing for the custom instructions.”).
As per claim 8, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 5. Sankaralingam also teaches identifying, based on tracing, connectivity controls of the tiles for data flowing within the tiles according to the instruction execution schedule (¶ [0040], “The instruction decoding circuit 234 of the CGRA configuration circuit 200 thus generates the switch control configuration 224(0) of the switch 212(0) of the mapped tile 208(0) to route an output 408 to the switch 212(1) of the destination tile 208(1), and generates the switch control configuration 224(1) of the switch 212(1) to receive the output 408 as input.”).
As per claim 11, it is a machine claim comprising similar limitations to claim 1, so it is rejected for similar reasons. Additionally, Sankaralingam teaches a memory (¶ [0050], “The CPU(s) 602 may have cache memory 606 coupled to the processor(s) 604 for rapid access to temporarily stored data.”) and a microprocessor coupled with the memory (¶ [0054], “A processor may be a microprocessor…”).
As per claim 12, it is a machine claim comprising similar limitations to claim 2, so it is rejected for similar reasons.
As per claim 13, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the machine of claim 12. Sankaralingam teaches wherein the assembly language program further includes a flow description specifying the data flows (¶ [0021], “As used herein, a “block-based dataflow ISA” is an ISA in which a computer program is divided into dataflow instruction blocks, each of which comprises multiple dataflow instructions that are executed atomically. Each dataflow instruction explicitly encodes information regarding producer/consumer relationships between itself and other dataflow instructions within the dataflow instruction block. The dataflow instructions are executed in an order determined by the availability of input operands (i.e., a dataflow instruction is allowed to execute as soon as all of its input operands are available, regardless of the program order of the dataflow instruction).”), detecting through tracing the data flows, first data flowing into a first tile of the coarse grained reconfigurable array (¶ [0038], “As seen in FIG. 4A, the CGRA configuration circuit 200 first maps the dataflow instruction I0 204(0) to the tile 208(0) (also referred to herein as the “mapped tile 208(0)”) of the CGRA 202. The CGRA configuration circuit 200 configures the CGRA 202 to provide values a 400 and b 402 as inputs 404 and 406, respectively, to the mapped tile 208(0).”), identify incoming controls to be applied to the first tile to facilitate the first data flowing during execution of the assembly language program in the coarse grained reconfigurable array (¶ [0038], “The instruction decoding circuit 234 of the CGRA configuration circuit 200 decodes the dataflow instruction I0 204(0), and then generates the function control configuration 214(0) to correspond to the ADD functionality of the dataflow instruction I0 204(0).”), detecting, through tracking the data flows, second data flowing out of the first tile of the coarse grained reconfigurable array (¶ [0039], “In this example, the dataflow instruction I0 204(0) provides its output to both the dataflow instruction I1 204(1) and the dataflow instruction I2 204(2) (also referred to as “consumer instructions 204(1) and 204(2)”). Based on its analysis, the CGRA configuration circuit 200 identifies the destination tiles 208(1) and 208(2) (i.e., the tiles 208(0)-208(3) to which the output of the functional unit 210(0) should be sent) to which the consumer instructions 204(1) and 204(2), respectively, are mapped.” The second data, in this instance, is the result of the ADD operation performed in the previous step. This ADD result is what is flowing out of the first tile to one or more second tiles.), and identifying outgoing controls to be applied to the first tile to facilitate the second data flowing out (¶ [0039], “The CGRA configuration circuit 200 then determines one or more tiles 208(0)-208(3) (referred to herein as “path tiles”) that comprise a path from the mapped tile 208(0) to each of the destination tiles 208(1) and 208(2). The “path tiles” represent each tile 208(0)-208(3) of the CGRA 202 for which a switch 212(0)-212(3) must be configured in order to route the output of the functional unit 210(0) to the destination tiles 208(1) and 208(2).”). Brewer2 also teaches trace the data flows with identification of the timing of execution of the instructions to identify timing of controls applied in the tiles (¶ [0032]-[0035], “Accordingly, timing manager 120 of embodiments accepts input of custom instructions description file 112 and framework capabilities description file 113, analyzes the custom instructions with respect to hardware timing, and provides output of custom instructions description file 122 and framework capabilities description file 123 adapted for implementing appropriate timing in the reconfigurable processor. For example, timing manager 120 may analyze each of the individual case statements of the custom instructions to determine how many physical clock cycles to spread those instructions over in order to make the target timing requirements…Using the timing constraint analysis, timing manager 120 of embodiments operates to adapt the custom instructions for reconfigurable processor hardware timing.”).
As per claim 14, it is a machine claim comprising similar limitations to claim 4, so it is rejected for similar reasons.
As per claim 15, it is a machine claim comprising similar limitations to claim 3, so it is rejected for similar reasons.
As per claim 17, it is a product claim comprising similar limitations to claim 1, so it is rejected for similar reasons. Additionally, Brewer1 teaches a non-transitory computer storage medium storing instructions which, when executed by a computing device, cause the computing device to perform a method (¶ [0605], “As a consequence, the system and related methods of the present invention, including the various instructions, may be embodied as software which provides such programming or other instructions, such as a set of instructions and/or metadata embodied within a non-transitory computer readable medium, discussed above.”). Brewer2 also teaches receiving an opcode execution schedule identifying timing of execution of the opcodes of the assembly language program in the tiles (¶ [0032]-[0033], “Accordingly, timing manager 120 of embodiments accepts input of custom instructions description file 112 and framework capabilities description file 113, analyzes the custom instructions with respect to hardware timing, and provides output of custom instructions description file 122 and framework capabilities description file 123 adapted for implementing appropriate timing in the reconfigurable processor…In order to facilitate the custom instruction timing adaptation operation of timing manager 120 of embodiments, timing constraints 121 (e.g., as may be stored in a database of, or accessible to, timing manager 120) is provided setting forth one or more constraint with respect to the hardware timing of the reconfigurable processor. Timing constraints 121 of embodiments may establish timing constraints based upon physical timing of reconfigurable processor capabilities.” Although the above citation does not explicitly mention opcodes, each instruction of a software program contains one or more associated opcodes. In other words, when a software program is compiled into machine or object code, each instruction such as ADD, SUB, or MULT has a corresponding opcode which indicates to the underlying hardware which arithmetic operation to invoke. Therefore, if a timing of execution of each instruction is identified, each opcode associated with each instruction also follows the timing schedule.). Furthermore, Sankaralingam teaches wherein the coarse grained reconfigurable array is controllable, according to the execution configuration (¶ [0025], “…the CGRA configuration circuit 200 instead is configured to analyze multiple dataflow instructions 204(0)-204(X) of a dataflow instruction block 206, and generate a CGRA configuration (not shown) for the CGRA 202 to provide functionality for executing the dataflow instructions 204(0)-204(X) the dataflow instruction block 206.”), to execute of the opcodes of the assembly language program according to the opcode execution schedule to perform computation of the assembly language program (¶ [0023], “Each of the instruction windows 104(0)-104(3) forwards an opcode (not shown) corresponding to each dataflow instruction, along with any operands (not shown) and instruction target fields (not shown), to the associated ALUs 108(0)-108(3), the associated registers 110(0)-110(3), or the load/store queue 112, as appropriate.”).
As per claim 18, it is a product claim comprising similar limitations to claim 2, so it is rejected for similar reasons.
As per claim 19, it is a product claim comprising similar limitations to claim 13, so it is rejected for similar reasons. Additionally, Sankaralingam teaches opcodes (¶ [0023], “Each of the instruction windows 104(0)-104(3) forwards an opcode (not shown) corresponding to each dataflow instruction, along with any operands (not shown) and instruction target fields (not shown), to the associated ALUs 108(0)-108(3), the associated registers 110(0)-110(3), or the load/store queue 112, as appropriate.” Nevertheless, one of ordinary skill in the art will recognize that any machine or object code instruction will include one or more opcodes specifying an ADD, SUB, MULT, etc. operation to be applied to the respective arguments. Therefore, identifying timing of each instruction also identifies timing of each opcode within each instruction.).
Claim(s) 9-10, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Brewer1, Sankaralingam, Kotler, and Brewer2 as applied to claims 8, 15, and 19 above, and further in view of Wang et al. (US Pub. No. 2018/0081834 hereinafter Wang).
As per claim 9, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the method of claim 8. Brewer1 teaches wherein each respective tile among the tiles has internal connections between tile memories and a computing logic (Fig. 9A & 9B and ¶ [0309], “…a representative HTF reconfigurable computing circuit (tile) 210A comprises at least one multiply and shift operation circuit (“MS Op”) 305, at least one Arithmetic, Logical and Bit Operation circuit (“ALB Op”) 310, a first instruction RAM 315, a second, instruction (and index) RAM 320 referred to herein as a “spoke” RAM 320, one or more tile memory circuits (or memory) 325 (illustrated as memory “0” 325A, memory “1” 325B, through memory “N” 325C, and individually and collectively referred to as memory 325 or tile memory 325).”) and the internal connections include multiplexers to control data paths (Fig. 9A & 9B and ¶ [0309], “As part of configurability of the tile 210, one or more multiplexers are typically included, illustrated as input multiplexer 355, output multiplexer 395, and one or more intermediate multiplexer(s) 365 for selection of the inputs to the MS Op 305 and the ALB Op 310.”).
Brewer1, Sankaralingam, Kotler, and Brewer2 fail to teach setting bits to control the multiplexers.
However, Wang teaches the connectivity controls include setting bits to control the multiplexers to implement the data flows (¶ [0058], “In use, select bits of each of the multiplexers 317 may represent a part of a particular bit pattern, and by setting values of all the bits in the pattern, the corresponding multiplexers 317 may together serve to make data connections between any pairs of terminals of the configurable hardware units 302.”).
Brewer1, Sankaralingam, Kotler, Brewer2, and Wang are all considered to be analogous to the claimed invention because they are all in the same field of configuring reconfigurable computing elements. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Brewer1, Sankaralingam, Kotler, and Brewer2 with the teachings of Wang to realize a method for configuring a coarse grained reconfigurable array that includes setting bits to control multiplexers to implement data flows. The motivation to modify Brewer1, Sankaralingam, Kotler, and Brewer2 with the teachings of Wang is that being able to control the multiplexers by setting various combinations of bits allows for precise configuration of the computing hardware for each instruction of the assembly language program.
As per claim 10, Brewer1, Sankaralingam, Kotler, Brewer2, and Wang teach the method of claim 9. Brewer1 teaches the assembly language program (¶ [0605], “Source code further may be compiled into some form of instructions or object code (including assembly language instructions or configuration information).”). Sankaralingam also teaches controlling, according to the execution configuration, the coarse grained reconfigurable array during execution of the instructions of the assembly language program according to the instruction execution schedule (¶ [0021], “Each dataflow instruction explicitly encodes information regarding producer/consumer relationships between itself and other dataflow instructions within the dataflow instruction block. The dataflow instructions are executed in an order determined by the availability of input operands (i.e., a dataflow instruction is allowed to execute as soon as all of its input operands are available, regardless of the program order of the dataflow instruction).” ¶ [0025], “…the CGRA configuration circuit 200 instead is configured to analyze multiple dataflow instructions 204(0)-204(X) of a dataflow instruction block 206, and generate a CGRA configuration (not shown) for the CGRA 202 to provide functionality for executing the dataflow instructions 204(0)-204(X) the dataflow instruction block 206.”).
As per claim 16, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the machine of claim 15. Sankaralingam teaches identify, based on the tracing, connectivity controls of the tiles for data flowing within the tiles according to the instruction execution schedule (¶ [0040], “The instruction decoding circuit 234 of the CGRA configuration circuit 200 thus generates the switch control configuration 224(0) of the switch 212(0) of the mapped tile 208(0) to route an output 408 to the switch 212(1) of the destination tile 208(1), and generates the switch control configuration 224(1) of the switch 212(1) to receive the output 408 as input.”). Brewer1 teaches wherein each respective tile among the tiles has internal connections between tile memories and a computing logic (Fig. 9A & 9B and ¶ [0309], “…a representative HTF reconfigurable computing circuit (tile) 210A comprises at least one multiply and shift operation circuit (“MS Op”) 305, at least one Arithmetic, Logical and Bit Operation circuit (“ALB Op”) 310, a first instruction RAM 315, a second, instruction (and index) RAM 320 referred to herein as a “spoke” RAM 320, one or more tile memory circuits (or memory) 325 (illustrated as memory “0” 325A, memory “1” 325B, through memory “N” 325C, and individually and collectively referred to as memory 325 or tile memory 325).”) and the internal connections include multiplexers to control data paths (Fig. 9A & 9B and ¶ [0309], “As part of configurability of the tile 210, one or more multiplexers are typically included, illustrated as input multiplexer 355, output multiplexer 395, and one or more intermediate multiplexer(s) 365 for selection of the inputs to the MS Op 305 and the ALB Op 310.”).
Brewer1, Sankaralingam, Kotler, and Brewer2 fail to teach setting bits to control the multiplexers.
However, Wang teaches the connectivity controls include setting bits to control the multiplexers to implement the data flows (¶ [0058], “In use, select bits of each of the multiplexers 317 may represent a part of a particular bit pattern, and by setting values of all the bits in the pattern, the corresponding multiplexers 317 may together serve to make data connections between any pairs of terminals of the configurable hardware units 302.”).
Refer to claim 9 for the motivation to combine.
As per claim 20, Brewer1, Sankaralingam, Kotler, and Brewer2 teach the product of claim 19. Brewer1 teaches identifying, based on dispatch interface information specified in the assembly language program to identify inputs to first memory locations (¶ [0300], “The HTF dispatch interface 225 receives a data packet containing work to be performed by one or more of the tiles 210…The work descriptor packet will have one or more arguments, which the HTF dispatch interface 225 will then provide or distribute to the various tiles, as a packet or message (AF message) transmitted through the AF switches 260, to the selected, addressed tiles 210, and further, will typically include an identification of a region in tile memory 325 to store the data (argument(s)), and a thread identifier (“ID”)…”), operating controls of the dispatch interface of the coarse grained reconfigurable array to store the inputs to tile memories identified to implement the first memory locations (¶ [0305], “The HTF dispatch interface 225 will create various AF data messages for transmission over the asynchronous packet network 265 to the tiles 210, including to write data into memories 325, which tile 210 will be the base tile 210 (a base tile ID, for transmission of an AF completion message), a thread ID (thread identifier or “TID”), and will send a continuation message to the base tile 210 (e.g., with completion and other counts for each TID), so that the base tile 210 can commence execution once it has received sufficient completion messages. The HTF dispatch interface 225 maintains various tables in registers 475 to track what has been transmitted to which tile 210, per thread ID and XID.”), identifying, based on memory interface information specified in the assembly language program to identify memory access operations at second memory locations (¶ [0303], “The memory interface 215 allows the tiles 210 within a HTF circuit cluster 205 to make requests to the system memory 125, such as DRAM memory. The memory request types supported by the memory interface 215 are loads, stores and atomics.”), operating controls of the memory interfaces of the coarse grained reconfigurable array to store or retrieve data at or from tile memories identified to implement the second memory locations (¶ [0374], “The memory interface 215 sends an AF message (556) to a tile 210G within the second synchronous domain 538, step 558. The AF message contains the value for variable A and the value is written to tile memory using a parameter from the request table stored in registers 485.”), and wherein each respective tile among the tiles has internal connections between tile memories and a computing logic (Fig. 9A & 9B and ¶ [0309], “…a representative HTF reconfigurable computing circuit (tile) 210A comprises at least one multiply and shift operation circuit (“MS Op”) 305, at least one Arithmetic, Logical and Bit Operation circuit (“ALB Op”) 310, a first instruction RAM 315, a second, instruction (and index) RAM 320 referred to herein as a “spoke” RAM 320, one or more tile memory circuits (or memory) 325 (illustrated as memory “0” 325A, memory “1” 325B, through memory “N” 325C, and individually and collectively referred to as memory 325 or tile memory 325).”), the internal connections include multiplexers to control data paths (Fig. 9A & 9B and ¶ [0309], “As part of configurability of the tile 210, one or more multiplexers are typically included, illustrated as input multiplexer 355, output multiplexer 395, and one or more intermediate multiplexer(s) 365 for selection of the inputs to the MS Op 305 and the ALB Op 310.”).
Brewer1, Sankaralingam, Kotler, and Brewer2 fail to teach setting bits to control the multiplexers.
However, Wang teaches the connectivity controls include setting bits to control the multiplexers to implement the data flows (¶ [0058], “In use, select bits of each of the multiplexers 317 may represent a part of a particular bit pattern, and by setting values of all the bits in the pattern, the corresponding multiplexers 317 may together serve to make data connections between any pairs of terminals of the configurable hardware units 302.”).
Refer to claim 9 for the motivation to combine.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant has amended the claims with new limitations that change the scope of the claimed invention. Therefore, the amended claims necessitate new rejections, as addressed above. The amended claims are not allowable over prior art cited previously along with an additional reference, necessitated by amendment, for reasons indicated above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN ROBERT DAKITA EWALD whose telephone number is (703)756-1845. The examiner can normally be reached Monday-Friday: 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock can be reached at (571)272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197
(toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.D.E./Examiner, Art Unit 2199
/LEWIS A BULLOCK JR/Supervisory Patent Examiner, Art Unit 2199