DETAILED ACTION
Claims 1-24 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Priority
Applicant’s claim for the benefit of prior-filed applications under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Information Disclosure Statement
Per MPEP 609.02(I) and (II)(A)(2), the examiner of a continuing application will consider information which has been considered by the Office in the parent application. Therefore, information considered in parent applications 17/562,003 and 17/465,949 has been considered during examination of the instant application. However, if applicant wants said considered information to be printed on any patent resulting from the instant application, applicant must ensure that said information appears on either an IDS or an 892 in the instant application.
The examiner notes that applicant has cited NPL without providing relevant page numbers. While the NPL has been considered, please cite relevant page numbers for any NPL cited in the future, as required by 37 CFR 1.98(b)(5), to ensure consideration thereof.
Specification
The title of the invention is not sufficiently descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
In paragraphs 1-3, applicant lists related applications. Patent numbers must be inserted for any application that has resulted in a patent. While none of the applications appear to have been patented at the time of drafting this Office Action, this note will serve as a reminder, until allowance of this application, to insert patent numbers as related applications issue.
The disclosure is objected to because of the following informalities:
In paragraph 27, 3rd to last line, replace “alignthe” with --align the--.
Appropriate correction is required.
Claim Objections/Recommendations
Claim 1 is objected to because of the following informalities:
In line 1, replace “processing comprising:” with --processing, the method comprising:-- so that the steps are explicitly tied to the method and not potentially to the parallel processing.
In claim 1, line 6, the examiner recommends inserting --compressed-- before “control words” for consistency.
In claim 17, line 1, the examiner recommends inserting --compressed-- before “control words” for consistency.
In claim 23, line 8, the examiner recommends inserting --compressed-- before “control words” for consistency.
Claim 24 is objected to because of the following informalities:
In line 2, insert --and-- after the semicolon.
In claim 24, line 9, the examiner recommends inserting --compressed-- before “control words” for consistency.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 8-16, 18-19, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Mykland (US 7,840,777, as cited by applicant), in view of Zhu et al. (US 9,367,347).
Referring to claim 1, Mykland has taught a processor-implemented method for parallel processing comprising:
accessing an array of compute elements (FIG.1, reconfigurable logic unit (RLU) 8, which includes 8 rows and 16 columns of elements (FIG.M and column 27, lines 37-41)), wherein each compute element within the array of compute elements is known to a compiler (see the abstract, last sentence) and is coupled to its neighboring compute elements within the array of compute elements (column 27, lines 42-43);
generating a plurality of compressed control words by the compiler (FIG.8), wherein the plurality of control words enables compute element operation (the array elements perform operations) and compute element memory access (FIG.1 shows the RLU coupled to data cache and buffer memory. Also, column 42, lines 49-56, refer to RLU storage/memory, which would be accessed to obtain/store data), and wherein the plurality of compressed control words is operationally sequenced (all programs include an order, i.e., an operationally sequenced order. Such is the nature of programming. See FIG.W for an example of operation order);
loading the plurality of compressed control words into a control word cache coupled to the array of compute elements (see FIG.1, cache 28, which holds control words that are ultimately decompressed by decompression circuit 20 on their way to the RLU);
Mykland has not taught linking the plurality of compressed control words, wherein linking information is contained in at least one field of each of the plurality of compressed control words. However, Zhu has taught creating command chains for execution, where each command in a chain has a field that points to the next command in the chain (such that the commands are linked) (e.g. see FIG.9, where each command include a next pointer field (“Nxt_ptr”)). This allows for a single memory to be used to store multiple chains while allowing for efficient prioritization and out-of-order execution (see column 3, lines 36-46, and claim 1 of Zhu). As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mykland for linking the plurality of compressed control words, wherein linking information is contained in at least one field of each of the plurality of compressed control words.
Mykland, as modified, has not taught that the linking is performed by the compiler. However, one of ordinary skill in the art recognizes that the linking information is part of the instruction encoding, i.e., bits of a particular field of the instruction are set in a manner to point to the next instruction. Since a compiler generates instruction encodings, it would have been obvious to one of ordinary skill in the art to have the compiler of Mykland populate the fields with linking information to create chains at compile time. This eliminates any need to do this dynamically at run-time, thereby simplifying run-time processing and hardware. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Mykland such that the linking is performed by the compiler.
Mykland, as modified, has taught wherein the plurality of compressed control words is loaded into the control word cache in an operationally non-sequenced order (see Zhu, FIG.9, for instance, where the control words are not ordered in sequenced order. For instance, one chain contains commands 0, 1, 9, 10, and 11. However, they are not stored in this order. Instead, commands 1 and 9 are separated by commands 2-8. Thus, they are in an operationally non-sequenced order, which is the order they would be stored at in control words cache 28 in Mykland in the combination.); and
ordering the plurality of compressed control words into an operationally sequenced execution order, based on the linking information (per FIG.9 of Zhu, even though the instructions are stored out of sequence order, a chain is executed in sequence order using the linking information. For instance, even though command 9 is not stored after command 1 despite being executed after command 1, command 9 will be ordered such that it is executed after command 1).
Referring to claim 2, Mykland, as modified, has taught the method of claim 1 further comprising decompressing the plurality of compressed control words (see FIG.1, and note that the program (control words) are send to decompression circuit 20 for decompression).
Referring to claim 3, Mykland, as modified, has taught the method of claim 2 further comprising executing operations within the array of compute elements using the plurality of compressed control words that were decompressed (from FIG.1, after decompression, the control words are sent through various logic/buffers to get to the array (RLU) 8, which executes the control words).
Referring to claim 4, Mykland, as modified, has taught the method of claim 2 wherein the decompressing operates on compressed control words that were ordered before they are presented to the array of compute elements (the decompressed control words have some order while stored in memory. These control words are decompressed by circuit 20 before they are sent to the array).
Referring to claim 5, Mykland, as modified, has taught the method of claim 1 further comprising aligning the plurality of compressed control words (to decompress a compressed control word, it must be aligned with a decompression engine (FIG.1)).
Referring to claim 8, Mykland, as modified, has taught the method of claim 1 wherein the ordering is performed on the plurality of compressed control words that were loaded from the control word cache (again, the stored compressed control words will be ordered in execution order instead of stored order).
Referring to claim 9, Mykland, as modified, has taught the method of claim 8 wherein the plurality of compressed control words is loaded into the control word cache using a fixed frame format (each compressed control word has a fixed field that points to a next instruction. This is at least part of a fixed frame format. Alternatively, cache 28 in FIG.1 is fixed (not part of the reconfigurable unit 8. Thus, it includes a fixed frame format to store control word data therein).
Referring to claim 10, Mykland, as modified, has taught the method of claim 9 wherein the fixed frame format encompasses the plurality of compressed control words (again, the format includes a “next” field, which is part of the compressed control word in the combination. Thus, the format encompasses the control words. Alternatively, the cache’s format encompasses the words by storing them).
Referring to claim 11, Mykland, as modified, has taught the method of claim 10 wherein the fixed frame format includes unused space between at least two of the plurality of compressed control words (see Zhu, FIG.9, which shows that there may be unused space (null entry 8) between other control words, which are compressed in the combination).
Referring to claim 12, Mykland, as modified, has taught the method of claim 1 wherein the linking information enables bin packing in the control word cache (a bin may be interpreted as a command stream in Zhu. Thus, by connecting various control words together, these control words are packed into cache as part of a respective bin/stream. For instance, in FIG.9 of Shu, commands 0, 1, 9, 10, and 11, are packed together in a bin corresponding to id0 (802)).
Referring to claim 13, Mykland, as modified, has taught the method of claim 1 further comprising an autonomous operation buffer in at least one of the compute elements of the array of compute elements (see FIG.1, and note the buffers in RLU 8. These hold data provided to the RLU or stored from the RLU (column 18, lines 53-65. Since functional units access these buffers, the buffers may be considered part of at least one compute element)).
Referring to claim 14, Mykland, as modified, has taught the method of claim 13 further comprising a compute element operation counter coupled to the autonomous operation buffer (a logic element within the RLU (column 23, lines 18-21), is shown in FIG.D as including iterators D9, which are operation counters). Everything in the system is coupled).
Referring to claim 15, Mykland, as modified, has taught the method of claim 14 wherein the autonomous operation buffer and the compute element operation counter enable compute element operation execution (all of the RLU components enable RLU operation).
Referring to claim 16, Mykland, as modified, has taught the method of claim 15 wherein the compute element operation execution involves operations not explicitly specified in a control word (starting with FIG.C, there are many components in the logic elements that contribute to the operation. An control word does not explicitly say what to do with every single component. For instance, control signals may be passed between components without an instruction actually having a field addressing those components specifically).
Referring to claim 18, Mykland, as modified, has taught the method of claim 1 wherein the plurality of compressed control words comprises variable length control words (column 9, lines 52-59).
Referring to claim 19, Mykland, as modified, has taught the method of claim 1 wherein the array of compute elements comprises a two-dimensional array of compute elements (FIG.M).
Claim 23 is mostly rejected for similar reasoning as claim 1. Furthermore, Mykland has taught a computer program product (program instructions (column 15, lines 63-64)) embodied in a non-transitory computer readable medium (FIG.1, RAM 16 (column 15, lines 63-64)) for parallel processing, the computer program product comprising code which causes one or more processors to perform the claimed operations (the code causes the operations of the system to occur).
Claim 24 is mostly rejected for similar reasoning as claim 1. Furthermore, Mykland has taught a computer system for parallel processing (FIG.1) comprising:
a memory which stores instructions (FIG.1, RAM 16, and column 15, lines 63-64);
one or more processors (FIG.1, at least processor 2) coupled to the memory, wherein the one or more processors, when executing the instructions which are stored, are configured to perform the claimed operations (the code causes the operations of the system to occur).
Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Mykland in view of Zhu and Kurisu (US 5,673,410).
Referring to claim 6, Mykland, as modified, has taught the method of claim 5 but has not taught wherein the aligning is accomplished using a shift register. However, Kurisu has taught reading variable-length instructions (FIG.8) from memory (FIG.1, 102) using a shift register (FIG.1, 103) that shifts instructions to the left until they are aligned at the left side and then sent for processing (FIG.9). This is applicable to Mykland which has variable-length compressed instructions (column 9, lines 52-59). In Kurisu, such a shift register allows packing of instructions more closely in memory (FIG.8 versus FIG.7), which allows for maximizing available program memory, and reading in an efficient manner for improved throughput and execution speed (abstract and column 1, lines 52-64). As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mykland such that the aligning is accomplished using a shift register. That is, before sending an instruction to the decompression engine, it must be read and parsed, and thus may be done using Kurisu’s shift register.
Referring to claim 7, Mykland, as modified, has taught the method of claim 6 further comprising bypassing the shift register for a compressed control word that is already aligned (note that once the instruction is aligned (at the left end in FIG.9), it is not shifted any further. Instead it is removed from the shift register, meaning that the additional shifting for that instruction is bypassed).
Claims 17 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Mykland in view of Zhu and the examiner’s taking of Official Notice.
Referring to claim 17, Mykland, as modified, has taught the method of claim 1 but has not taught wherein a field of a control word in the plurality of control words signifies a repeat last operation control word. However, a repeat instruction was well known in the art before applicant’s invention. Such an instruction provides an efficient way to repeat one or more instructions a specified number of times. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mykland such that a field of a control word in the plurality of control words signifies a repeat last operation control word.
Referring to claim 20, Mykland, as modified, has taught the method of claim 19 but has not taught wherein the two-dimensional array of compute elements is stacked to form a three-dimensional array. However, stacking 2D arrays to generate a known 3D hypercube architecture was well known in the art before applicant’s invention. A hypercube includes an additional dimension of elements to increase processing power and bandwidth. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mykland such that the two-dimensional array of compute elements is stacked to form a three-dimensional array.
Referring to claim 21, Mykland, as modified, has taught the method of claim 20 wherein the three-dimensional array is physically stacked (again, from the rejection of claim 20, the examiner asserts that this is well-known and obvious).
Referring to claim 22, Mykland, as modified, has taught the method of claim 20 wherein the three-dimensional array is logically stacked (again, the examiner asserts that this is well-known and obvious. That is, a physical hypercube is not only physically stacked, but logically stacked (stacked operationally)).
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Schiebe, 4,430,702, has taught microcode in which each instruction is a jump instruction that has a next address field pointing to the next microcode instruction.
O’Connor, 7,904,943, has taught a secure command sequence where each command has a field pointing to the next command (see FIG.2).
Hussain, 2006/0020772, has taught a compiler compressing instructions with consecutive operands and storing the compressed instructions in instruction cache.
Lin has taught “Variable-sized object packing and its applications to instruction cache design”.
Reuven has taught “Reducing the Thrashing Effect Using Bin Packing”.
Kroes has taught “Evolutionary Bin Packing for Memory-Efficient Dataflow Inference Acceleration on FPGA”.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168. The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/David J. Huisman/Primary Examiner, Art Unit 2183