DETAILED ACTION
Claims 1-16, 21-23, 25, and 29-35 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on November 15, 2024, has been entered.
Information Disclosure Statement
In the IDS submitted on November 15, 2024, U.S. Patent document #2 (Craft) has been struck through because it was already cited by the examiner on June 24, 2022. Thus, the strike is not indicative of a lack of consideration, but merely of duplication. Craft was considered, cited, and used in a rejection.
Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Interpretation
At least one claim is identified as including non-limiting contingent limitations. “The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.” “The broadest reasonable interpretation of a system (or apparatus or product) claim having structure that performs a function, which only needs to occur if a condition precedent is met, requires structure for performing the function should the condition occur. The system claim interpretation differs from a method claim interpretation because the claimed structure must be present in the system regardless of whether the condition is met and the function is actually performed.” See MPEP 2111.04(II).
Regarding claim 1, the transmitting step (last paragraph) is only performed when the row valid field has an invalid row setting. Thus, one interpretation of the claim includes not performing the transmitting when the row valid field does not have an invalid row setting. The examiner recommends dividing the last paragraph into multiple steps. For instance, applicant could first claim detecting an invalid row setting in the row value field, and could subsequently claim transmitting an idle control word to each of the compute elements in the idle compute element row based on the detected invalid row setting in the row valid field. This would remove the contingency and require the transmission.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-16, 21-23, 25, and 29-35 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “wide” in claims 1 and 29-30 is a relative term which renders the claim indefinite. The term “wide” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For instance, is 32 bits wide? Is 64 bits? Reasonable ones of ordinary skill in the art may disagree on the line separating wide and non-wide.
Claims 2-16, 21-23, 25, and 31-35 are rejected due to their dependence on an indefinite claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 8-9, 15-16, 21-23, 25, 29-30, and 32-33 are rejected under 35 U.S.C. 103 as being unpatentable over Musicus, “The OKI Advanced Array Processor (AAP) - Development Software Manual, RLE Technical Report No. 539”, in view of Craft, U.S. Patent No. 5,764,994, Rus et al., U.S. Patent No. 8,694,978, and, optionally, Hanku, JP H04291659 A (a translation of which is provided herewith).
Referring to claim 1, Musicus has taught a processor-implemented method for task processing comprising:
accessing a two-dimensional (2D) array of compute elements (see p.1, 1st paragraph, and note the 16x16 AAP array with 256 processors (compute elements)), wherein each compute element within the array of compute elements is known to a compiler (see p.1, 2nd paragraph. The MicroCompiler compiles microcode specifically for the array and activity thereon. Thus, the compiler is aware of each element if the array) and is coupled to its neighboring compute elements within the array of compute elements (see section 10.4 on pages 37-38. A single compute element (PE) is shown with connections to its eight nearest neighbors in each direction. Also, see the figure between pages 40-41, which shows all elements coupled to its nearest neighbors);
providing control for the array of compute elements on a cycle-by-cycle basis (operations occur over a period of cycles during which different controls are provided. For instance, see the sample program at the beginning of section 3 on pp.11-12, where over a number of cycles, control is provided (e.g. in cycle 1, controls to compute B[31] are provided; in cycle 2, controls to compute B[32] are provided, etc.)), wherein the control is enabled by a stream of wide, variable length, microcode control words generated by the compiler (see p.1. The compiler generates up to 88-bit control words for controlling the array. The control words may be deemed the collection of used fields defined by the compiler (from p.10, some fields in the 88 bits may not be needed and thus they are not defined by the compiler). For instance, if 16 bits of the 88 bits is not needed, then the associated control word may be 72 bits. As such, the microcode control words are variable in length. And, 88 bits (or even a fewer number of bits) is deemed to be wide); and
Musicus has not taught that the control words are compressed by the compiler using lossless compression, nor decompressing the control words to enable control on a per element basis. However, Craft has taught compressing compiled variable-sized microcode to reduce the memory required for storing the microcode. They are later decompressed back to its original form, meaning the compression is lossless. See the abstract, column 2, lines 61-65, and column 7, lines 17-31. Craft states that the compression is performed by a special linker (column 3, lines 46-48), but does not specifically teach that the linker is part of the compiler. However, Rus has taught that most compilers include a linker component (column 5, lines 53-56). Integration of components into a single unit is a non-patentable routine expedient per MPEP 2144.04(V)(B). Applicant has not demonstrated criticality of this limitation. As a result, in order to reduce memory requirements without losing control data (lossy compression, on the other hand, would have the drawback of potentially losing control information, which may result in not controlling the system as intended by the user), it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Musicus such that the control words are compressed by the compiler using lossless compression, and for decompressing the control words to enable control on a per element basis (all array elements will be controlled appropriately).
Musicus, as modified, has further taught executing a compiled task on the array of compute elements, wherein the executing is based on the control words that were decompressed (again, the control words are a compiled task that control array activity to execute operations. Note, based on FIG.1 of Craft that compressed microcode is first decompressed prior to execution), wherein the compiled task includes a spatial allocation of subtasks on one or more compute elements within the array of compute elements within the array of compute elements (as stated above, the array is a SIMD array; thus, each processor is executing its own subtask based on its own data. The SIMD workload, by nature, is spatially allocated, i.e., executed across all elements), wherein the spatial allocation provides for an idle compute element row in the array of compute elements (from p.44, the LF bit, in this example, controls which elements are not idle (to set B[6] to zero), and which elements are idle (do nothing, i.e., do not set B[6] to zero). Note that a row could be idle if the entire row is idle, or if just a subset of the row is idle (any permutation of idle/operating elements in a given row is possible depending on the conditions of program execution)), and wherein all of the compute elements in the idle compute element row are controlled by row a valid field (again, all elements are controlled by LF. Thus, LF is a row valid field that indicates whether a PE in a given row is idle or not idle (valid)).
Regarding the limitation “transmitting an idle control word to each of the compute elements in the idle compute element row based on an invalid row setting in the row valid field”:
Under a broad interpretation, due to contingency, the transmission is not required (e.g. when LF = valid) and is, thus, not limiting. Thus, the claim is rejected by Musicus, Craft, and Rus for reasoning set forth above
Where the transmission is required (when the invalid row setting is present), Musicus has not taught the transmission. However, Hanku has taught an array of PEs in which an entire row may be idle (depending on operation need), and the entire row of PEs may execute a transmitting to no-operation (NOP) (idle control word) to perform nothing (i.e., be idle). See FIG.10 and paragraphs [0099] and [0101]. One of ordinary skill in the art would have recognized that a NOP is a means to implement idle behavior. Since it is a software solution, it could allow the array to not include additional hardware for clock/power gating, which could increase hardware cost. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Musicus for transmitting an idle control word to each of the compute elements in the idle compute element row based on an invalid row setting in the row valid field.
Referring to claim 2, Musicus, as modified, has taught the method of claim 1 but has not taught storing relevant portions of the decompressed control words within a cache associated with the array of compute elements. However, Craft has also taught this in FIG.1, where decompressed control words are sent to instruction cache 14 (see column 2, last line, to column 3, 1st line). A cache is beneficial for known reasons, namely to allow for fast access of instructions needed multiple times, e.g. when code is repeated, such as with a loop, which is used by Musicus (e.g. see p.1). As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Musicus for storing relevant portions of the control word within a cache associated with the array of compute elements.
Referring to claim 8, Musicus, as modified, has taught the method of claim 2 wherein the cache enables the decompressed control words to be distributed across a row of the array of compute elements (see p.1, which states “The array acts as a Single Instruction Multiple Data (SIMD) machine, with all 256 processors executing approximately the same instruction but on different data”. As such, the control words are passed to every element in the same row).
Referring to claim 9, Musicus, as modified, has taught the method of claim 8 wherein the distribution across a row of the array of compute elements is accomplished in one cycle (applicant has not claimed that this is a clock cycle. As such, the distribution of a control word occurs in one distribution cycle).
Referring to claim 15, Musicus, as modified, has taught the method of claim 1 wherein the compiled task determines an unneeded compute element within a row of compute elements within the array of compute elements (see p.44, up through the 2nd sentence after the code. Basically, with branching, some elements do something, and others do nothing, i.e., are unneeded).
Referring to claim 16, Musicus, as modified, has taught the method of claim 15 wherein the unneeded compute element is controlled by a single bit in a control word from the decompressed control words (again, see p.44, where the LF bit, in this example, controls which elements are needed (to set B[6] to zero), and which elements are not needed (do nothing, i.e., do not set B[6] to zero)).
Referring to claim 21, Musicus, as modified, has taught the method of claim 1 wherein the compiled task schedules computation in the array of compute elements (this is how execution works. A task is schedule for execution on the array. Also, see p.12, which states “Each of these operations is scheduled by the compiler to run at the fastest possible clock rate.” Scheduling is mentioned elsewhere in Musicus).
Referring to claim 22, Musicus, as modified, has taught the method of claim 21 wherein the computation includes compute element placement (this is broad and could cover loading of data (placing data) into a compute element, e.g. see p.7, 2nd paragraph), results routing (e.g. from p.7, 3rd paragraph, a carry output may be routed to another element), and computation wave-front propagation within the array of compute elements (e.g. see p.7, 1st paragraph, which explains that a sum can be computed by one element, and then routed to another element, which adds another value, and so on. This creates a wave front that moves through the array elements).
Referring to claim 23, Musicus, as modified, has taught the method of claim 1 wherein compute elements within the array of compute elements have identical functionality (again, these are SIMD elements, which means they all perform the same functionality where data matches across elements. Also, from p.1, the elements are described as single-bit processors; thus, they are identical in this regard).
Referring to claim 25, Musicus, as modified, has taught the method of claim 1 wherein the compiled task includes multiple programming loop instances circulating within the array of compute elements (from p.1, loop control is provided; the code on p.16-17 shows loops executing 14 and 15 times, respectively. Thus, loop instances circulate within the elements. Looping is discussed throughout Musicus).
Claim 29 is mostly rejected for similar reasons as claim 1. Furthermore, Musicus has taught a computer program product embodied in a non-transitory computer readable medium for task processing (all code is stored in memory so that it can be fetched/retrieved therefrom and all actions performed in the system are performed in response to code executing. The hardware alone, with no software, does nothing), the computer program product comprising code which causes one or more processors to perform operations of (the code either causes the microcontroller/processor (see the figure following page 6) to access the array, or a processor in the array may access other elements in the array, e.g. to pass data, etc. (see other rejections above, which explain how data is routed among elements))
Claim 30 is rejected for similar reasons as claim 29.
Referring to claim 32, Musicus has taught the method of claim 1 wherein the control word further includes a bunch field, wherein the bunch field specifies an element type (see the F field on pp.250-251. An operation element type is provided).
Referring to claim 33, Musicus has taught the method of claim 32 wherein the element type includes a multiply element (see pp.251 and note options 7, 8, B, D, and E, all of which perform “AND”. An AND is a multiple element for any two given bits (i.e., 0x0 = 0 AND 0 = 0; 0x1 = 0 AND 1 = 0; 1x1 = 1 AND 1 = 1).
Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Musicus, Craft, Rus, Hanku (optionally), and Chakradhar et al., U.S. Patent Application Publication No. 2004/0111710 A1.
Referring to claim 3, Musicus, as modified, has taught the method of claim 2 but has not taught wherein the decompressing occurs cycle-by-cycle out of a second cache. However, Chakradhar has taught storing compressed code words in L2 cache and decompressed code words in L1 cache. Utilizing a second level of cache speeds up access to code that is not in the smaller first level of cache. As such, in order to speed up access of code, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Musicus to include another level of cache prior to the decompressor 12 such that the decompressing occurs cycle-by-cycle out of a second cache.
Referring to claim 4, Musicus, as modified, has taught the method of claim 3 wherein decompressing of a single control word occurs over multiple cycles (see Craft, column 5, lines 27-29. At least one control word takes multiple cycles to decompress, and, this, the processor must be delayed).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Musicus in view of Craft, Rus, Hanku (optionally), Chakradhar, and the examiner’s taking of Official Notice.
Referring to claim 5, Musicus, as modified, has taught the method of claim 4, wherein the control words that were decompressed are of variable length (again, from the rejection of claim 1, the control words are variable length (e.g. after decompression, some 88-bit commands will have different undefined fields than others, or no undefined fields at all) but has not taught wherein the multiple cycles accommodate control word straddle over a cache line fetch boundary. Instead, Craft aligns each microcode segment to a fresh word boundary and admits that this wastes space (see column 7, lines 24-29). However, Official Notice is taken that it was well known in the art before applicant’s invention to not align cache data so strictly so as to consume cache space more efficiently (less unused space), thereby allowing more data to be cached. As a result, to improve storage efficiency, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Musicus such that the multiple cycles accommodate control word straddle over a cache line fetch boundary.
Claims 6-7, 10-14, and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Musicus in view of Craft, Rus, and Matoba et al., U.S. Patent No. 5,594,884.
Referring to claim 6, Musicus, as modified, has taught the method of claim 2 but has not taught wherein the cache comprises a dual read, single write (2R1W) cache. However, Matoba has taught such a cache. See FIG.1, cache 300, with two read ports (“PORT 1” and “PORT 2”) (see the abstract) and a single write port connected to bus 31 (see column 10, lines 37-42). As stated in the abstract, two read ports would allow for simultaneous fetching and execution of the branch path and contiguous path in response to a conditional branch instruction (see column 11, lines 14-22). This enhances performance of branch-related processing in systems with conditional branches (Musicus includes conditional branching on p.1, 1st paragraph). As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Musicus such that the cache comprises a dual read, single write (2R1W) cache.
Referring to claim 7, Musicus, as modified, has taught the method of claim 6 wherein the 2R1W cache supports simultaneous fetch of potential branch paths for the compiled task (see the rejection of claim 6).
Claim 10 is rejected for similar reasoning set forth in the rejection of claim 6.
Referring to claim 11, Musicus, as modified, has taught the method of claim 10 wherein the two or more potential compiled task outcomes comprise a computation result (the two outcomes are the two branch paths, each with their own required processing that produces a respective computation result).
Referring to claim 12, Musicus, as modified, has taught the method of claim 10 wherein the two or more potential compiled outcomes are controlled by the same control word (see p.14, 1st paragraph. A conditional jump, which has two outcomes (taken or not taken path) is controlled based on the microcode given in the first line. That is, as B[9] is specified by the first control word (note, from p.12, 1st paragraph, that the semicolon separates control words), this control word controls the branch outcomes (as each depends on a different setting of B[9]). Other such control words are shown in section 10.6).
Referring to claim 13, Musicus, as modified, has taught the method of claim 12 wherein the same control word is executed in a given cycle across the array of compute elements (see p.1, which states “The array acts as a Single Instruction Multiple Data (SIMD) machine, with all 256 processors executing approximately the same instruction but on different data”. As such, the same control word is executed at the same time across the array).
Referring to claim 14, Musicus, as modified, has taught the method of claim 13 wherein the two or more potential compiled outcomes are executed on neighboring compute elements within the array of compute elements (see section 10.6, which explains that, if data being set in a particular manner dictates whether you execute the branch path or the contiguous path, then those processors that have the data set in said manner will execute the branch path, and those that do not have the data set in said manner will execute the contiguous path. The processors are all separate from one another. Hence, multiple outcomes are executed on neighboring processors (either nearest or remote, per applicant’s paragraph [0054])).
Referring to claim 31, Musicus, as modified, has taught the method of claim 10, wherein the two or more potential compiled task outcomes comprise a routing control (again, all tasks are compiled into control words as shown on pp.247-252. Each control word includes routing control in at least the IO/1/2 field, which tells each compute element how to route data), and wherein the routing control is based on choosing a shortest communication path between two compute elements within the array of compute elements (note that all possible directions for routing are based on nearest neighbor communication, i.e., shortest communication path. In other words, routing control controls routing from each PE to each neighboring PE in either the horizontal, vertical, or diagonal direction).
Claims 34-35 are rejected under 35 U.S.C. 103 as being unpatentable over Musicus in view of Craft, Rus, Hanku (optionally), and the examiner’s taking of Official Notice.
Referring to claim 34, Musicus, as modified, has taught the method of claim 1 but has not taught wherein the providing control comprises data retiring, and wherein the data retiring is based on latency. However, Official Notice is taken that write-back cache was well known in the art before applicant’s invention. A cache has known advantages such as storing data used for processing so that it is more quickly accessible when needed in the future. Compared to main memory access, cache access is faster. However, a cache has finite space and data in a cache may be replaced to accommodate new data that is more recently used. In a write-back cache, when data is replace, it is retired, i.e., written to main memory, so that main memory holds the copy of the data. This ensures that data is only written to main memory if it is replaced and not every single time the cache is written, as is the case with a write-through cache, which may needlessly write to main memory if data written is overwritten in the cache before being replaced by other new data from main memory. The writing back to main memory for a write-back cache is retiring of data, and the retiring is based on latency. That is, because it takes too long to access current data (because it is in main memory and not cache), the current data will replace older data in the cache. That older data is then written back to main memory. As a result, in order to realize benefits of write-back caching, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Musicus to include a write-back cache, wherein the providing control comprises data retiring, and wherein the data retiring is based on latency.
Referring to claim 35, Musicus has taught the method of claim 21 wherein scheduling of the computation is based on processing speed (again, see p.12, where the compiler schedules operations to run at the fastest possible speed), and wherein computation includes a computation wavefront (the computation is a wavefront as evidenced by the architecture. Basically, processing occurs in SIMD fashion in all elements in a row, and then results would be passed to the next row below it for instance, and so on, forming a wave of data that cascades through the elements in the array. For instance, see physical page 43 where each PE is shown to output data/results to a subsequent PE), and wherein the computation wavefront includes data that is temporarily stored within a ringbus register (various registers exist to register data as it flows through the array. From physical page 43, there are registers RS, DIO, C, and multiple register banks. There are also registers (e.g. MDR and MAR) outside of the array that are attached to a ring bus (see physical page 18))).
Musicus has not taught wherein scheduling of the computation is also based on power consumption and heat dissipation. However, Official Notice is taken that design and operation based on power consumption and heat dissipation was well known in the art before applicant’s invention. These two characteristics are known considerations in operation so as to attempt to limit power consumption (and the expense related therewith), and to prevent overheating, etc. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Musicus such that scheduling of the computation is also based on power consumption and heat dissipation.
Response to Arguments
On pages 9-10 of applicant’s response, applicant argues that Musicus and the other prior art do not show a row valid field.
The examiner respectfully disagrees for reasons set forth in the rejections. At least the LF field is a row valid field because it indicates which elements are valid/invalid in a given row.
On page 11 of applicant’s response, applicant argues that the prior art has not taught claim 5 as amended.
The examiner respectfully disagrees for reasons set forth in the rejection. In particular, the newly added language is not distinct for reasons already set forth in the rejection of claim 1, which asserts that Musicus in combination with Craft has taught compressing variable length control words. Thus, because variable-length control words are inputted into a compressor, decompressing then results in the variable-length control words.
On page 11 of applicant’s response, applicant argues that the Official Notice should not apply in light of the amendments to claim 5, because that which the examiner has asserted is well known does not infer or imply variable length control words.
The examiner asserts that the Official Notice does not apply to the variable length, which has already been taught by Musicus. The examiner is merely asserting that once the system has the compressed data, it would be well known to store it such that a control word crosses over a cache line boundary.
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Morton, EP 0223690, has taught an array of processing elements where horizontal and vertical masks are used to enable/disable individual PEs, including entire rows/columns (e.g. see FIG.8).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168. The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/David J. Huisman/Primary Examiner, Art Unit 2183