DETAILED ACTION
Claims 1-26 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 26 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The claims recite the following limitations for which there is a lack of antecedent basis:
In claim 26, “the repurposing”, which could refer to the repurposing at the beginning of line 11, or to the repurposing at the beginning of line 12. The examiner recommends inserting --the-- before “repurposing” in line 12 (as was done for the other independent claims).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-6, 8-17, 19-23, and 25-26 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hauser et al., “Garp: A MIPS Processor with a Reconfigurable Coprocessor”.
Referring to claim 1, Hauser has taught a processor-implemented method for task processing, the method comprising:
accessing a two-dimensional (2D) array of compute elements (FIG.2), wherein each compute element within the array of compute elements is known to a compiler (p.18, section 3 and 3.2, and p.21, section 5) and is coupled to its neighboring compute elements within the array of compute elements (from p.13, right column, “a more conventional wire network provides interconnection within the array. Wires of various lengths run orthogonally vertically and horizontally. Vertical wires can be used to communicate between blocks in the same column, while horizontal wires can connect blocks in the same or adjacent rows.”);
pausing operation of the array of compute elements (from p.13, right column, when a clock counter reaches 0, the array is stopped. Under a second interpretation, the array is stalled/paused when a cache miss occurs (section 2.7, 3rd paragraph)), wherein the pausing occurs while a memory system continues operation (from p.13, right column, the array may be paused but a copy of data to the array from a main processor (at least part of a memory system) may occur. Or, saved data state may be sent to the array via the garestore instruction of Figure 11. Also, see section 2.5. Also, the array is stalled while data is being retrieved from external memory in response to a cache miss);
repurposing a bus coupling the array of compute elements, wherein the repurposing is based on an occurrence of an exception, wherein the repurposing couples one or more compute elements in the array of compute elements to the memory system, and wherein a memory system operation is enabled during the pausing (from Figure 4, the bus between the clocked register and the mux above the register carries the output (Z) of the functional unit during array operation/execution. However, when operation is paused, the same bus is repurposed to transfer items arriving on “memory bus in”. The repurposing of the bus allows the register in Figure 4 to obtain data from “memory bus in” in response to a cache miss exception (from applicant’s paragraph [0046], a cache miss is an exception)); and
transferring data from the memory system to the array of compute elements, using the bus that was repurposed (the cache miss exception causes the data to be transferred from memory to the array using the bus that is otherwise used for propagation of results generated by the compute element itself).
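As a purely hypothetical sketch (the function name, signature, and values are invented for illustration and are not drawn from Hauser or the record), the selector behavior mapped above for claim 1 can be modeled as a two-input multiplexer: during normal array operation the clocked register captures the functional unit's result, while during the pause the same path is repurposed to carry data arriving on the memory bus.

```python
def register_input(paused, fu_output, memory_bus_in):
    """Toy model of the selector feeding the clocked register in Hauser's
    FIG.4: while running, the register captures the functional unit output;
    while the array is paused, the same bus is repurposed to carry data
    arriving on "memory bus in"."""
    return memory_bus_in if paused else fu_output

print(register_input(False, 42, 7))  # running: register gets the FU result (42)
print(register_input(True, 42, 7))   # paused:  register gets memory data (7)
```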
Referring to claim 2, Hauser has taught the method of claim 1 wherein the data from the memory system is transferred to scratchpad memory in the one or more compute elements within the 2D array (see Figure 4, where the data is transferred to clocked registers, i.e., scratchpad, which is high-speed, short-term memory that temporarily stores data).
Referring to claim 3, Hauser has taught the method of claim 2 wherein the scratchpad memory provides operand storage (from Figure 7, register data can be fed back into the functional unit, which operates on operands. Alternatively, the register data can work its way to Hout, which is connected to the functional unit in the compute element below it. Thus, depending on configuration, the register/scratchpad stores operands for different units).
Referring to claim 4, Hauser has taught the method of claim 2 further comprising tagging the data before it is transferred (from section 2.7, the address of the data is stored to a Z register to control the transfer. Thus, the data can be said to be tagged with this address. Different data that is transferred will have different addresses/tags).
Referring to claim 5, Hauser has taught the method of claim 4 wherein the tagging guides transferring of the data to a particular compute element within the array of compute elements (using the tag, the appropriate data will be transferred to a control block at the edge of the array. The control blocks control memory access (section 2.7)).
Referring to claim 6, Hauser has taught the method of claim 4 wherein the tagging comprises a target row location within the array of compute elements (from section 2.7, “[t]he data is transferred over a memory bus to/from another selected row”. Thus, the selected row must be indicated, and this may be part of the tagging).
Referring to claim 8, Hauser has taught the method of claim 1 wherein the bus continues operation during the pausing (as discussed, when array execution is paused, the bus still operates so as to receive the data on “memory bus in” and store it in a register).
Referring to claim 9, Hauser has taught the method of claim 1 further comprising resuming operation of the array of compute elements after the transferring data is complete (when data is stored to the array, the array would then be configured to process the data; otherwise, there would be no point in sending it data. The main processor sets the clock counter to the number of cycles it should operate (p.13, right column, 3rd to last paragraph)).
Referring to claim 10, Hauser has taught the method of claim 9 wherein a compiled task determines the resuming operation (the compiled program will have the various instructions in it to set the clock counter and resume array operations).
Referring to claim 11, Hauser has taught the method of claim 1 further comprising coupling load queues between the memory system and the bus (see p.17, right column, which discloses three memory queues for reading ahead. Alternatively, a clocked register in Figure 4 is basically a 1-element queue leading to the bus at its output).
Referring to claim 12, Hauser has taught the method of claim 11 wherein the data transferred from the memory system is buffered by the load queues (again, there are queues for reading/writing, which covers transferring of data in either direction. Alternatively, when the register of Figure 4 is the queue, the data is transferred to the queue from “memory bus in”, where it waits until outputted by the queue for further propagation).
Referring to claim 13, Hauser has taught the method of claim 12 wherein the load queues are emptied of the data that was buffered before a resume occurs (when a data cache miss occurs, the array is stalled. This is because the array does not have access to the needed data. The array will resume when the data is available for processing, meaning it must be out of any queue. Alternatively, the claim merely sets forth “a resume”. This resume may occur when all of the data is emptied from the buffers, whenever that may be).
Referring to claim 14, Hauser has taught the method of claim 11 wherein the load queues participate in the repurposing (when the queues/registers store data from “memory bus in” (or from anywhere while the state of the array is being updated when the clock counter is zero), they are participating in the repurposing).
Referring to claim 15, Hauser has taught the method of claim 11 wherein the load queues are notified of the pausing (from Figure 4, the selector(s) above the clocked registers may be deemed part of the load queues. When loading state from memory, they are notified, via selection signal, that the pause is in place and that the “memory bus in” input should be selected).
Referring to claim 16, Hauser has taught the method of claim 1 wherein the pausing operation is necessitated by an exception (again, see section 2.7. A cache miss causes a stall/pause. And, a cache miss is an exception requiring special handling).
Referring to claim 17, Hauser has taught the method of claim 1 wherein the pausing operation is necessitated by data congestion (when the cache (or a portion of the cache where needed data would be found) is full (congested) and doesn’t contain the needed data, a cache miss would occur, which causes the pausing).
Referring to claim 19, Hauser has taught the method of claim 1 wherein the pausing, the repurposing, and the transferring comprise a background data load (applicant is merely naming these three actions as a background load. The same three actions that occur in Hauser may be named the same).
Referring to claim 20, Hauser has taught the method of claim 1 wherein the compiler schedules computation in the array of compute elements (see section 3, 3.1, 3.5, and 5, along with Figure 12. Compilation is used to generate programs that schedule configurations and computations in the array).
Referring to claim 21, Hauser has taught the method of claim 20 wherein the computation includes compute element placement (the compiled instructions of Figure 11 perform placement of data/configurations in the array), results routing (the compiled instructions of Figure 11 such as mfga and gasave will cause results to be read from the array and routed to memory or main processor registers), and computation wave front propagation within the array of compute elements (the compiled program will dictate how waves of results in a given row/column flow/propagate through the array as a result of a particular configuration).
Referring to claim 22, Hauser has taught the method of claim 1 wherein a compiled task includes multiple programming loop instances circulating within the array of compute elements (see section 5, which sets forth mapping loop instances to the array).
Referring to claim 23, Hauser has taught the method of claim 1 wherein the array of compute elements comprises a superstatic processor architecture (from paragraph [0025] of applicant’s specification, “In a superstatic processor architecture, the placement in the array, the routing of results and routing of execution wave front propagation, and the scheduling of computation are all performed by the compiler, not by an underlying instruction-driven hardware microarchitecture at runtime.” Sections 3, 3.1, 3.5, and 5, along with Figure 12, describe that compilation is used to generate programs that schedule configurations and computations in the array, which also result in compute elements propagating values to other compute elements. Thus, Hauser’s array may be deemed superstatic).
Claim 25 is mostly rejected for similar reasoning as claim 1. Further, Hauser has taught a computer program product embodied in a non-transitory computer readable medium for task processing, the computer program product comprising code which causes one or more processors to perform the claimed operations (see Figure 1, memory and/or instruction cache, which are non-transitory readable media that store a compiled program that when executed by a processor/array causes the operation of Hauser to be realized).
Claim 26 is mostly rejected for similar reasoning as claim 1. Further, Hauser has taught a computer system (Figure 1) for task processing comprising: a memory which stores instructions (Figure 1, memory and/or instruction cache); one or more processors coupled to the memory (Figure 1, standard processor and/or reconfigurable array), wherein the one or more processors, when executing the instructions which are stored, are configured to perform the claimed operations (memory/instruction cache store instructions of a compiled program that when executed by a processor/array causes the operation of Hauser to be realized).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hauser in view of Wikipedia, “Torus interconnect”.
Referring to claim 7, Hauser has taught the method of claim 1 but has not taught wherein the bus comprises a ring bus along a row or column of the array of compute elements. While vertical/horizontal buses are taught that allow compute elements to pass data to their neighbors, Hauser does not discuss connecting the nodes at the opposite edges to create a ring. However, Wikipedia has taught the known torus interconnect for a 2D array, which has a ring bus in each row and column. This results in higher speeds, lower latency, and lower energy consumption because data has more options to travel from one node to another. See the 2D torus in the “Visualization” section along with the listed advantages on pp.3-4. As a result, in order to increase speeds and reduce energy consumption, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hauser such that the bus comprises a ring bus along a row or column of the array of compute elements.
Claims 18 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Hauser in view of the examiner’s taking of Official Notice.
Referring to claim 18, Hauser has taught the method of claim 17 but has not taught wherein the data congestion is due to access jitter or a data cache miss. However, Official Notice is taken that a cache (or a portion thereof) being full as a result of previous cache misses was well known in the art before applicant’s invention. Replacement of data already in the cache was also well known before applicant’s invention. A cache miss causes the data retrieved from external memory to be stored in the cache for quicker future access. Such storage in the cache may replace an already-existing entry (based on cache design or a replacement policy such as “least recently used” (LRU) replacement). When this happens, the cache (or a portion thereof) becomes congested with data as a result of at least one previous cache miss, and if that at least one previous cache miss replaced data that is needed again in the future, a cache miss will occur as a result of this particular congestion (other data occupying the location where the needed data should be). Such cache operation is known to improve efficiency and reduce latency. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hauser such that the data congestion occurs due to a previous cache miss.
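As a purely hypothetical sketch (the class, capacity, and addresses are invented for illustration and appear in no cited reference), the congestion-then-miss sequence described above can be modeled with a minimal LRU cache: earlier misses fill the cache, a later miss evicts a still-needed entry, and re-accessing that entry misses again because of the congestion.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: once full, a fill evicts the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> cached data
        self.misses = 0

    def access(self, address):
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            return True  # hit
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # congested: evict the LRU entry
        self.entries[address] = "data"
        return False  # miss

cache = LRUCache(capacity=2)
cache.access(0xA)        # miss: cold cache
cache.access(0xB)        # miss: cold cache
cache.access(0xC)        # miss: cache is full (congested); evicts 0xA
hit = cache.access(0xA)  # miss again: 0xA was displaced by a prior miss
print(hit, cache.misses)
```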
Referring to claim 24, Hauser has taught the method of claim 1 but has not taught wherein a compiled task comprises machine learning functionality. However, machine learning convolution using an array (e.g. a systolic array) was well known in the art before applicant’s invention. Convolution is used in convolutional neural networks (CNNs) to, for instance, analyze images for particular features for machine learning. A systolic array can efficiently perform convolution, which involves dot product calculations. One of ordinary skill in the art would have recognized that Hauser’s array of vertical and horizontal connections can be used to implement a systolic array for image analysis, thereby realizing machine learning. As a result, it would have been obvious to one of ordinary skill in the art before applicant’s invention to modify Hauser such that a compiled task comprises machine learning functionality.
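As a purely hypothetical sketch (the function name and values are invented for illustration and are not drawn from any cited reference), the statement above that convolution reduces to dot product calculations can be shown with a 1-D “valid” convolution, where each output element is a dot product of the reversed kernel against a sliding window of the signal; a systolic array pipelines exactly these multiply-accumulate operations.

```python
def conv1d_valid(signal, kernel):
    """'Valid' 1-D convolution: each output is the dot product of the
    (reversed) kernel with a sliding window of the signal."""
    k = kernel[::-1]  # convolution flips the kernel
    n = len(signal) - len(kernel) + 1
    return [sum(s * w for s, w in zip(signal[i:i + len(k)], k))
            for i in range(n)]

print(conv1d_valid([1, 2, 3, 4], [1, 0, -1]))  # [2, 2]
```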
Response to Arguments
On page 11 of applicant’s response, applicant argues that Hauser has not taught that the repurposing is based on an occurrence of an exception.
The examiner respectfully disagrees. One way that the bus is repurposed in Hauser is as a result of a cache miss. This is an exception that requires special handling. Applicant’s own paragraph [0046] states “where the exception can include a data cache miss.”
On page 12 of applicant’s response, applicant argues that Wikipedia does not suggest the application of the ring bus to the repurposing.
Applicant is arguing the references separately. The examiner has given a reason why a ring bus would be useful in Hauser. For example, memory accesses are performed by control blocks at the edge of the array (left edge in FIG.2). When the data is returned from memory, it may be routed to various columns/rows to load different elements. A ring bus allows data to flow in either direction, which can speed up transfer. For instance, in FIG.2, if the top left control element has to pass data to any of the right half of elements in the top row, then passing that data via ring bus to the top right elements (and then further as necessary), without going through the left half of elements in the top row, would be faster than passing that data via unidirectional bus through the left half of elements in the top row (and then further as necessary) until the data reaches its destination. Similarly, passing data from the control element via the ring to any of the left half of elements in the top row would more efficiently flow in the opposite direction on the ring bus. Thus, the examiner asserts the applicability is apparent to one of ordinary skill in the art.
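As a purely hypothetical sketch (the row width and indices are invented for illustration, not taken from Hauser's FIG.2), the hop-count advantage described above can be quantified: on a unidirectional row bus the distance from source to destination is fixed by one direction of travel, while a bidirectional ring takes the shorter of the two directions around.

```python
def unidirectional_hops(src, dst, n):
    """Hops when data can travel only one way around a row of n elements."""
    return (dst - src) % n

def ring_hops(src, dst, n):
    """Hops on a bidirectional ring bus: take the shorter of the two directions."""
    d = (dst - src) % n
    return min(d, n - d)

# Hypothetical 24-element row with a control element at index 0:
n = 24
print(unidirectional_hops(0, 23, n))  # 23 hops the long way around
print(ring_hops(0, 23, n))            # 1 hop backward on the ring
```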
On page 12 of applicant’s response, applicant argues that the prior art does not teach that the data congestion is caused by a cache miss. Applicant further argues that Official Notice is improper since the claimed features are neither well-known nor universally conventional without undue experimentation.
The examiner has merely taken Official Notice that filling a cache in response to previous cache misses was well known in the art, as is an area of a cache becoming congested as it fills. Thus, the examiner asserts that the Official Notice is proper and that filling a cache such that it becomes congested is well known in the art.
On pages 12-13 of applicant’s response, applicant argues that the Official Notice related to machine learning is not appropriate because executing compiled machine learning tasks on reconfigurable FPGA coprocessors was not conventional at the time of Hauser’s publication, nor is it universally conventional today without specific implementation details.
The examiner respectfully disagrees. The examiner notes that Hauser’s publication date is not the important date. Per 35 U.S.C. 103, the question is whether “the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention”. The examiner maintains that the noticed feature was well known before applicant’s effective filing date.
Conclusion
The following prior art previously made of record and not relied upon is considered pertinent to applicant's disclosure:
Shao has taught a systolic array which uses the same bus to transmit results (partial sums) or preloaded weights to a next element in a systolic array (e.g. see Figures 3 and 5). When preloading, the array is not operating. This reference is deemed particularly relevant to applicant’s claims, and, thus, should be reviewed before formulating a response.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168. The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/David J. Huisman/Primary Examiner, Art Unit 2183