DETAILED ACTION
Claims 1-24 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The amended title of the invention is not sufficiently descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. At this time, the examiner recommends that the title convey that a control word bit set by a compiler indicates a multicycle operation.
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
As a reminder, when the related applications identified in paragraphs 2 and 3 issue as patents, their patent numbers must be provided in those paragraphs.
Claim Objections
Claim 14 is objected to because of the following informalities:
Replace “without” with --not--.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The claims recite the following limitations for which there is a lack of antecedent basis:
In claim 4, “the operation” lacks clear antecedent basis because claim 3 recites an operation and claim 1 recites multiple operations, so it is unclear which operation is being referenced.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5-14, 19, and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Gifford, U.S. Patent No. 4,891,787, in view of Toyama, JP 2011159226 (a translation of which is provided herewith).
Referring to claim 1, Gifford has taught a processor-implemented method for task processing comprising:
accessing a two-dimensional (2D) array of compute elements (FIGs. 3A and 8, array of processing elements (PEs)), wherein each compute element within the array of compute elements is known to a compiler (FIG. 12 and the description thereof. Code is compiled for the array) and is coupled to its neighboring compute elements within the array of compute elements (FIGs. 3A and 8);
providing control for the array of compute elements on a cycle-by-cycle basis (either SIMD or MIMD code is provided to the array to control the PEs each cycle (e.g. see abstract)), wherein the control is enabled by a stream of control words generated by the compiler (the SIMD and MIMD code comprises control words).
Gifford has not taught wherein at least one of the control words involves an operation requiring at least one additional operation; setting a bit of the at least one control word, wherein the bit indicates a multicycle operation; and executing the at least one control word that was generated by the compiler, on at least one compute element within the array of compute elements, based on the bit. However, Toyama has taught a read-modify-write (RMW) instruction (at least part of a control word) that includes an ATMC flag/bit indicating whether the RMW should be executed atomically as a series of inseparable processes. See FIGs. 5(a)-(d) and section 2.2.1 of the translation. An atomic RMW is a known operation that requires additional operations, including reading a memory value (read cycle), changing the read value (modify cycle), and storing the result back to the memory address from which the value was read (write cycle). This is useful for changing a memory item using a single instruction. Executing the RMW atomically is also beneficial because it reduces race conditions in shared memory (for instance, another processor accessing a memory item before that item's update is complete), thereby ensuring that data remains consistent and accurate. This is useful in a system where the memory is shared (in Gifford, column 5, lines 62-67, the individual memories of the PEs are also used by the CPU 10).
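For illustration only, the following Python sketch models the atomic RMW reasoning above. It is not Toyama's or Gifford's implementation: the names ATMC_BIT, ControlWord, SharedMemory, and concurrent_atomic_increments are hypothetical, processors are modeled as Python threads, and a threading.Lock stands in for whatever hardware mechanism provides atomicity.

```python
import threading
from dataclasses import dataclass

ATMC_BIT = 0x1  # hypothetical bit position of the atomic flag in a control word


@dataclass(frozen=True)
class ControlWord:
    flags: int    # flag bits; ATMC_BIT set means "execute the RMW atomically"
    address: int  # memory cell to read, modify, and write
    addend: int   # modification applied between the read and the write


class SharedMemory:
    """Memory shared by several processors (modeled here as Python threads)."""

    def __init__(self, size: int) -> None:
        self.cells = [0] * size
        self._lock = threading.Lock()  # models exclusive access during an atomic RMW

    def execute_rmw(self, word: ControlWord) -> None:
        if word.flags & ATMC_BIT:
            # Flag set: read, modify, and write run as one inseparable unit.
            with self._lock:
                self._read_modify_write(word)
        else:
            # Flag clear: another processor may access the cell between read and write.
            self._read_modify_write(word)

    def _read_modify_write(self, word: ControlWord) -> None:
        value = self.cells[word.address]  # read cycle
        value += word.addend              # modify cycle
        self.cells[word.address] = value  # write cycle


def concurrent_atomic_increments(n_threads: int, per_thread: int) -> int:
    """Run atomic RMW increments from several threads; return the final cell value."""
    memory = SharedMemory(1)
    word = ControlWord(flags=ATMC_BIT, address=0, addend=1)

    def worker() -> None:
        for _ in range(per_thread):
            memory.execute_rmw(word)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory.cells[0]


print(concurrent_atomic_increments(4, 10_000))  # 40000: no update is lost
```

Because every increment above carries the flag, each read-modify-write sequence holds the lock for its full duration, so the final count equals the total number of RMWs issued.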
As a result, in order to realize the ability to execute an RMW operation accurately on shared memory, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gifford such that at least one of the control words involves an operation requiring at least one additional operation; setting a bit of the at least one control word, wherein the bit indicates a multicycle operation; and executing the at least one control word that was generated by the compiler, on at least one compute element within the array of compute elements, based on the bit.
Referring to claim 2, Gifford, as modified, has taught the method of claim 1 wherein the multicycle operation comprises a read-modify-write (RMW) operation (see FIGs.5(a)-(d) of Toyama).
Referring to claim 3, Gifford, as modified, has taught the method of claim 1 wherein the bit inhibits the at least one compute element from having its operation interrupted (see section 2.2.1 of the Toyama translation).
Referring to claim 5, Gifford, as modified, has taught the method of claim 1.
Under a first interpretation, Gifford has further taught setting one or more additional bits on one or more control words immediately following the at least one control word (instructions (control words) comprise various combinations of 0 and 1 bits so as to realize different instructions. From one instruction to the next, various bits will therefore be set).
Under a second interpretation, in which the one or more additional bits are the same type of bit as the bit set in the control word of claim 1, this is not explicitly taught by Gifford, as modified. However, this is a matter of duplicating the RMW instruction, i.e., performing multiple atomic RMW instructions in a row so as to change multiple memory items. In such a scenario, each subsequent control word would have its ATMC flag set. Mere duplication of parts is not a patentable distinction (MPEP 2144.04(VI)(B)). As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gifford for setting one or more additional bits on one or more control words immediately following the at least one control word so as to perform multiple atomic RMWs repeatedly to change multiple data items.
Referring to claim 6, Gifford, as modified, has taught the method of claim 5 wherein the one or more additional bits continue to inhibit the at least one compute element from having its operation interrupted (again, as long as the ATMC bit is set, no interruption can occur).
Referring to claim 7, Gifford, as modified, has taught the method of claim 5 wherein the bit and the one or more additional bits enable an atomic, multi-control word operation (the consecutive atomic RMWs, each having its ATMC bit set).
Referring to claim 8, Gifford, as modified, has taught the method of claim 7 wherein the atomic multi-control word operation comprises a read-modify-write (RMW) operation (again, this combination of art is an array for performing RMW operations).
Referring to claim 9, Gifford, as modified, has taught the method of claim 5 wherein an atomic duration is controlled by a number of consecutive control words having their multicycle operation bits set (if each RMW with ATMC bit set takes X time, then N consecutive RMWs will control an atomic duration of NX time).
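The atomic-duration arithmetic above can be sketched as follows, assuming, as the reasoning for claim 9 does, that every control word takes the same time (here a fixed number of cycles). The function name atomic_windows and the cycles_per_word parameter are hypothetical and used for illustration only.

```python
from typing import List, Sequence, Tuple


def atomic_windows(
    multicycle_bits: Sequence[bool], cycles_per_word: int
) -> List[Tuple[int, int, int]]:
    """Return (start_index, word_count, duration_in_cycles) for each run of
    consecutive control words whose multicycle bit is set."""
    windows: List[Tuple[int, int, int]] = []
    start = None
    for i, bit in enumerate(multicycle_bits):
        if bit and start is None:
            start = i  # first control word of an atomic run
        elif not bit and start is not None:
            count = i - start  # run ends at the first word with the bit clear
            windows.append((start, count, count * cycles_per_word))
            start = None
    if start is not None:  # run extends to the end of the stream
        count = len(multicycle_bits) - start
        windows.append((start, count, count * cycles_per_word))
    return windows


# N = 3 consecutive set bits at X = 3 cycles each give an atomic duration of 9 cycles.
print(atomic_windows([False, True, True, True, False, True, False], 3))
# [(1, 3, 9), (5, 1, 3)]
```

In this model a window closes at the first control word whose multicycle bit is clear, which corresponds to the point at which the barrier discussed for claims 10 and 14 would be lifted.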
Referring to claim 10, Gifford, as modified, has taught the method of claim 9 wherein the atomic duration enables a memory access barrier (again, the RMW is atomic, meaning the memory location being accessed by the RMW is not available to be accessed by another processor. Thus, the RMW creates a barrier that is only lifted when the RMW completes. Alternatively, a memory access barrier may be an interruption barrier that exists during memory access by RMW (i.e., the RMW cannot be interrupted)).
Referring to claim 11, Gifford, as modified, has taught the method of claim 1 wherein the operation requiring at least one additional operation is indicated in the at least one of the control words and a subsequent control word (while RMW is a single instruction, multiple control words are required, one for each of the read, modify, and write. Thus, either of the latter two may be deemed a subsequent control word).
Referring to claim 12, Gifford, as modified, has taught the method of claim 1 wherein the operation requiring at least one additional operation is indicated in the at least one of the control words (the RMW control word indicates additional operations such as modify and write).
Referring to claim 13, Gifford, as modified, has taught the method of claim 1 wherein successful completion of the at least one additional operation comprises an atomic operation (again, the RMW successfully completes atomically (without interruption)).
Referring to claim 14, Gifford, as modified, has taught the method of claim 10 but has not taught further disabling the access barrier upon receipt of a control word without having its multi-cycle operation bit set. However, recall from section 2.2.1 of Toyama, that atomic execution is optional, meaning any given RMW may have its ATMC bit set or clear. In addition, one of ordinary skill in the art would have recognized that any number of RMWs may be performed in a row so as to duplicate the operation where needed. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gifford to execute multiple RMWs consecutively where at least one atomic RMW is followed by one non-atomic RMW. In such a case, after the atomic RMW, a non-atomic RMW without its ATMC bit set would be received and this disables the access barrier, i.e., the inability to interrupt.
Referring to claim 19, Gifford, as modified, has taught the method of claim 1 wherein the bit comprises a control word atomic lock bit (the ATMC flag effectively locks the memory location to be read, modified, and written so that no other processor can access it until RMW completes).
Referring to claim 22, Gifford, as modified, has taught the method of claim 1 wherein the stream of control words generated by the compiler provides direct, fine-grained control of the 2D array of compute elements (the control words streamed to the PEs control the PEs. Thus, they provide for fine-grained control, e.g. each can be separately controlled in MIMD mode).
Claim 23 is mostly rejected for similar reasoning as claim 1. Gifford has further taught a computer program product embodied in a non-transitory computer readable medium for program execution, the computer program product comprising code which causes one or more processors to perform the claimed operations (Gifford has taught both the compiler, which is a software program that must be stored in non-transitory memory to be obtained and executed, and the actual code to be executed by the PEs, where this code is stored in memories 43 in FIG.3A (column 6, lines 9-11)).
Claim 24 is rejected for similar reasoning as claim 23. That is, a compiler is stored in memory, which is accessed by a processor to execute the compiler to perform at least some of the claimed steps and generate code to perform at least some of the claimed steps.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Gifford in view of Toyama, Arm Limited, “Arm Compiler Version 6.11”, Pahlawan et al. (DE 102014111305, a translation of which is provided herewith), and Yildirim et al. (2020/005058).
Referring to claim 4, Gifford, as modified, has taught the method of claim 3 but has not taught wherein the operation comprises an attempted thread swap out, wherein the attempted thread swap out is delayed until completion of one or more control words immediately following the at least one control word.
However, Arm has taught that instead of a single control word to carry out RMW, separate instructions may carry out RMW (see p.3-61). Using existing instructions that could be used in a non-RMW context means that a specific RMW instruction could be eliminated, thereby reducing the size of the instruction set. As a result, it would have first been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Gifford such that the RMW of Toyama is replaced with a read instruction, at least one modify instruction, and a write instruction.
Given this modification, there is still an issue of implementing atomicity for the RMW. Pahlawan has taught that a block of instructions, each having a bit therein set to 1, is executed atomically (p.10, lines 11-15). This supports the proposed removal of explicit atomic instructions. As a result, in order to remove an extra RMW instruction while still implementing the atomicity taught by Toyama, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Gifford such that the individual read, modify, and write instructions that replace an RMW instruction each have a bit therein that, when set, causes the read, modify, and write instructions to be performed as an atomic read-modify-write.
With this modification in mind, Yildirim has taught an attempted task thread swap to a higher priority thread, where the swap is delayed until the current atomic operation is completed (see p.6, line 35, to p.7, line 3). As is known, multi-threading allows for threads to be swapped in and out depending on various conditions, e.g. each thread is given an amount of time to operate before the next thread takes its turn. This allows a system to make progress on multiple tasks instead of only allowing one task to finish to completion before another task starts. It can also reduce stalling because if one task stalls another can be swapped in as opposed to the processor remaining stalled. Allowing a higher priority thread to preempt a current thread is beneficial to provide processing resources to the most important tasks. However, as explained, an atomic operation must be allowed to complete to ensure correct results; hence the reason it is executed atomically without interruption. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Gifford such that a thread that is executing RMW with separate instructions each marked as atomic will not be allowed to swap out until the separate instructions complete, thereby completing the atomic operation before swap. That is, when a thread swap out is attempted, the attempted thread swap out is delayed until completion of one or more control words (the modify and write instructions (control words)) immediately following the at least one control word (the read instruction (control word), which would contain a bit that indicates the start of a multicycle operation).
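A minimal sketch of the deferred swap-out in the combination above, for illustration only. It assumes that the read, modify, and write control words each carry the multicycle bit (per the Pahlawan modification) and that a swap may take effect only at a boundary that is not inside a run of set bits. The function deferred_swap_boundary is hypothetical and does not represent Yildirim's scheduler.

```python
from typing import Sequence


def deferred_swap_boundary(multicycle_bits: Sequence[bool], request_index: int) -> int:
    """Return how many control words complete before a swap-out requested
    while word `request_index` is executing actually takes effect."""
    if not 0 <= request_index < len(multicycle_bits):
        raise IndexError("request_index out of range")
    j = request_index
    # While the current word and the next word both have the bit set, the
    # atomic sequence is still in progress, so the swap-out is delayed.
    while (
        multicycle_bits[j]
        and j + 1 < len(multicycle_bits)
        and multicycle_bits[j + 1]
    ):
        j += 1
    return j + 1  # swap occurs after word j completes


# Word 0: ordinary; words 1-3: read, modify, write (bits set); word 4: ordinary.
bits = [False, True, True, True, False]
print(deferred_swap_boundary(bits, 1))  # 4: swap waits until the write completes
print(deferred_swap_boundary(bits, 0))  # 1: no atomic run, swap after word 0
```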
Claims 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Gifford in view of Toyama and Van Dyke et al., U.S. Patent No. 7,680,992.
Referring to claim 15, Gifford, as modified, has taught the method of claim 1 but has not taught enabling selective interrupt enablement based on the setting a bit. However, Van Dyke makes an exception for RMW by allowing it to be interrupted by a higher priority memory request (abstract). This allows for higher priority tasks to receive preference over lower priority tasks, thereby ensuring they get the attention needed. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gifford for enabling selective interrupt enablement based on the setting a bit.
Referring to claim 16, Gifford, as modified, has taught the method of claim 15 wherein the selective interrupt enablement is further based on setting an additional bit (from column 2, lines 63-65 of Van Dyke, bits are used to indicate a priority level of memory requests. Thus, priority bits must be set in such a manner so as to allow a higher priority request (e.g. a request having high priority of 1) to interrupt a lower priority request (e.g. RMW having priority of 0)).
Referring to claim 17, Gifford, as modified, has taught the method of claim 1 but has not taught wherein a non-maskable interrupt (NMI) overrides the setting a bit. However, for similar reasoning given above for claim 15, this is an obvious modification to Gifford, where a higher priority memory request is an NMI that is allowed to interrupt RMW, i.e., the execution of RMW cannot mask the higher-priority request.
Referring to claim 18, Gifford, as modified, has taught the method of claim 1 but has not taught wherein an arithmetic exception or a memory exception overrides the setting a bit. However, for similar reasoning given above for claim 15, this is an obvious modification to Gifford, where a higher priority memory request is a memory exception that needs to be handled now and, as such, preempts/interrupts the RMW, thereby overriding the set ATMC flag.
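The override reasoning for claims 15-18 can be sketched as a single decision function. This is an illustrative model under stated assumptions, not Van Dyke's arbitration logic: the Request fields, the integer priority encoding (higher number means higher priority), and the rule that any request may interrupt when the bit is clear are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    priority: int               # hypothetical priority bits (e.g. 0 = low, 1 = high)
    non_maskable: bool = False  # models a non-maskable interrupt (claim 17)
    is_exception: bool = False  # models an arithmetic or memory exception (claim 18)


def may_interrupt(atomic_bit_set: bool, current_priority: int, request: Request) -> bool:
    """Decide whether `request` may interrupt the operation currently executing."""
    if not atomic_bit_set:
        return True  # assumption: no atomic protection, so interruption is allowed
    if request.non_maskable or request.is_exception:
        return True  # NMI or exception overrides the set bit
    # Selective interrupt enablement based on priority bits (claims 15-16).
    return request.priority > current_priority


print(may_interrupt(True, 0, Request(priority=1)))                     # True
print(may_interrupt(True, 0, Request(priority=0)))                     # False
print(may_interrupt(True, 1, Request(priority=0, non_maskable=True)))  # True
```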
Claims 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Gifford in view of Toyama and the examiner’s taking of Official Notice.
Referring to claim 20, Gifford, as modified, has taught the method of claim 1 but has not taught wherein the compiler maps machine learning functionality to the array of compute elements. However, Official Notice is taken that compiling code for machine learning in an array processor environment was well known in the art before applicant’s invention. Machine learning allows a system to implement a neural network that is trained to automate tasks such as image/pattern recognition, speech/language recognition/processing, product recommendation, etc. (i.e., other known AI tasks). An array processor such as Gifford's has many parallel processors, which naturally accommodate the parallel workload associated with the large datasets of machine learning applications. As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gifford such that the compiler maps machine learning functionality to the array of compute elements.
Referring to claim 21, Gifford, as modified, has taught the method of claim 20 wherein the machine learning functionality includes a neural network implementation (see the rejection of claim 20).
Response to Arguments
Based on applicant’s response to the previous 112 rejections, the examiner has withdrawn those rejections.
On page 11 of applicant’s response, applicant argues that Gifford “does not show or suggest ‘executing the at least one control word that was generated by the compiler, on at least one compute element within the array of compute elements, based on the bit’ as recited in the currently amended claims.”
The examiner respectfully disagrees. Gifford is directed to a special compiling technique (see column 18, line 35, to column 19, line 18, FIG.12, and claim 1). A compiler compiles programs to generate instructions for execution. Thus, everything to be executed in Gifford would be generated by a compiler. When the atomic RMW operation is brought in from Toyama, it, too, would be generated by a compiler like the rest of the program code.
On page 11 of applicant’s response, applicant argues that Toyama does not teach the at least one control word generated by a compiler.
The examiner asserts that applicant is arguing against the references individually rather than against the combination. Gifford teaches the compiling, and any control word executed by Gifford, including one imported from Toyama, would be generated by the compiler.
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Walsh, US 2016/0048354, has taught a field in each command indicating whether the command is part of an atomic group.
Hillier, US 2006/0090044, has taught breaking an RMW command into separate read and write commands that do not need to be executed together.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168. The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/David J. Huisman/Primary Examiner, Art Unit 2183