DETAILED ACTION
Claims 1-23 have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of prior-filed applications under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Information Disclosure Statement
Per MPEP 609.02(I) and (II)(A)(2), the examiner of a continuing application will consider information which has been considered by the Office in the parent application. Therefore, information considered in parent application 17/526,003 and grandparent application 17/465,949 has been considered during examination of the instant application. However, if applicant wants said considered information to be printed on any patent resulting from the instant application, applicant must ensure that said information appears on either an IDS or an 892 in the instant application.
Specification
The title of the invention is not sufficiently descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The title refers to load buffers but the claims include cache language instead of load buffer language.
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
In paragraphs 1-3, applicant lists related applications. Patent numbers must be inserted for any application that has resulted in a patent. While none of the applications appear to have been patented at the time of drafting this Office Action, this note will serve as a reminder, until allowance of this application, to insert patent numbers as related applications issue.
Drawings
The drawings have not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the drawings.
Claim Objections
Claim 1 is objected to because of the following informalities:
In line 1, replace “processing comprising:” with --processing, the method comprising:-- so that the steps are explicitly tied to the method and not potentially to the parallel processing.
Claim 17 is objected to because of the following informalities:
Insert --cache bank-- after “L2”.
Replace “includes” with --include--.
Claim 23 is objected to because of the following informalities:
Insert --and-- at the end of line 2.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 9-10 and 18-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Referring to claim 9, the term “wide” is a relative term which renders the claims indefinite. The term is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Specifically, the examiner is not clear on where the line is drawn between wide and not wide. For purposes of prior art examination, anything larger than 1 bit will be deemed wide.
The claims recite the following limitations for which there is a lack of antecedent basis:
In claim 18, “the age counter”. From claim 17, there may be multiple age counters. So, which of potentially multiple is applicant referring to?
In claim 19, “the L1/L2 cache bank”. From claim 16, there may be multiple L1/L2 cache banks (e.g. one in the first cache and one in the second cache). Which is being referred to?
Claims 10 and 20 are rejected due to their dependence on an indefinite claim.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 16, and 21-23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang et al. (US 2015/0120998).
Referring to claim 1, Wang has taught a processor-implemented method for parallel processing comprising:
accessing a two-dimensional array of compute elements (FIG.2, 4x4 array of nodes having cores), wherein each compute element within the array of compute elements is known to a compiler (e.g. see paragraphs 24-26, where Wang executes compiled code and, thus, a compiler knows the elements to compile for) and is coupled to its neighboring compute elements within the array of compute elements (FIG.2 and note that each node is coupled to its neighbors);
coupling a first data cache to the array of compute elements, wherein the first data cache enables loading data to a first portion of the array of compute elements (see paragraphs 27-28. A number of nodes can be grouped, and a given group is coupled to last level cache slices, which together form a first data cache that is shared amongst the given group), and wherein the first data cache supports an address space (one or more addresses of the data stored by this cache form the address space);
coupling a second data cache to the array of compute elements, wherein the second data cache enables loading data to a second portion of the array of compute elements (again, see paragraphs 27-28. A second group of nodes is coupled to last level cache slices, which together form a second data cache that is shared amongst the second group), and wherein the second data cache supports the address space (see paragraphs 50-51 and note that coherency messages are sent between different caches of different groups, e.g. if one group writes to an address in its cache, the same address in another group’s cache would be invalidated. Also, from paragraph 58, if one cache doesn’t include the data, another group’s cache can be checked for that data. This means that the caches are tracking the same address space); and
executing instructions within the array of compute elements, wherein instructions executed within the first portion of the array of compute elements use data loaded from the first data cache, and wherein instructions executed within the second portion of the array of compute elements use data loaded from the second data cache (again the caches provide data to their respective groups when needed for instruction execution).
Referring to claim 2, Wang has taught the method of claim 1 wherein the address space is a common address space supported simultaneously by both the first data cache and the second data cache (again, fill/invalidate requests are sent to another cache because the address space is the same, e.g. if group 1 needs data from address X, but that data is not in (or is not up to date in) the group 1 cache, then address X is accessed in another cache to obtain the data).
Referring to claim 3, Wang has taught the method of claim 1 further comprising maintaining coherence between the first data cache and the second data cache (see paragraphs 50-51 and 58, among others).
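For illustration only, the invalidate-on-write and fill-from-peer behavior relied on in the rejections of claims 1-3 can be sketched in Python as follows. This is a simplified model under stated assumptions and is not taken from Wang or from the application: the class name GroupCache, its methods, and the address 0x40 are hypothetical, and real coherence protocols involve states and messages that are omitted here.

```python
class GroupCache:
    """Hypothetical model of one group's cache over a shared address space."""

    def __init__(self, name):
        self.name = name
        self.lines = {}   # address -> data
        self.peers = []   # other group caches tracking the same addresses

    def write(self, address, data):
        # Invalidate any copy of this address held by a peer cache.
        for peer in self.peers:
            peer.lines.pop(address, None)
        self.lines[address] = data

    def read(self, address):
        if address in self.lines:
            return self.lines[address]
        # On a miss, check the peer caches for the same address.
        for peer in self.peers:
            if address in peer.lines:
                self.lines[address] = peer.lines[address]
                return self.lines[address]
        return None  # a real system would fall through to memory here


first = GroupCache("first")
second = GroupCache("second")
first.peers.append(second)
second.peers.append(first)

second.write(0x40, "old")
first.write(0x40, "new")                   # invalidates the copy in `second`
stale_removed = 0x40 not in second.lines
filled = second.read(0x40)                 # miss in `second`, filled from `first`
```

Because both caches are indexed by the same addresses, a write in one cache can name the line to invalidate in the other, which is the sense in which the caches are said to support the same address space.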
Referring to claim 4, Wang has taught the method of claim 3 wherein the coherence is maintained by storing store data from within the array of compute elements to both the first data cache and the second data cache (data is stored in both caches by the respective groups which allows for coherency to be maintained (if the caches are empty there is no need for coherency). In other words, one could say that storing data to a cache is a first step in maintaining coherency).
Referring to claim 16, Wang has taught the method of claim 3 wherein the first data cache and the second data cache each comprise an L1/L2 cache bank (from FIG.2, an L1 cache would be 320, and an L2 cache bank would be 325 (or the collection of 325s across nodes in the same group). Note that the claimed ‘/’ is interpreted as “or”, meaning an L1 cache bank is not required by the claim).
Referring to claim 21, Wang has taught the method of claim 3 wherein the first data cache and the second data cache each includes dedicated load buffers (the first and second LLC caches can be said to include dedicated load buffers, i.e., the private caches 320 that buffer data to/from LLC caches), crossbar switches (see FIG.2, the matrix interconnect is a crossbar connecting all caches), and access buffers (LLC caches include a buffer to cache data that is accessed).
Claims 22-23 are rejected for similar reasoning as claim 1. Also note that Wang’s paragraph 86 sets forth the claimed non-transitory medium (memory) storing instructions for carrying out the claimed functionality.
Claim Rejections - 35 USC § 102/103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 6 is rejected under 35 U.S.C. 102(a)(1) as anticipated by Wang or, in the alternative, under 35 U.S.C. 103 as obvious over Wang in view of the examiner’s taking of Official Notice OR Cherukuri et al. (US 2011/0145506).
Referring to claim 6, Wang has taught the method of claim 4. With respect to the limitation wherein the store data is tagged with precedence information:
Under a first interpretation, where the precedence information is a store address (to which the store data is to be written), Wang has taught such tagging/association. This is a 102 rejection.
Under a second interpretation, where the precedence information is not a store address, Wang has not taught the aforementioned limitation. However, Official Notice is taken that tagging cached data with an age counter (precedence information) was well known in the art before applicant’s invention. Such allows the cache to track which data is the least recently used/accessed, such that when the cache needs to replace data, data can be selected according to the known least recently used replacement algorithm, which works to ensure that only the most recently accessed data is cached, thereby improving performance for future cache accesses.
Alternatively, Cherukuri has taught compiler-determined attribute information that is included in a cache line to indicate the criticality of the data of that line. The criticality is then used in determining when to replace a cache line (e.g. paragraphs 13, 18, 25, 38). This could be useful in keeping important data in the cache even if it hasn’t been used in some time.
As a result, for either reason above, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang such that the store data is tagged with precedence information.
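For illustration only, the age-counter tagging noted under the second interpretation can be sketched in Python as follows. This is a minimal model of least recently used replacement under stated assumptions, not code from Wang, Cherukuri, or the application: the class name AgedCache, the method access, and the addresses "A", "B", and "C" are hypothetical.

```python
class AgedCache:
    """Hypothetical cache whose lines are tagged with an age counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}  # address -> [data, age]

    def access(self, address, data=None):
        """Touch an address; return the evicted address, if any."""
        # Every resident line gets one access older.
        for entry in self.lines.values():
            entry[1] += 1
        evicted = None
        if address in self.lines:
            if data is not None:
                self.lines[address][0] = data
            self.lines[address][1] = 0      # most recently used
        else:
            if len(self.lines) >= self.capacity:
                # Replace the line with the largest age (least recently used).
                evicted = max(self.lines, key=lambda a: self.lines[a][1])
                del self.lines[evicted]
            self.lines[address] = [data, 0]
        return evicted


cache = AgedCache(capacity=2)
cache.access("A", "a")
cache.access("B", "b")
cache.access("A")                  # "A" becomes the most recently used line
evicted = cache.access("C", "c")   # "B" now has the largest age
```

In this sketch the age value is the only information consulted when choosing a victim, so it orders the lines by precedence for replacement.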
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5, 11, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of the examiner’s taking of Official Notice.
Referring to claim 5, Wang has taught the method of claim 4 but has not taught wherein the store data is stored to the first data cache and the second data cache in parallel. However, Official Notice is taken that parallel execution by multiple different processing elements was well known in the art before applicant’s invention. Allowing multiple groups to execute any instructions at the same time, including those that store data to cache, would increase parallelism and throughput. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang such that the store data is stored to the first data cache and the second data cache in parallel.
Referring to claim 11, Wang has taught the method of claim 6 (under the first interpretation set forth in the rejection of claim 6), but has not taught wherein the precedence information enables hazard detection. However, Official Notice is taken that detecting a hazard (address conflict) between memory operations was well known in the art before applicant’s invention. That is, it is known to ensure that a preceding store finishes in order with respect to a subsequent load to the same address so as to not experience a RAW hazard (i.e., the load reading the data location before it is written by the store, thereby loading the wrong data). This ensures data correctness. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang such that the precedence information (store address) enables hazard detection.
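For illustration only, hazard detection by store address can be sketched in Python as follows. This is a minimal model under stated assumptions, not code from Wang or the application: the function names detect_raw_hazard and issue_load, the list of pending stores, and the addresses are hypothetical.

```python
def detect_raw_hazard(pending_store_addresses, load_address):
    """A RAW hazard exists if an older, unfinished store targets the load address."""
    return load_address in pending_store_addresses


def issue_load(pending_stores, memory, load_address):
    """Return the value for a load, completing conflicting older stores first.

    pending_stores: list of (address, data) in program order, not yet written.
    memory: dict standing in for the storage being read.
    """
    if detect_raw_hazard({a for a, _ in pending_stores}, load_address):
        remaining = []
        for address, data in pending_stores:
            if address == load_address:
                memory[address] = data     # finish the older store in order
            else:
                remaining.append((address, data))
        pending_stores[:] = remaining
    return memory.get(load_address)


memory = {0x10: 1}
pending = [(0x10, 2), (0x20, 3)]
hazard = detect_raw_hazard({a for a, _ in pending}, 0x10)
value = issue_load(pending, memory, 0x10)
```

Without the address comparison, the load in this example would return the value held before the store, which is the incorrect result described above.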
Referring to claim 17, Wang has taught the method of claim 16 but has not taught wherein cache lines in each L2 of the first data cache and the second data cache includes an age counter. However, this is obvious for reasoning similar to that given in the rejection of claim 6 that takes Official Notice.
Referring to claim 18, Wang, as modified, has taught the method of claim 17 but has not taught wherein the age counter establishes precedence for a unified L3 cache coupled to the first data cache and the second data cache. However, a shared unified L3 cache coupled to multiple L2 caches and a write-back policy were well known in the art before applicant’s invention. Adding another layer of cache gives the system one more chance to retrieve data relatively quickly if the data is not in L1 or L2 cache compared to going to main memory. Write-back also stores replaced data to slower levels in the memory hierarchy so that replaced data is retained. Write-back is advantageous over write-through because fewer writes occur. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang to include a unified L3 cache (unified in that all data from L2 cache may go to the same L3 cache) and a write-back policy. With this modification, the age counters will indicate which data, when replaced, will go to L3 cache upon write-back. In other words, values with higher age counters have priority to be written back to L3 cache.
Referring to claim 19, Wang has taught the method of claim 16, but has not taught wherein the L1/L2 cache bank employs a write-back policy. However, write-back policy is obvious for reasoning given in the rejection of claim 18.
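For illustration only, the combination discussed in the rejection of claim 18 (age counters in each L2, a write-back policy, and one unified L3) can be sketched in Python as follows. This is a simplified model under stated assumptions, not code from Wang or the application: the class name L2Bank, the method store, the dictionary standing in for the L3 cache, and the addresses are hypothetical.

```python
class L2Bank:
    """Hypothetical write-back L2 bank with an age counter per line."""

    def __init__(self, capacity, l3):
        self.capacity = capacity
        self.l3 = l3      # dict shared by all banks, standing in for a unified L3
        self.lines = {}   # address -> {"data": ..., "age": ..., "dirty": ...}

    def store(self, address, data):
        for line in self.lines.values():
            line["age"] += 1
        if address not in self.lines and len(self.lines) >= self.capacity:
            # The line with the highest age is replaced first.
            victim = max(self.lines, key=lambda a: self.lines[a]["age"])
            line = self.lines.pop(victim)
            if line["dirty"]:
                self.l3[victim] = line["data"]   # write-back happens on replacement
        self.lines[address] = {"data": data, "age": 0, "dirty": True}


l3 = {}
first_l2 = L2Bank(capacity=2, l3=l3)
second_l2 = L2Bank(capacity=2, l3=l3)

first_l2.store(1, "x")
first_l2.store(2, "y")
first_l2.store(1, "x2")      # resets the age of address 1
l3_before = dict(l3)         # nothing has reached L3 yet under write-back
first_l2.store(3, "z")       # address 2 has the highest age and is written back

second_l2.store(5, "p")
second_l2.store(6, "q")
second_l2.store(7, "r")      # address 5 is written back to the same L3
```

Under a write-through policy every store above would also write to L3; here only the two replaced lines do, and the age counters decide which lines those are.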
Claims 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Cherukuri.
Referring to claim 7, Wang, as modified, has taught the method of claim 6 wherein the precedence information is determined by the compiler (Cherukuri, paragraphs 18 and 38).
Referring to claim 8, Wang, as modified, has taught the method of claim 7 wherein the compiler provides control for compute elements on a cycle-by-cycle basis (a compiler generates a program that includes instructions that control the hardware over a number of consecutive cycles. This is the nature of program execution).
Referring to claim 9, Wang, as modified, has taught the method of claim 8 wherein control for the compute elements is enabled by a stream of wide control words generated by the compiler (paragraphs 16-18 discuss instructions, including opcodes, that perform numerous operations, including on a variety of different execution units. Such requires more than 1-bit instructions, i.e., wide instructions).
Referring to claim 10, Wang, as modified, has taught the method of claim 9 wherein the control words include the precedence information (from Cherukuri, the attribute information may be deemed part of the control words generated by the compiler).
Claims 12-15 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Hachmann (US 6,571,320).
Referring to claim 12, Wang has taught the method of claim 6 (under the first interpretation in the rejection of claim 6), but has not taught delaying promoting the store data. However, Hachmann has taught delaying a write to cache when there is still an outstanding read from the cache so as to avoid overwriting the data needed by the read (see column 1, line 43, to column 2, line 8). This ensures data correctness and, therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang for delaying promoting the store data.
Referring to claim 13, Wang, as modified, has taught the method of claim 12 wherein the delaying avoids hazards (again, see the cited portion of Hachmann. This effectively avoids a hazard caused if the store writes data to the location before it is read).
Referring to claim 14, Wang, as modified, has taught the method of claim 13 wherein the avoiding hazards is based on a comparative precedence value (the store address is necessarily compared to any load address so as to detect a conflict (same address is read and written)).
Referring to claim 15, Wang, as modified, has taught the method of claim 13 wherein the hazards include write-after-read, read-after-write, and write-after-write conflicts (the delay discussed in Hachmann avoids WAR hazards (which would occur if a subsequent store were to write to an address before a previous load reads from that address). This type of delay does not cause WAW and RAW hazards and so the delay avoids these hazards as well).
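For illustration only, the general idea of delaying a write while a read of the same address is outstanding can be sketched in Python as follows. This is a minimal model under stated assumptions and does not reproduce Hachmann's circuit or the application: the class name DelayedWriteCache, its methods, and the address 0x8 are hypothetical.

```python
class DelayedWriteCache:
    """Hypothetical cache that holds back writes to addresses with reads in flight."""

    def __init__(self):
        self.data = {}
        self.outstanding_reads = {}   # address -> number of reads not yet completed
        self.delayed_writes = []      # (address, value) pairs, oldest first

    def _blocked(self, address):
        return self.outstanding_reads.get(address, 0) > 0

    def begin_read(self, address):
        self.outstanding_reads[address] = self.outstanding_reads.get(address, 0) + 1

    def finish_read(self, address):
        value = self.data.get(address)
        self.outstanding_reads[address] -= 1
        self._drain()
        return value

    def write(self, address, value):
        # Delay if a read is outstanding, or if an older write to the same
        # address is already waiting (keeps writes to one address in order).
        waiting = any(a == address for a, _ in self.delayed_writes)
        if self._blocked(address) or waiting:
            self.delayed_writes.append((address, value))
            return False
        self.data[address] = value
        return True

    def _drain(self):
        # Apply delayed writes, oldest first, once their address is free.
        remaining = []
        for address, value in self.delayed_writes:
            if self._blocked(address):
                remaining.append((address, value))
            else:
                self.data[address] = value
        self.delayed_writes = remaining


cache = DelayedWriteCache()
cache.data[0x8] = "old"
cache.begin_read(0x8)
wrote_now = cache.write(0x8, "new")    # delayed: a read of 0x8 is outstanding
during = cache.data[0x8]
read_value = cache.finish_read(0x8)    # the read still sees the earlier data
after = cache.data[0x8]
```

The earlier read obtains the data it was issued against, and the delayed write is applied only afterwards, which is the write-after-read ordering discussed above.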
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of the examiner’s taking of Official Notice and Fong (US 6,292,879).
Referring to claim 20, Wang, as modified, has taught the method of claim 19 but has not taught wherein the compiler generates a time delay to enable store coherence between the first data cache and the second data cache. However, Fong has taught a compiler that generates instructions that reference descriptors that indicate whether data coherency is required (enabled) or not required (disabled). See FIG.3, column 1, lines 38-50, and column 3, line 64, to column 4, line 7. As disclosed, coherency can be disabled to reduce bottlenecks and increase simplicity. One of ordinary skill in the art recognizes that until an instruction is executed that enables cache coherence, a time delay to enable coherence would be experienced. As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang such that the compiler generates a time delay to enable store coherence between the first data cache and the second data cache.
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Shalf, 20140281243, has taught a 2D array of cores that include a local cache 118. L2 caches 214 are also provided, one for each subset of cores (FIG.2).
Hum, 20140189239, has taught virtual clusters (portions) of interconnected cores, where each cluster is connected to a number of cache slices (that make up a cache).
Manet, WO2011092323, has taught a cluster coupled to a D-cache and other clusters (FIG.2).
Sity, 20220164297, has taught groups of processor subunits in an array that each include a cache 230, and then another cache 210 used by all groups (FIG.2).
Jalal, 9,477,600, has taught cores coupled to dedicated L1 caches, a group L2 cache, and a fully shared L3 cache (FIG.1).
Lee, 8,949,806, has taught an array processor where each compute element is coupled to its own data cache.
Goodman has taught the Wisconsin Multicube, where each processor in the array is coupled to a cache and a snooping cache and implements coherency.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168. The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/David J. Huisman/Primary Examiner, Art Unit 2183