DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 04/10/2025 has been entered.
Response to Amendment
This office action is in response to the amendment filed on 03/11/2025. Claims 1-4, 7-8, and 11-12 are pending. Claims 1 and 8 are amended. Claims 5-6, 9-10, and 13-16 are canceled.
Response to Arguments
Applicant's arguments filed 03/11/2025 have been fully considered but they are not persuasive.
Applicant submits:
“Based on the amended claims 1 and 8, Applicant asserts that none of Cavus, Meier and Ye, and any of their combinations teach, suggest or even hint at the feature of "prefetching data to be used in a loop prior to use of the data in the loop, dynamically computing prefetch addresses in advance using a prefetcher management unit (PMU) and storing the dynamically computed prefetch addresses in the FIFO data structure", as defined in amended claims 1 and 8.
In support of this stance, Applicant asserts that Cavus discloses prefetching techniques for indirect memory access patterns using stride-based prediction and a Prefetch Request Queue (PRQ). However, the PRQ in Cavus merely manages prefetch requests and lacks any functionality for dynamically computing prefetch addresses or coordinating their storage for iterative loop-based operations. In contrast, according to amended claim 1, the FIFO data structure works alongside with a prefetcher management unit (PMU) to store and manage dynamically computed prefetch addresses, which can be a more efficient prefetching technique that can be tailored specifically to pseudo-random access patterns.
Moreover, Cavus does not teach or suggest the use of a PMU for dynamically computing prefetch addresses in advance or integrating these computations with a FIFO and LSU for loop-based pseudo-random access patterns. Cavus's reliance on stride-based and confidence threshold mechanisms fails to address the unique challenges of pseudo-random access patterns targeted by the present application.” (Remarks, page 5)
However, this argument is not persuasive because it appears to only consider the PRQ of Cavus. Cavus at sec. C par. 1 on page 11 teaches an Array Tracking Prefetcher (ATP) that calculates prefetch addresses (i.e., dynamically computing prefetch addresses using a PMU) and stores them in the PRQ (i.e., storing the dynamically computed prefetch addresses in the FIFO data structure), and Fig. 4 shows an arrow from the PCU of the ATP to the PRQ, which indicates that the prefetch addresses are computed in advance of being stored in the PRQ. Further, Cavus at pg. 7 par. 2-3 teaches prefetching data for iterations into the future, which indicates that the prefetching is for data that is used in a loop (since the term iterations indicates iterations of a loop) prior to its use in the loop (since data must first be fetched before it may be used and since the term “prefetching” indicates that the data is fetched in advance).
Applicant submits:
“Applicant asserts that Meier describes general load-store unit (LSU) operations for managing prefetched data but does not disclose, suggest or even hint at the coordination of a PMU and FIFO with the LSU, as claimed.” (Remarks, page 5)
However, this argument is not persuasive because it does not consider the combination of references or that Meier is not relied on to teach a PMU and FIFO. Meier is only relied on to teach an LSU, and the office action modifies Cavus with the LSU of Meier to teach an LSU that would load the prefetched data from cache.
Applicant submits:
“Furthermore, Applicant asserts that Ye primarily addresses toggling between hardware and software prefetching based on access pattern analysis, relying on control registers to enable or disable prefetching. While Ye may be relevant to general prefetching systems, Ye lacks any disclosure or suggestion of a PMU capable of dynamically computing prefetch addresses in advance. Moreover, Ye does not describe or suggest the use of a FIFO data structure to store computed prefetch addresses or coordinate their retrieval with an LSU.” (Remarks, page 6)
However, this argument is moot because the current rejection does not rely on Ye for any teaching or matter specifically challenged in the argument.
Claim Objections
Claims 7-8 are objected to because of the following informalities:
Claim 7- insert --FIFO-- before “data structure” to be consistent with references to the FIFO data structure
Claim 8 line 8- insert --FIFO-- before “data structures” to be consistent with references to the one or more FIFO data structures
claim 8 line 11- insert --performed-- after “are” since this term appears to have been mistakenly deleted.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-4, 7-8, and 11-12 rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “the data” in line 13. It is unclear whether this refers to the data introduced in line 4, line 8, line 10, or line 13.
Claim 4 recites “a cache line” in line 3. It is unclear whether this is the same cache line introduced in claim 1 line 10 or a different cache line.
Claim 4 recites “an address of the one or more addresses” in line 3. It is unclear whether this refers to the same address introduced in claim 1 line 10 or a different address.
Claim 8 recites “the data” in line 13. It is unclear whether this refers to the data introduced in line 4, line 7, line 9, line 10, or line 13.
Claim 8 recites “the FIFO data structure” in line 15. It is unclear which of the one or more FIFO data structures introduced in line 3 this refers to.
The dependent claims are further rejected based on their dependence from a rejected base claim.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7-8, and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over “Informed Prefetching for Indirect Memory Accesses” (cited on 892 filed 12/15/2023), hereby referred to as Cavus, in view of Meier US 2015/0026413 (cited on 892 filed 12/15/2023).
Regarding claim 1, Cavus teaches:
1. A method of prefetching array segments with pseudo-random access patterns, the method comprising:
storing one or more addresses in a first-in-first-out (FIFO) data structure, each of the one or more addresses representing a memory location of data stored in an array segment (Fig. 4: the entries of the prefetch request queue PRQ are a FIFO data structure (since a queue is a first-in-first-out data structure) that stores calculated addresses from the prefetch calculation buffer PCB, see also sec C par. 1 on pg. 11: describing the PCB calculating the prefetch addresses, where the addresses represent a memory location of data stored in an array segment, see hashed array examples shown in Table 1 on pg. 5 and Table 10 pg. 25);
identifying portions of data in the array segment where hash-like accesses are performed at a predetermined frequency (sec. 2.1 describes that a compiler may identify a loop and pass extracted indirect-access information to the hardware component through ATI instructions and Table 1 on pg. 5 shows ATI instructions for a hashed index, this indicates that the ATP hardware/PMU (see Fig. 4 and corresponding description in sec. 2.2) identifies portions of an array that are accessed using hashed indices (i.e., hash-like accesses) in a loop (i.e., at the predetermined frequency of the loop) based on the ATI instructions that initialize the ATP for hashed indices);
prefetching data to a cache line using an address of the one or more addresses (Fig. 4 shows the PRQ providing the addresses to the L1 cache to prefetch data into the cache, pg. 17 par. 2 describes that the ATP is implemented to prefetch for the L1 cache and sec. 4.4 par. 1 on pg. 21 describes that prefetched cache lines that are demand accessed later which indicates that data is prefetched into a cache line using the addresses provided by the PRQ), wherein prefetching includes prefetching data to be used in a loop prior to use of the data in the loop (Cavus pg. 7 par. 2-3 teaches that data is prefetched for iterations into the future, which indicates that the data is prefetched prior to its use in a loop), dynamically computing prefetch addresses in advance using a prefetch management unit (PMU) (pg. 11 sec. C par. 2: the ATP (i.e., a PMU) computes the prefetch addresses, see also Fig. 4 showing a Prefetch Calculation Unit (PCU) of the ATP calculating prefetch addresses (described in sec. C) in advance of the PRQ) and storing the dynamically computed prefetch addresses in the FIFO data structure (Fig. 4: the arrow from the PCU to the PRQ indicates that the prefetch addresses calculated by the PCU (described in sec. C) are stored in the PRQ); and
load the prefetched data from the cache line (pg. 12 par. 1 teaches prefetching values into the cache ahead of time so that the values are found in the cache when needed and the description of accuracy and timeliness in sec. 4.4 par. 1 on pg. 21 indicates that the prefetched data will be loaded from the cache line when there is a demand access/cache hit).
Cavus does not explicitly teach:
driving input to a load-store unit (LSU) to load the prefetched data from the cache line using the address of the one or more addresses.
However, Meier teaches driving input to a load-store unit (LSU) to load data using an address ([0029]: the LSU executes a load op (i.e., the load op drives input to the LSU) which specifies a transfer of data from memory to a register using an address provided by the load).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the core of Cavus (see Fig. 4) to include an LSU that executes load instructions that drive input to the LSU as taught by Meier such that the LSU will load the prefetched data from the cache line by executing a load instruction that specifies the address of the prefetched data in the cache. One of ordinary skill in the art would have been motivated to make this modification because use of an LSU to execute load instructions is a known technique on the known device of a processor for accessing memory and would yield the predictable result of improving performance since the LSU would free up other execution resources (that would otherwise process load/store instructions in implementations without an LSU) to process other instructions (see MPEP 2143, Example D).
Regarding claim 2, Cavus in view of Meier teaches:
2. The method of claim 1, wherein storing one or more addresses comprises storing one or more hash access addresses (Cavus table 10 on pg. 25 shows that prefetching for hashed data structures is supported by ATP, which indicates that the calculated addresses in the PRQ may include hash access addresses, see also pg. 7 par. 5 describing calculating the prefetch address of A[func(B[i])] ).
Regarding claim 3, Cavus in view of Meier teaches:
3. The method of claim 1,
Cavus in view of Meier does not explicitly teach:
wherein storing one or more addresses comprises storing one or more hash offset addresses which is added to a base memory location to form a hash access address.
However, Cavus further teaches that the address of an indirect access A[B[i]] is equal to a base address plus B[i]*size(A), which is an offset address added to a base memory location (pg. 11 equation (2)), and a hash data structure A[func(i)] (Table 10 example code).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the PRQ of Cavus to store offset addresses (i.e., func(i)*size(A)) which can be added to the base address of A to generate addresses for a hash data structure A[func(i)]. One of ordinary skill in the art would have been motivated to make this modification reduce the size requirement for the PRQ since storing offsets would require less space in the PRQ than storing the full address (base plus offset).
Regarding claim 4, Cavus in view of Meier and Ye teaches:
4. The method of claim 1, wherein prefetching data to a cache line using an address of the one or more addresses comprises instructing a prefetcher interface to use a prefetcher to prefetch data to a cache line using an address of the one or more addresses (the PCU (i.e., a prefetcher interface) calculates a prefetch address to send to/use the PRQ, i.e., a prefetcher, to prefetch data into the cache, see Cavus pg. 11 sec. C par. 1 and Fig. 4).
Regarding claim 7, Cavus in view of Meier teaches:
7. The method of claim 1, further comprising releasing the data structure (Cavus Fig. 4: the PRQ releases (i.e., sends, allows to be available/move freely) the prefetch addresses (i.e., releases the data structure) to the L1 cache).
Regarding claim 8, Cavus teaches:
8. A prefetcher management unit (PMU) (Fig. 4), comprising:
an interface to interact with programs via an application programming interface (API) (the interface between the ATU and the CPU, see Fig. 4, is an interface that interacts with programs via an API since it receives indirect access information from a software component/program, see pg. 4 sec. 2.1 and 2.1.1);
one or more first-in-first-out (FIFO) data structures configured to store a plurality of addresses, each of the plurality of addresses representing a memory location of data stored in an array segment (Fig. 4: the entries of the prefetch request queue PRQ are a FIFO data structure (since a queue is a first-in-first-out data structure) that stores calculated addresses from the prefetch calculation buffer PCB, see also sec C par. 1 on pg. 11: describing the PCB calculating the prefetch addresses, where the addresses represent a memory location of data stored in an array segment, see hashed array examples shown in Table 1 on pg. 5 and Table 10 pg. 25);
a prefetcher interface configured to use an address in the one or more FIFO data structures to instruct a prefetcher to prefetch data into a cache (the PCU (i.e., a prefetcher interface) calculates a prefetch address to instruct the PRQ, i.e., a prefetcher, to prefetch data into the cache, see Cavus pg. 11 sec. C par. 1 and Fig. 4); and
to load data from the cache (pg. 12 par. 1 teaches prefetching values into the cache ahead of time so that the values are found in the cache when needed and the description of accuracy and timeliness in sec. 4.4 par. 1 on pg. 21 indicates that the prefetched data will be loaded from the cache line when there is a demand access/cache hit);
wherein the PMU is configured to identify portions of data in the array segment where hash-like accesses are performed at a predetermined frequency (sec. 2.1 describes that a compiler may identify a loop and pass extracted indirect-access information to the hardware component through ATI instructions and Table 1 on pg. 5 shows ATI instructions for a hashed index, this indicates that the ATP hardware/PMU (see Fig. 4 and corresponding description in sec. 2.2) identifies portions of an array that are accessed using hashed indices (i.e., hash-like accesses) in a loop (i.e., at the predetermined frequency of the loop) based on the ATI instructions that initialize the ATP for hashed indices), and prefetch data to be used in a loop prior to use of the data in the loop (Cavus pg. 7 par. 2-3 teaches that data is prefetched for iterations into the future, which indicates that the data is prefetched prior to its use in a loop), dynamically compute prefetch addresses in advance (pg. 11 sec. C par. 2: the ATP (i.e., a PMU) computes the prefetch addresses, see also Fig. 4 showing a Prefetch Calculation Unit (PCU) of the ATP calculating prefetch addresses (described in sec. C) in advance of the PRQ), and store the dynamically computed prefetch addresses in the FIFO data structure (Fig. 4: the arrow from the PCU to the PRQ indicates that the prefetch addresses calculated by the PCU (described in sec. C) are stored in the PRQ)
Cavus does not explicitly teach:
a load-store unit interface configured to use the address in the one or more data structures to instruct a load-store unit to load data from the cache;
However, Meier teaches a load-store unit interface configured to use an address to instruct a load-store unit to load data ([0029]: the input to the LSU is an interface which receives and uses a load op to instruct the LSU transfer of data from memory to a register using an address provided by the load).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the core of Cavus (see Fig. 4) to include an LSU that executes load instructions that drive input to the LSU as taught by Meier such that the LSU will load the prefetched data from the cache line by executing a load instruction that specifies the address of the prefetched data in the cache. One of ordinary skill in the art would have been motivated to make this modification because use of an LSU to execute load instructions is a known technique on the known device of a processor for accessing memory and would yield the predictable result of improving performance since the LSU would free up other execution resources (that would otherwise process load/store instructions in implementations without an LSU) to process other instructions (see MPEP 2143, Example D).
Regarding claim 11, Cavus in view Meier teaches:
11. The PMU of claim 8, wherein the plurality of addresses comprise a plurality of hash access addresses (Cavus table 10 on pg. 25 shows that prefetching for hashed data structures is supported by ATP, which indicates that the calculated addresses in the PRQ may include hash access addresses, see also pg. 7 par. 5 describing calculating the prefetch address of A[func(B[i])]).
Regarding claim 12, Cavus in view of Meier teaches:
12. The PMU of claim 8,
Cavus in view of Meier does not teach:
wherein each address of the plurality of addresses comprises a hash offset address which is added to a base memory address to form a hash access address.
However, Cavus further teaches that the address of an indirect access A[B[i]] is equal to a base address plus B[i]*size(A), which is an offset address added to a base memory location (pg. 11 equation (2)), and a hash data structure A[func(i)] (Table 10 example code).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the PRQ of Cavus to store offset addresses (i.e., func(i)*size(A)) which can be added to the base address of A to generate addresses for a hash data structure A[func(i)]. One of ordinary skill in the art would have been motivated to make this modification reduce the size requirement for the PRQ since storing offsets would require less space in the PRQ than storing the full address (base plus offset).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476. The examiner can normally be reached Monday - Friday 9am 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell can be reached on (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW CALDWELL/Supervisory Patent Examiner, Art Unit 2182
/KASIM ALLI/Examiner, Art Unit 2182