The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-19 are presented for examination in this application (18/875,499) filed on July 26, 2024.
The Examiner cites particular sections in the references as applied to the claims below for the convenience of the applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant(s) fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
Claims 1-19 are pending for consideration.
Drawings
The drawings submitted on July 26, 2024 have been considered and accepted.
Claim Rejections - 35 U.S.C. 112
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), first paragraph:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode contemplated by the inventor of carrying out his invention.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 1 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as the claim recites “the predicted exposed latency corresponding to a stall that would be caused by waiting for the memory load request to complete had the data stored at the address in memory targeted by the memory load request not been prefetched into a cache”, while the specification only recites “the exposed latency 58 (i.e. actual exposed latency) is measured between when the memory load request starts causing a stall (S60a) and when the memory load request completes at S52, and the predicted exposed latency 68 is measured between S60a (i.e. the point that a stall would be caused had it not been prefetched, which is also the point where the stall occurs even with prefetching) and when the memory load request would complete had it not been prefetched at S62. Thus, it will be appreciated that the actual exposed latency 58 and the predicted exposed latency 68 can in some examples overlap, in that the predicted exposed latency 68 may include an inherent exposed latency that is incurred even in the case of prefetching” (Page 14). It is unclear how the predicted exposed latency can be determined, as the claim language recites “would be caused” and “had the data…not been prefetched”; it is therefore unclear whether the data has not been prefetched, resulting in a delay that is actually calculated, or whether this is merely a hypothetical scenario without an actual calculation.
Claim 2 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as the claim recites “determine the predicted exposed latency that would be caused had the data stored at the address in memory targeted by the memory load request had not been prefetched into the cache even when the data stored at the address in memory targeted by the memory load request has actually been prefetched into the cache”, while the specification only recites the Page 14 passage quoted in the rejection of claim 1 above. It is unclear how the predicted exposed latency can be determined, as the claim language recites “would be caused”, “had the data…not been prefetched” and “even when the data … has actually been prefetched into the cache”; it is therefore unclear whether the data is first determined not to have been prefetched while actually having been prefetched as claimed, resulting in a delay that is then actually calculated, or whether this is merely a hypothetical scenario without an actual determination or calculation.
Claim 3 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as the claim recites “when the memory load request would start causing the stall had the data stored at the address in memory targeted by the memory load request not been prefetched into the cache”, while the specification only recites the Page 14 passage quoted in the rejection of claim 1 above. It is unclear how the stall can be started and whether the load actually starts causing the stall; it is therefore unclear whether the load request actually starts or merely “would start” as claimed, resulting in a delay that assumes the data has not been prefetched, and it is unclear how that delay is determined or whether this is merely a hypothetical scenario without an actual determination or calculation.
Claim 4 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as the claim recites “when the memory load request would complete had the data stored at the address in memory targeted by the memory load request not been prefetched into the cache”, while the specification only recites the Page 14 passage quoted in the rejection of claim 1 above. It is unclear how the completion point can be determined, as the claim language recites when the memory load request “would complete” had the data not been prefetched; it is therefore unclear whether the load request actually completes or merely “would complete” as claimed, resulting in a delay that assumes the data has not been prefetched, and it is unclear how that delay is determined or whether this is merely a hypothetical scenario without an actual determination or calculation.
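For reference, the timing relationships described in the Page 14 passage quoted in the rejection of claim 1 above can be illustrated with a minimal sketch; all cycle values below are hypothetical and are not taken from the application.

# Illustrative sketch (Python; hypothetical cycle counts) of the latency
# measurements described at Page 14 of the specification.
t_start_exec = 0        # load request starts executing
t_stall_start = 10      # S60a: stall starts (with or without prefetching)
t_complete = 14         # S52: load completes (data was prefetched)
t_would_complete = 40   # S62: load would complete had it not been prefetched

actual_exposed_latency = t_complete - t_stall_start            # 4 cycles
predicted_exposed_latency = t_would_complete - t_stall_start   # 30 cycles
hidden_latency = t_stall_start - t_start_exec                  # claim 3's "hidden latency": 10 cycles
predicted_total_latency = t_would_complete - t_start_exec      # claim 4's "total latency": 40 cycles

# The two exposed latencies can overlap: the predicted exposed latency
# includes the inherent exposed latency incurred even with prefetching.
assert predicted_exposed_latency >= actual_exposed_latency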
Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as the claim recites “predicted exposed latency associated with a memory load request”, and it is unclear how the latency is associated with the load request, as such association is not clearly described in the claim.
Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as the claim recites “determine the predicted exposed latency that would be caused had the data stored at the address in memory targeted by the memory load request had not been prefetched into the cache even when the data stored at the address in memory targeted by the memory load request has actually been prefetched into the cache”, and it is unclear whether the data has been prefetched or not.
Claim 7 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as the claim recites “the average total latency associated with a given data source”, and it is unclear how the latency is associated with the data source, as such association is not clearly described in the claim.
Claim 16 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as the claim recites “the average total latency associated with a given data source”, and it is unclear how the latency is associated with the data source, as such association is not clearly described in the claim.
All dependent claims are rejected as having the same deficiencies as the claims from which they depend.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 11-12 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Witt et al. (US PGPUB 2023/0342148) (hereinafter ‘Witt’) in view of Bonanno et al. (US PGPUB 2013/0339692) (hereinafter ‘Bonanno’).
As per independent claim 1, Witt discloses an apparatus comprising: latency determination circuitry configured to determine a predicted exposed latency associated with a memory load request targeting an address in memory [(Paragraphs 0007-0008, 0020, 0025-0033 and 0047; FIG. 1-3 and related text) wherein Witt teaches that FIG. 2 illustrates details of the non-cacheable predictor 35 for predicting the non-cacheable latency time for load instructions. The instruction addresses of the fetched instructions provided by the instruction fetch unit 20 are used by the non-cacheable latency predictor 35 to provide predicted latency times for non-cacheable load instructions. In one embodiment, the non-cacheable latency times of the predicted load instructions are available at the same time as the normal decoding of the latency time by the instruction decode unit 30. The predicted non-cacheable latency times override the decoded latency time for load instructions. Without the non-cacheable predictor 35, the non-cacheable load instructions assume the decoded load latency time, which is the latency of a data cache hit, to correspond to the claimed limitation]; and prefetch control circuitry configured to control issuing of prefetch requests to prefetch data from a memory system based on the predicted exposed latency [(Paragraphs 0007-0008, 0020, 0025-0033 and 0047; FIG. 1-3 and related text) wherein Witt teaches that all available resources for the required times are read from the time-resource matrix 50 and sent to the instruction issue unit 55 for a decision of when to issue an instruction to the execution queue 70. If the resources are available at the required times, then the instruction can be scheduled and sent to the execution queue 70. The issued instruction updates the register scoreboard 40 with the write time and updates the time-resource matrix 50 to reduce the available resource values. All resources must be available at the required time counts for the instruction to be dispatched to the execution queue 70. If all resources are not available, then the required time counts are incremented by one, and the time-resource matrix is checked again in the same or the next cycle. The particular number of read buses 66, write buses 68, and functional units 75 in FIG. 1 is preferably chosen to minimize stalling of instructions in the instruction issue unit 55, to correspond to the claimed limitation].
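For illustration only, the latency-override behavior attributed to Witt above can be sketched as follows; this is a minimal sketch, and the names and latency values are hypothetical rather than Witt's actual implementation.

# Minimal sketch (Python) of a PC-indexed non-cacheable latency predictor
# overriding the decoded load latency, per the behavior cited above.
L1_HIT_LATENCY = 4  # hypothetical decoded latency assumed for a data-cache hit

def effective_load_latency(pc, predictor_table):
    # The predictor is looked up with the fetched instruction address;
    # a hit overrides the decoded latency, otherwise the load assumes
    # the L1 data-cache hit latency.
    predicted = predictor_table.get(pc)
    return predicted if predicted is not None else L1_HIT_LATENCY

print(effective_load_latency(0x40, {0x40: 30}))  # predictor hit -> 30 cycles
print(effective_load_latency(0x44, {0x40: 30}))  # predictor miss -> 4 cycles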
Witt does not appear to explicitly disclose the predicted exposed latency corresponding to a stall that would be caused by waiting for the memory load request to complete had the data stored at the address in memory targeted by the memory load request not been prefetched into a cache.
However, Bonanno discloses the predicted exposed latency corresponding to a stall that would be caused by waiting for the memory load request to complete had the data stored at the address in memory targeted by the memory load request not been prefetched into a cache [(Paragraphs 0040-0044) where Bonanno teaches a method of mitigating instruction prediction latency: the method includes receiving an instruction address in an instruction cache for fetching instructions in a processor pipeline. The method also includes receiving the instruction address in a prediction presence predictor coupled to the processor pipeline. The prediction presence predictor includes a single presence predictor or a plurality of presence predictors, each coupled with a dynamic filter, configured to each receive the instruction address in parallel and to generate an unfiltered indication of an associated BTB prediction. Each dynamic filter is configured to block the unfiltered indications based on the performance of the presence predictor to which it is coupled. The prediction presence predictor further includes stall determination logic coupled to the plurality of dynamic filters. The stall determination logic is configured to generate a combined indication of the associated prediction based upon one or more filtered indications received from the plurality of dynamic filters. Based on receipt of the combined indication from the prediction presence predictor, the method includes holding instructions extracted from the instructions being fetched when they are determined to be BTB predictable by opcode but such a prediction is not yet available. Based on either the receipt of a branch prediction from a branch target buffer or reaching a predetermined programmable timeout period, the method includes releasing said held instructions to the pipeline for execution, to correspond to the claimed limitation].
Witt and Bonanno are analogous art because they are from the same field of endeavor of data storage management.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Witt and Bonanno before him or her, to modify the apparatus of Witt to include the stall determination logic of Bonanno because it would enhance data access.
The motivation for doing so would be [“Instructions may also be released, or prevented from being stalled in the first place, when it is determined that the branch prediction search results are beyond the point of the instruction(s) being examined, which effectively means a prediction for the instruction will never be, or already is, available” (Paragraph 0006 by Bonanno)].
Therefore, it would have been obvious to combine Witt and Bonanno to obtain the invention as specified in the instant claim.
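For illustration, the hold-and-release behavior attributed to Bonanno's stall determination logic above can be sketched as follows; the function name and timeout value are hypothetical and not Bonanno's actual implementation.

# Minimal sketch (Python): a held instruction is released to the pipeline
# either when a branch prediction arrives from the BTB or when a
# programmable timeout period is reached.
def should_release(prediction_available: bool, cycles_held: int, timeout: int = 8) -> bool:
    return prediction_available or cycles_held >= timeout

print(should_release(True, 2))    # prediction arrived -> release
print(should_release(False, 8))   # timeout reached -> release
print(should_release(False, 3))   # still held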
As per claim 2, Bonanno discloses in which the latency determination circuitry is configured to determine the predicted exposed latency that would be caused had the data stored at the address in memory targeted by the memory load request had not been prefetched into the cache even when the data stored at the address in memory targeted by the memory load request has actually been prefetched into the cache [(Paragraphs 0040-0044) where Bonanno teaches the prediction presence predictor, stall determination logic, and hold/release of instructions discussed in the rejection of claim 1 above, to correspond to the claimed limitation].
As per claim 3, Bonanno discloses in which the latency determination circuitry is configured to determine the predicted exposed latency based on determining a hidden latency, the hidden latency corresponding to an amount of time between when the memory load request starts executing and when the memory load request would start causing the stall had the data stored at the address in memory targeted by the memory load request not been prefetched into the cache [(Paragraphs 0040-0044) where Bonanno teaches the prediction presence predictor, stall determination logic, and hold/release of instructions discussed in the rejection of claim 1 above, to correspond to the claimed limitation].
As per claim 4, Bonanno discloses in which the latency determination circuitry is configured to determine the predicted exposed latency based on predicting a total latency, the total latency being indicative of an amount of time between when the memory load request starts executing and when the memory load request would complete had the data stored at the address in memory targeted by the memory load request not been prefetched into the cache [(Paragraphs 0040-0044) where Bonanno teaches the prediction presence predictor, stall determination logic, and hold/release of instructions discussed in the rejection of claim 1 above, to correspond to the claimed limitation].
As per claim 5, Bonanno discloses in which the latency determination circuitry is configured to predict the total latency based on determining a data source of a prefetched cache line containing the data stored at the address in memory targeted by the memory load request [(Paragraphs 0040-0044) where Bonanno teaches the prediction presence predictor, stall determination logic, and hold/release of instructions discussed in the rejection of claim 1 above, to correspond to the claimed limitation].
As per claim 11, Witt discloses in which the cache that the data stored at the address in memory targeted by the memory load request has been prefetched into is a lowest-level cache [(Paragraph 0006) where Witt teaches that non-cacheable data is stored in external memory, which includes input/output devices, close-coupled memory, or specialized memories. The cacheable data can be stored in data caches, which include the level-1 (L1) data cache and possibly other levels of caches (level-2 (L2), etc.) external to the main microprocessor chip. The determination of the load data types is part of the memory address calculation and accessing of the memory management unit. The memory management unit may consist of the physical memory attribute (PMA) and physical memory protection (PMP) logic, which specifies the memory address ranges for different memory types. In general, all load instructions assume the latency time of an L1 data cache hit, which is correct about 80-90% of the time, to correspond to the claimed limitation].
As per claim 12, Witt discloses in which the cache that the data stored at the address in memory targeted by the memory load request has been prefetched into is a lowest-level cache [(Paragraph 0006) where Witt teaches the memory types and cache hierarchy discussed in the rejection of claim 11 above, to correspond to the claimed limitation].
As per claim 17, Witt discloses implemented in at least one packaged chip; at least one system component; and a board, wherein the at least one packaged chip and the at least one system component are assembled on the board [(Paragraphs 0020-0022) where FIG. 1 is a block diagram of a microprocessor-based data processing system. The exemplary system includes a microprocessor 10 having a clock unit 15, an instruction fetch unit 20, an instruction cache 24, a branch prediction unit 22, an instruction decode unit 30, a non-cacheable predictor 35, a register scoreboard 40, a time-resource matrix 50, an instruction issue unit 55, a register file 60, a read control unit 62, a write control unit 64, a plurality of execution queues 70, a plurality of functional units 75, a load-store unit 80, and a data cache 85. The microprocessor 10 includes a plurality of read buses 66 from the register files to the functional units 75 and load-store unit 80. The system also includes a plurality of write buses 68 to write result data from the functional unit 75, the load-store unit 80, and the data cache 85 to the register file 60. The microprocessor 10 is a synchronous microprocessor where the clock unit generates a clock signal (“clk”) which couples to all the units in the microprocessor 10. The clock unit 15 provides a continuously toggling logic signal 17 which toggles between 0 and 1 repeatedly at a clock frequency, to correspond to the claimed limitation].
As per claim 18, Witt discloses wherein the system is assembled on a further board with at least one other product component [(Paragraphs 0020-0022) where Witt teaches the microprocessor-based data processing system of FIG. 1 discussed in the rejection of claim 17 above, to correspond to the claimed limitation].
As per claim 19, Witt discloses a non-transitory computer-readable medium storing computer-readable code for fabrication of the apparatus of claim 1 [(Paragraphs 0020-0022 and 0038-0039) where Witt teaches that implementations of software executed on a general-purpose, or special-purpose, computing system may take the form of a computer-implemented method for implementing a microprocessor, and also a computer program product for implementing a microprocessor, where the computer program product is stored on a non-transitory computer-readable storage medium and includes instructions for causing the computer system to execute a method. The aforementioned program modules and/or code segments may be executed on a suitable computing system to perform the functions disclosed herein. Such a computing system will typically include one or more processing units, memory, and non-transitory storage to execute computer-executable instructions, to correspond to the claimed limitation].
Claims 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Witt in view of Bonanno, as applied to claim 1 above, and further in view of Sun et al. (US PGPUB 2023/0297507) (hereinafter ‘Sun’).
As per dependent claim 13, Witt/Bonanno discloses the apparatus of claim 1.
Witt/Bonanno does not appear to explicitly disclose in which the prefetch control circuitry is configured to determine whether to issue or suppress issuing a prefetch request to prefetch data from the memory system based on determining whether the predicted exposed latency satisfies a condition.
However, Sun discloses in which the prefetch control circuitry is configured to determine whether to issue or suppress issuing a prefetch request to prefetch data from the memory system based on determining whether the predicted exposed latency satisfies a condition [(Paragraphs 0009-0012) where the adaptive prefetcher for a shared system cache as described herein prefetches a selected subsequent cache line based on a latency comparison between a loop latency of the adaptive prefetcher and a stream latency of an identified requestor. The loop latency of a prefetch controller of the adaptive prefetcher includes a decision delay of the prefetch controller plus a latency of switch fabric coupled between multiple requestors and the shared system cache. Each requestor has a stream latency which is a delay between successive operations of that requestor. In one embodiment, the adaptive prefetcher includes a prefetch controller that submits an adaptive request to request the next cache line after skipping SK cache lines for a requestor when the loop latency is greater than SK multiplied by the stream latency and less than or equal to SK+1 multiplied by the stream latency of the requestor, in which SK is an integer of at least zero. The adaptive prefetcher may include a latency memory that stores a stream latency for each of the requestors. The loop and stream latencies may be fixed or may be programmable. The adaptive prefetcher may include, for example, a requestor monitor that updates the stream latencies based on actual measured stream latencies to correspond to the claimed limitation].
Witt/Bonanno and Sun are analogous art because they are from the same field of endeavor of data storage management.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Witt/Bonanno and Sun before him or her, to modify the apparatus of Witt/Bonanno to include the adaptive prefetching of Sun because it would enhance data access.
The motivation for doing so would be [“optimize operation of the SSLC 110 using the ASCP 112 to reduce the number of misses as much as possible” (Paragraph 0021 by Sun)].
Therefore, it would have been obvious to combine Witt/Bonanno and Sun to obtain the invention as specified in the instant claim.
As per claim 14, Sun discloses in which the prefetch control circuitry is configured to issue a prefetch request targeting the address in memory targeted by the memory load request based on determining that the predicted exposed latency satisfies a condition [(Paragraphs 0009-0012) where Sun teaches the adaptive prefetcher and the comparison between the loop latency and the stream latency discussed in the rejection of claim 13 above, to correspond to the claimed limitation].
As per claim 15, Sun discloses in which the prefetch control circuitry is configured to suppress issuing of a prefetch request targeting the address in memory targeted by the memory load request based on determining that the predicted exposed latency does not satisfy a condition [(Paragraphs 0009-0012) where Sun teaches the adaptive prefetcher and the comparison between the loop latency and the stream latency discussed in the rejection of claim 13 above, to correspond to the claimed limitation].
As per claim 16, Sun discloses in which the condition comprises one or more of: a predetermined predicted exposed latency threshold, and a ranking condition associated with a ranking of a plurality of predicted exposed latencies [(Paragraphs 0009-0012 and 0021) where Sun teaches the adaptive prefetcher and the latency comparison discussed in the rejection of claim 13 above, including the fixed or programmable loop and stream latencies, to correspond to the claimed limitation].
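For illustration, the skip-ahead selection attributed to Sun above, in which SK cache lines are skipped when the loop latency is greater than SK multiplied by the stream latency and less than or equal to SK+1 multiplied by the stream latency, can be sketched as follows; the function name and example values are hypothetical.

# Minimal sketch (Python) of choosing SK, the number of cache lines to skip,
# such that SK * stream_latency < loop_latency <= (SK + 1) * stream_latency.
import math

def lines_to_skip(loop_latency: float, stream_latency: float) -> int:
    sk = math.ceil(loop_latency / stream_latency) - 1
    return max(sk, 0)

# Example: loop latency 25 cycles, stream latency 10 cycles:
# 2 * 10 < 25 <= 3 * 10, so SK = 2 and the prefetch targets the third line ahead.
print(lines_to_skip(25, 10))  # -> 2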
Allowable Subject Matter
Per the instant Office action, claims 6-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form and the 112 rejections are overcome.
The reason for allowance of claim 6 is that the prior art of record neither anticipates nor renders obvious the recited combination as a whole, including the limitations of “in which the latency determination circuitry is configured to predict the total latency based on determining, for a given data source, an average total latency of other memory load requests that target data in the given data source, the average total latency being indicative of an average amount of time between when the other memory load requests start executing and when the other memory load requests complete”.
Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Rygh et al. (US PGPUB 2015/0091920) teaches memory latency tolerance in block processing pipelines.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jared Ian Rutz whose telephone number is (571) 272-5535. The examiner can normally be reached on Monday-Friday, 8:00 AM to 4:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jared Rutz can be reached on 571-272-5535. The fax phone number for the organization where this application or proceeding is assigned is 571-270-2857.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MOHAMED M GEBRIL/Primary Examiner, Art Unit 2135