The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: VECTOR PROCESSING WITH LEARNED NON-CONTIGUOUS MEMORY ACCESS LINEARIZATION.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 8 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Specifically, independent claim 8 is rejected because it recites the limitation "the prefetch address lookup table". There is insufficient antecedent basis for this limitation in the claim. Claims 9-14 inherit the rejection of claim 8.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 5-6, 15-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ansari (US Patent No. 6,813,701) in view of the 2013 non-patent literature reference "Linearizing Irregular Memory Accesses for Improved Correlated Prefetching", hereinafter "Jain". With respect to independent claims 1 and 15 (using claim 1 as exemplary), Ansari/Jain discloses: A circuit [Ansari fig 1], comprising:
a first intermediate memory communicatively coupled with a vector processor and a RAM, wherein the vector processor is communicatively coupled with the RAM [VBP is buffered memory between main memory and vector processor, vector segments can be transferred between the processor and the memory as separate streams using a burst transfer technique… A vector buffer is a fixed-sized partition in the vector buffer pool (VBP) which is normally allocated to a single process and is partitioned by the compiler among variable-sized streams each holding a vector segment - Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18];
an address sequence memory to store non-linear RAM addresses corresponding to linear locations in the first intermediate memory [Ansari does not explicitly teach an address sequence memory to store non-linear RAM addresses corresponding to linear locations in the first intermediate memory. Nevertheless, in the same field of endeavor, Jain teaches: The main idea is to introduce an extra level of indirection to create a new structural address space in which correlated physical addresses are assigned consecutive structural addresses. The key point is that in this structural address space, streams of correlated memory addresses are both temporally ordered and spatially ordered… Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB – Jain p. 248, left col, fig 2, 5-7; The ISB uses two on-chip caches to maintain the mapping between physical and structural addresses. The Physical-to-Structural AMC (PS-AMC) stores the mapping from the physical address space to the structural address space; it is indexed by physical addresses. The Structural-to-Physical AMC (SP-AMC) stores the inverse mapping of the PS-AMC and is indexed by structural addresses. While the SP-AMC is not strictly necessary, it enables efficient temporal stream prediction because each cache line in the SP-AMC can yield in a single lookup 16 prefetch candidates from the current temporal stream - Jain p. 251, left col - Address Mapping Caches (AMCs). The combination of Ansari/Jain teaches a memory that stores non-contiguous RAM addresses and associates them with sequential/contiguous/structural positions that map onto linear locations in an intermediate cache/buffer.
In other words, the PS-AMC/SP-AMC caches store non-linear addresses and assign them consecutive structural addresses so that a sequential walk of the structural addresses follows the non-linear addresses in order; the structural addresses map onto linear positions in the vector buffer pool (VBP). These teachings disclose a memory storing a sequence of non-linear RAM addresses associated with linear locations in the intermediate memory (VBP)];
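For illustration only (not part of either reference's disclosure, and using hypothetical names), the level of indirection attributed to Jain above can be sketched as two mapping tables that assign correlated non-linear physical addresses consecutive structural addresses, so that a sequential walk of the structural space visits the irregular stream in order:

```python
def build_mapping(stream):
    """Assign consecutive structural addresses to a correlated stream.
    ps_amc / sp_amc are stand-ins for Jain's PS-AMC and SP-AMC caches."""
    ps_amc = {}  # physical address -> structural address
    sp_amc = {}  # structural address -> physical address (inverse mapping)
    for structural, physical in enumerate(stream):
        ps_amc[physical] = structural
        sp_amc[structural] = physical
    return ps_amc, sp_amc

# A non-linear (irregular) stream of physical addresses, e.g. A, B, C, D, E:
stream = [0x7F10, 0x0040, 0x9C08, 0x1234, 0x0800]
ps_amc, sp_amc = build_mapping(stream)

# A sequential walk of structural addresses recovers the irregular
# stream in temporal order, reducing the problem to sequential prefetch:
linearized = [sp_amc[s] for s in range(len(sp_amc))]
assert linearized == stream
```

This is a simplification: Jain performs the mapping at cache-line granularity with confidence tracking, but the sketch captures the claimed correspondence between non-linear RAM addresses and linear positions.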
a data sequencer to read a first frame of data from the RAM to the first intermediate memory based on addresses stored in the address sequence memory [VTU and associated control logic function as data sequencer operable to read/access data from the RAM to first intermediate memory (VBP) based on addresses stored in the address sequence memory (PS-AMC/SP-AMC) – Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18; Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15; compiler schedules transfers of vector streams required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred - Ansari abstract] [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph] [The combination of these teachings yields a sequencer that reads a first frame (vector segment/stream) from RAM into intermediate memory (VBP) according to a stored sequence of addresses in an address sequence memory (PS-AMC/SP-AMC)]; and
the first intermediate memory to provide a linearized frame of data to the vector processor to execute a vector instruction [A vector buffer is a fixed-sized partition in the vector buffer pool (VBP) which is normally allocated to a single process and is partitioned by the compiler among variable-sized streams each holding a vector segment… Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers) - Ansari col 3 lines 5-15] [we see in Figure 2 that a sequential traversal of the structural address space visits the elements of the irregular temporal stream—A, B, C, D and E—in temporal order. Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB - Jain p.248, left col, fig 2] [The combination of these teachings yields a vector buffer VBP holding vector segment/stream that is transferred into registers using special vector data instructions, the processor uses those registers as operands for vector computations on the buffered frame of data. In view of Jain, even if physical addresses are non-linear, they are mapped into sequential structural order or linearized sequence so that a linearized frame is provided to vector processor to execute vector instruction].
Ansari does not explicitly disclose an address sequence memory to store non-linear RAM addresses corresponding to linear locations in the first intermediate memory, a data sequencer, or a linearized frame. Nevertheless, in the same field of endeavor, Jain teaches: The main idea is to introduce an extra level of indirection to create a new structural address space in which correlated physical addresses are assigned consecutive structural addresses. The key point is that in this structural address space, streams of correlated memory addresses are both temporally ordered and spatially ordered… Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB – Jain p. 248, left col, fig 2, 5-7; The ISB uses two on-chip caches to maintain the mapping between physical and structural addresses. The Physical-to-Structural AMC (PS-AMC) stores the mapping from the physical address space to the structural address space; it is indexed by physical addresses. The Structural-to-Physical AMC (SP-AMC) stores the inverse mapping of the PS-AMC and is indexed by structural addresses. While the SP-AMC is not strictly necessary, it enables efficient temporal stream prediction because each cache line in the SP-AMC can yield in a single lookup 16 prefetch candidates from the current temporal stream - Jain p. 251, left col - Address Mapping Caches (AMCs). The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream.
It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p. 251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p. 252, left col, first paragraph. We see in Figure 2 that a sequential traversal of the structural address space visits the elements of the irregular temporal stream—A, B, C, D and E—in temporal order. Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB - Jain p. 248, left col, fig 2.
Therefore, Ansari/Jain teaches all limitations of the instant claim(s).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement a linearized memory, sequencer, and linearized frame in the invention of Ansari as taught by Jain because it would be advantageous for improving prefetching for irregular/non-linear access patterns (Jain pp. 247-248, left col; also p. 251, left col).
With respect to independent claim 15, since the instant claim is substantially similar in scope to claim 1, it is rejected according to substantially the same rationale as applied to claim 1, with minor differences considered as follows:
reading a sequence of non-contiguous addresses from an address sequence memory [Irregular Stream Buffer (ISB), a prefetcher that targets irregular (non-contiguous) sequences of temporally correlated memory references (addresses). The key idea is to use an extra level of indirection to translate arbitrary pairs of correlated physical addresses into consecutive addresses in a new structural address space, which is visible only to the ISB – Jain p. 247, abstract; The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch – Jain pp. 251-252];
prefetching a first frame of data from a RAM using the sequence of non-contiguous addresses [prefetch & prediction functionality discussed on p. 251 of Jain; The Stream Predictor predicts the next consecutive structural addresses to prefetch – Jain p. 252];
storing that first frame of data in a first intermediate memory [VBP is intermediate memory – Ansari fig 1-2] as a first linearized frame of data [a new structural address space in which correlated physical addresses are assigned consecutive structural addresses – Jain p. 248 fig 2 & left col];
receiving a vector load instruction to load a portion of the first frame of data [Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15];
loading the portion of the first frame of data into the vector processor from the first intermediate memory; and executing the vector instruction with the received portion of the first frame of data as an operand [A vector buffer is a fixed-sized partition in the vector buffer pool (VBP) which is normally allocated to a single process and is partitioned by the compiler among variable-sized streams each holding a vector segment… Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers) - Ansari col 3 lines 5-15] [we see in Figure 2 that a sequential traversal of the structural address space visits the elements of the irregular temporal stream—A, B, C, D and E—in temporal order. Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB - Jain p.248, left col, fig 2] [The combination of these teachings yields a vector buffer VBP holding vector segment/stream that is transferred into registers using special vector data instructions, the processor uses those registers as operands for vector computations on the buffered frame of data. In view of Jain, even if physical addresses are non-linear, they are mapped into sequential structural order or linearized sequence so that a linearized frame is provided to vector processor to execute vector instruction].
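As a non-limiting illustration of the method steps mapped above (hypothetical names; not drawn from Ansari or Jain), the claimed flow amounts to gathering a frame from non-contiguous RAM locations into a linear intermediate buffer and then executing a vector operation on the linearized operands:

```python
def prefetch_linearized_frame(ram, address_sequence):
    """Gather non-contiguous RAM words, in sequence order, into a linear
    buffer (analogous to a vector buffer in Ansari's VBP)."""
    return [ram[addr] for addr in address_sequence]

def vector_add(frame_a, frame_b):
    """Stand-in for a vector instruction operating on linearized operands."""
    return [a + b for a, b in zip(frame_a, frame_b)]

ram = {0x10: 1, 0x80: 2, 0x04: 3, 0x44: 4}   # sparse, non-linear layout
address_sequence = [0x80, 0x04, 0x10, 0x44]  # address sequence memory contents

# Prefetch the first frame into the intermediate memory as a linearized frame,
# then execute a vector instruction with that frame as an operand:
buffer_a = prefetch_linearized_frame(ram, address_sequence)
result = vector_add(buffer_a, buffer_a)
```

The sketch shows only the data movement the claim recites; the actual references perform the transfer with burst techniques (Ansari) and predicted structural addresses (Jain).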
With respect to dependent claim 2, Ansari/Jain discloses a second intermediate memory communicatively coupled to the vector processor and the RAM; wherein the data sequencer is to read a second frame of data from the RAM to the second intermediate memory based on addresses stored in the address sequence memory [multiple vector buffers in pool - Ansari fig 1-2, col 3 lines 5-15] [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph] [The combination of these teachings yields a sequencer/VTU that can read/fetch a first/second frame (vector segment/stream) from RAM into intermediate memory (VBP) according to a stored sequence of addresses in an address sequence memory (PS-AMC/SP-AMC)].
With respect to dependent claim 5, Ansari/Jain discloses the data sequencer to write data from the first intermediate memory to the RAM based on addresses stored in the address sequence memory [Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15; VTU and associated control logic function as data sequencer operable to read/access data from the RAM to first intermediate memory (VBP) based on addresses stored in the address sequence memory (PS-AMC/SP-AMC) – Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18 in view of Jain fig 5-7, p.251, left col].
With respect to dependent claim 6, Ansari/Jain discloses wherein the address sequence memory and the first intermediate memory are external to the RAM [Ansari fig 1-2].
With respect to dependent claim 16, Ansari/Jain discloses selecting a second intermediate memory from which to load linearized data into the vector processor [multiple vector buffers in pool - Ansari fig 1-2, col 3 lines 5-15] [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph] [The combination of these teachings yields a sequencer/VTU that can read/fetch a first/second frame (vector segment/stream) from RAM into first/second intermediate memory (VBP) according to a stored sequence of addresses in an address sequence memory (PS-AMC/SP-AMC)].
With respect to dependent claim 19, Ansari/Jain discloses storing in the address sequence memory RAM addresses corresponding to locations in a second intermediate memory; during execution of the vector instruction, storing a result in the second intermediate memory; and writing the result from the second intermediate memory to the RAM based on one of the addresses corresponding to locations in the second intermediate memory [Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15; VTU and associated control logic function as data sequencer operable to read/access data from the RAM to first intermediate memory (VBP) based on addresses stored in the address sequence memory (PS-AMC/SP-AMC) – Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18 in view of Jain fig 5-7, p.251, left col].
Claims 3-4, 8-13, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ansari/Jain further in view of Azadet (US PGPUB No. 2010/0138465).
With respect to dependent claims 3 and 17, Ansari/Jain does not explicitly disclose a scalar processor, although Ansari suggests this feature in col 1 lines 15-18. Nevertheless, in the same field of endeavor, Azadet teaches digital signal processors (DSPs) having vector and scalar architectures [Azadet 0023-0025], so that the combination of Ansari/Jain/Azadet discloses a scalar processor communicatively coupled to the RAM [Azadet 0023-0025]; and an address learning agent to: record a first set of memory access addresses by the scalar processor for the first frame of data [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses – Jain p. 251 left col, Training unit]; and store the first set of memory access addresses in the address sequence memory [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Ansari/Jain architecture in a heterogeneous scalar + vector configuration as taught by Azadet so that a scalar processor could run code while a vector processor performs vector operations in parallel on scalar data units, using Jain's logic to learn address sequences to aid in prefetching (Azadet 0022-0024).
With respect to dependent claim 17, since the instant claim is substantially similar in scope to claim 3, it is rejected according to substantially the same rationale as applied to claim 3, with minor differences considered as follows:
receiving scalar instructions equivalent in result to the vector instruction at a scalar processor [a scalar architecture processes a single number at a time while a vector architecture processes numbers in parallel, so that scalar instructions implement the same functionality as vector operations, just one at a time rather than in parallel – Azadet 0023];
recording a first set of memory access addresses consisting of each memory access address loaded from RAM while executing the scalar instructions to process the first frame of data [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses – Jain p. 251 left col, Training unit]; and store the first set of memory access addresses in the address sequence memory [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph];
linearizing the first set of memory access addresses [The main idea is to introduce an extra level of indirection to create a new structural address space in which correlated physical addresses are assigned consecutive structural addresses. The key point is that in this structural address space, streams of correlated memory addresses are both temporally ordered and spatially ordered… Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB – Jain p.248, left col, fig 2, 5-7; The ISB uses two on-chip caches to maintain the mapping between physical and structural addresses. The Physical to-Structural AMC (PS-AMC) stores the mapping from the physical address space to the structural address space; it is indexed by physical addresses. The Structural-to-Physical AMC(SP-AMC) stores the inverse mapping as the PS-AMC and is indexed by structural addresses. While the SP-AMC is not strictly necessary, it enables efficient temporal stream prediction because each cache line in the SP-AMC can yield in a single lookup 16 prefetch candidates from the current temporal stream - Jain p.251, left col - Address Mapping Caches (AMCs)]; and
storing the linearized memory access addresses in the address sequence memory [The ISB uses two on-chip caches to maintain the mapping between physical and structural addresses. The Physical to-Structural AMC (PS-AMC) stores the mapping from the physical address space to the structural address space; it is indexed by physical addresses. The Structural-to-Physical AMC(SP-AMC) stores the inverse mapping as the PS-AMC and is indexed by structural addresses. While the SP-AMC is not strictly necessary, it enables efficient temporal stream prediction because each cache line in the SP-AMC can yield in a single lookup 16 prefetch candidates from the current temporal stream - Jain p.251, left col - Address Mapping Caches (AMCs)].
With respect to dependent claims 4 and 18, Ansari/Jain/Azadet discloses wherein the address learning agent is to: record a second set of memory access addresses by the scalar processor for a second frame of data [the training unit operates in a loop, continually for any number of frames of data, logging load addresses - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph]; determine the first set of memory access addresses matches the second set of memory access addresses [compare/determine newly observed addresses, disclosed as: When a correlated pair (A,B) is observed, the PS-AMC is queried to see if A and B have previously been assigned structural addresses… If A and B already have consecutive structural addresses, the ISB increments the confidence counter – Jain p. 251 both cols]; and indicate the match to the scalar processor [match indicated via combination of confidence counter & prefetch - Jain p. 251 both cols].
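Purely by way of illustration (hypothetical names; a simplified sketch, not the disclosed implementation of either reference), the confidence-counter behavior attributed to Jain above can be modeled as: observe a correlated address pair (A, B); if the pair already holds consecutive structural addresses, increment a confidence counter; otherwise record the new assignment:

```python
def train(observed_pairs):
    """Simplified model of a training loop over correlated address pairs."""
    ps_amc = {}          # physical address -> structural address
    confidence = {}      # (A, B) pair -> confidence counter
    next_structural = 0
    for a, b in observed_pairs:
        sa, sb = ps_amc.get(a), ps_amc.get(b)
        if sa is not None and sb == sa + 1:
            # Pair already has consecutive structural addresses: match seen
            # again, so raise confidence (the "indicate the match" step).
            confidence[(a, b)] = confidence.get((a, b), 0) + 1
        else:
            # New or changed pair: assign consecutive structural addresses.
            if sa is None:
                ps_amc[a] = next_structural
                next_structural += 1
            ps_amc[b] = ps_amc[a] + 1
            next_structural = max(next_structural, ps_amc[b] + 1)
            confidence[(a, b)] = 0
    return ps_amc, confidence

# Two frames producing the same access pair: the second pass matches the
# first, so confidence is incremented.
pairs = [(0x100, 0x9C0), (0x100, 0x9C0)]
ps_amc, confidence = train(pairs)
```

This mirrors the claimed steps of recording a second set of addresses, determining it matches the first set, and indicating the match.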
With respect to independent claim 8, since the instant claim is substantially similar in scope to claims 1 and 3-4, it is rejected in view of Ansari/Jain/Azadet according to substantially the same rationale as applied to claims 1 and 3-4, with minor differences considered as follows:
a first intermediate memory communicatively coupled to a vector processor [VBP is buffered memory between main memory and vector processor, vector segments can be transferred between the processor and the memory as separate streams using a burst transfer technique… A vector buffer is a fixed-sized partition in the vector buffer pool (VBP) which is normally allocated to a single process and is partitioned by the compiler among variable-sized streams each holding a vector segment - Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18], a scalar processor [Ansari/Jain does not explicitly disclose a scalar processor, although Ansari suggests this feature in col 1 lines 15-18. Nevertheless, in the same field of endeavor, Azadet teaches digital signal processors (DSPs) having vector and scalar architectures [Azadet 0023-0025], so that the combination of Ansari/Jain/Azadet discloses a scalar processor communicatively coupled to the RAM [Azadet 0023-0025]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Ansari/Jain architecture in a heterogeneous scalar + vector configuration as taught by Azadet so that a scalar processor could run code while a vector processor performs vector operations in parallel on scalar data units, using Jain's logic to learn address sequences to aid in prefetching (Azadet 0022-0024)], and a RAM, the RAM to store a vector program for performing calculations [Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15];
an address sequence memory to store non-linear RAM addresses corresponding to linear locations in the first intermediate memory [Ansari does not explicitly teach an address sequence memory to store non-linear RAM addresses corresponding to linear locations in the first intermediate memory. Nevertheless in the same field of endeavor Jain teaches: The main idea is to introduce an extra level of indirection to create a new structural address space in which correlated physical addresses are assigned consecutive structural addresses. The key point is that in this structural address space, streams of correlated memory addresses are both temporally ordered and spatially ordered… Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB – Jain p.248, left col, fig 2, 5-7; The ISB uses two on-chip caches to maintain the mapping between physical and structural addresses. The Physical to-Structural AMC (PS-AMC) stores the mapping from the physical address space to the structural address space; it is indexed by physical addresses. The Structural-to-Physical AMC(SP-AMC) stores the inverse mapping as the PS-AMC and is indexed by structural addresses. While the SP-AMC is not strictly necessary, it enables efficient temporal stream prediction because each cache line in the SP-AMC can yield in a single lookup 16 prefetch candidates from the current temporal stream - Jain p.251, left col - Address Mapping Caches (AMCs). The combination of Ansari/Jain teaches a memory that stores non-contiguous RAM addresses and associates them with sequential/contiguous/structural positions that map onto linear locations in an intermediate cache/buffer. 
In other words, the PS-AMC/SP-AMC caches store non-linear addresses and assign them consecutive structural addresses so that a sequential walk of structural addresses follows the non-linear addresses in order; the structural addresses map onto linear positions in the vector buffer (VBP). Those teachings disclose a memory storing a sequence of non-linear RAM addresses associated with linear locations in the intermediate memory (VBP)];
an address learning agent [an address learning agent to: record a first set of memory access addresses by the scalar processor for the first frame of data [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses – Jain p. 251 left col, Training unit]; and store the first set of memory access addresses in the address sequence memory [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph]] to, while the scalar processor performs the calculations of the program:
capture a first set of memory access addresses issued by the scalar processor to the RAM during processing of a first frame of data [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph];
map the set of memory access addresses to linear locations in the first intermediate memory [Irregular Stream Buffer (ISB), a prefetcher that targets irregular (non-contiguous) sequences of temporally correlated memory references (addresses). The key idea is to use an extra level of indirection to translate arbitrary pairs of correlated physical addresses into consecutive addresses in a new structural address space, which is visible only to the ISB – Jain p247 abstract; The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch – Jain p251-252]; and
store in the address sequence memory the mapping of the set of memory access addresses to the linear locations in the first intermediate memory [a new structural address space in which correlated physical addresses are assigned consecutive structural addresses – Jain p. 248 fig 2 & left col & Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph]; and
a data sequencer to read a first frame of data from the RAM and store the first frame of data to the first intermediate memory based on addresses stored in the prefetch address lookup table [VTU and associated control logic function as data sequencer operable to read/access data from the RAM to first intermediate memory (VBP) based on addresses stored in the address sequence memory (PS-AMC/SP-AMC) – Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18; Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15; compiler schedules transfers of vector streams required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred - Ansari abstract] [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph] [The combination of these teachings yields a sequencer that reads a first frame (vector segment/stream) from RAM into intermediate memory (VBP) according to a stored sequence of addresses in an address sequence memory (PS-AMC/SP-AMC)],
the first intermediate memory to thereby provide linearized data to the vector processor to execute a vector instruction [A vector buffer is a fixed-sized partition in the vector buffer pool (VBP) which is normally allocated to a single process and is partitioned by the compiler among variable-sized streams each holding a vector segment… Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers) - Ansari col 3 lines 5-15] [we see in Figure 2 that a sequential traversal of the structural address space visits the elements of the irregular temporal stream—A, B, C, D and E—in temporal order. Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB - Jain p.248, left col, fig 2] [The combination of these teachings yields a vector buffer (VBP) holding a vector segment/stream that is transferred into registers using special vector data instructions; the processor uses those registers as operands for vector computations on the buffered frame of data. In view of Jain, even if the physical addresses are non-linear, they are mapped into a sequential structural order (a linearized sequence) so that a linearized frame is provided to the vector processor to execute a vector instruction].
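For illustration only (not part of the cited references, and using hypothetical class, method, and variable names), the structural-address indirection relied upon from Jain can be sketched as follows:

```python
# Illustrative sketch of structural-address indirection (hypothetical
# names): non-contiguous physical addresses observed in temporal order
# are assigned consecutive structural addresses, so sequential
# prefetching in the structural space follows the irregular stream.

class AddressMapper:
    def __init__(self):
        self.ps_amc = {}          # physical -> structural (PS-AMC analogue)
        self.sp_amc = {}          # structural -> physical (SP-AMC analogue)
        self.next_structural = 0

    def observe(self, phys):
        """Assign the next consecutive structural address to a newly
        observed physical address; return its structural address."""
        if phys not in self.ps_amc:
            self.ps_amc[phys] = self.next_structural
            self.sp_amc[self.next_structural] = phys
            self.next_structural += 1
        return self.ps_amc[phys]

    def prefetch_candidates(self, phys, degree=4):
        """Walk forward in the structural space and translate the next
        `degree` structural addresses back to physical addresses."""
        s = self.ps_amc.get(phys)
        if s is None:
            return []
        return [self.sp_amc[t]
                for t in range(s + 1, s + 1 + degree)
                if t in self.sp_amc]

mapper = AddressMapper()
for addr in (0x9F00, 0x1200, 0x7C40, 0x0330, 0x5A80):  # irregular stream
    mapper.observe(addr)
# The three addresses that followed 0x1200 in temporal order:
candidates = mapper.prefetch_candidates(0x1200, degree=3)
```

A sequential walk of the structural space from any observed address yields prefetch candidates in temporal order, mirroring the PS-AMC/SP-AMC pair of Jain.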
Ansari does not explicitly disclose an address sequence memory to store non-linear RAM addresses corresponding to linear locations in the first intermediate memory, a data sequencer or a linearized frame. Nevertheless in the same field of endeavor, Jain teaches: The main idea is to introduce an extra level of indirection to create a new structural address space in which correlated physical addresses are assigned consecutive structural addresses. The key point is that in this structural address space, streams of correlated memory addresses are both temporally ordered and spatially ordered… Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB – Jain p.248, left col, fig 2, 5-7; The ISB uses two on-chip caches to maintain the mapping between physical and structural addresses. The Physical to-Structural AMC (PS-AMC) stores the mapping from the physical address space to the structural address space; it is indexed by physical addresses. The Structural-to-Physical AMC(SP-AMC) stores the inverse mapping as the PS-AMC and is indexed by structural addresses. While the SP-AMC is not strictly necessary, it enables efficient temporal stream prediction because each cache line in the SP-AMC can yield in a single lookup 16 prefetch candidates from the current temporal stream - Jain p.251, left col - Address Mapping Caches (AMCs). The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. 
It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph. We see in Figure 2 that a sequential traversal of the structural address space visits the elements of the irregular temporal stream—A, B, C, D and E—in temporal order. Thus, the problem of prefetching irregular streams is reduced to sequential prefetching in the structural address space. The mapping to and from structural addresses is performed at a cache line granularity by two spatially indexed on-chip address caches whose contents can be easily synchronized with that of the TLB - Jain p.248, left col, fig 2.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement a linearized memory, sequencer and linearized frame in the invention of Ansari as taught by Jain because it would be advantageous for improving prefetching for irregular/non-linear access patterns (Jain p.247 & 248, left col, also p251 left col).
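For illustration only (not part of the cited references), the sequencer/linearized-frame behavior attributed to the Ansari/Jain combination can be sketched minimally; all names are hypothetical and it is assumed the non-linear address sequence has already been learned and stored:

```python
# Hypothetical sketch of a "data sequencer": the stored non-linear RAM
# address sequence is walked in order and each word is gathered into
# consecutive (linear) slots of an intermediate buffer, giving the
# vector unit a unit-stride view of an irregular frame of data.

def linearize_frame(ram, address_sequence):
    """Gather non-contiguous RAM words into a linear buffer."""
    return [ram[addr] for addr in address_sequence]

ram = {0x10: 'A', 0x200: 'B', 0x34: 'C', 0x8: 'D'}   # sparse "RAM"
learned_sequence = [0x200, 0x8, 0x10, 0x34]          # non-linear order
frame = linearize_frame(ram, learned_sequence)
print(frame)  # ['B', 'D', 'A', 'C']
```

The resulting buffer is accessed linearly (positions 0..3) even though the underlying RAM addresses are non-contiguous.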
Ansari/Jain does not explicitly disclose a scalar processor, although Ansari suggests this feature in col 1 lines 15-18. Nevertheless in the same field of endeavor, Azadet teaches digital signal processors (DSPs) having vector and scalar architectures [Azadet 0023-0025], so that the combination of Ansari/Jain/Azadet discloses a scalar processor communicatively coupled to the RAM [Azadet 0023-0025]; and an address learning agent to: record a first set of memory access addresses by the scalar processor for the first frame of data [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses – Jain p. 251 left col, Training unit]; and store the first set of memory access addresses in the address sequence memory [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement the Ansari/Jain architecture in a heterogeneous scalar + vector configuration as taught by Azadet so that a scalar processor could run code while a vector processor performs vector operations in parallel on scalar data units, using Jain's logic to learn address sequences to aid in prefetching (Azadet 0022-0024).
With respect to dependent claim 9, Ansari/Jain/Azadet discloses a second intermediate memory communicatively coupled to the vector processor and the RAM; wherein the data sequencer is to read a second frame of data from the RAM to the second intermediate memory based on addresses stored in the address sequence memory [multiple vector buffers in pool - Ansari fig 1-2, col 3 lines 5-15] [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph] [The combination of these teachings yields a sequencer/VTU that can read/fetch a first/second frame (vector segment/stream) from RAM into first/second intermediate memory (VBP) according to a stored sequence of addresses in an address sequence memory (PS-AMC/SP-AMC)].
With respect to dependent claim 10, Ansari/Jain/Azadet discloses wherein the address learning agent to: record a second set of memory access addresses by the scalar processor for the second frame of data [the training unit operates in a loop, continually for any number of frames of data, logging load addresses - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph]; determine the first set of memory access addresses matches the second set of memory access addresses [compare/determine newly observed addresses, disclosed as: When a correlated pair (A,B) is observed, the PS-AMC is queried to see if A and B have previously been assigned structural addresses… If A and B already have consecutive structural addresses, the ISB increments the confidence counter – Jain p. 251 both cols]; and indicate the match to the scalar processor [match indicated via combination of confidence counter & prefetch - Jain p. 251 both cols].
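For illustration only (hypothetical names; simplified from Jain's per-pair confidence counters to a whole-sequence comparison), the cited match/confidence training behavior can be sketched as:

```python
# Simplified, hypothetical sketch of the cited training behavior: a
# newly recorded address set is compared against the stored set; a
# match raises a confidence counter (cf. Jain's per-pair counters)
# and is signaled back, while a mismatch lowers the counter.

def train(stored_sequence, new_sequence, confidence):
    """Return (match, updated_confidence) for one frame of addresses."""
    if stored_sequence == new_sequence:
        return True, confidence + 1    # repeated pattern confirmed
    return False, max(confidence - 1, 0)

first = [0x200, 0x8, 0x10, 0x34]
second = [0x200, 0x8, 0x10, 0x34]      # same pattern on the next frame
match, confidence = train(first, second, confidence=0)
print(match, confidence)  # True 1
```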
With respect to dependent claim 11, Ansari/Jain/Azadet discloses wherein the scalar processor to, based on the match determination, modify the program to access the first intermediate memory [The training unit takes as input the load PC and the load address, and it maintains the last observed address in each PC-localized stream. It learns pairs of correlated physical addresses and maps these to consecutive structural addresses … The stream predictor manages streams in the structural address space … The Stream Predictor predicts the next consecutive structural addresses to prefetch… The SP-AMC retrieves the physical addresses for each of the predicted structural addresses to prefetch - Jain fig 5-7, p.251, left col – Training Unit, Address Mapping Caches (AMCs), & Stream Predictor, p.252 left col first paragraph].
With respect to dependent claim 12, Ansari/Jain/Azadet discloses a second intermediate memory to store results generated by the vector processor; and the data sequencer to write the results of the vector processor from the second intermediate memory to the RAM based on RAM addresses stored in the address sequence memory [Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15; VTU and associated control logic function as data sequencer operable to read/access data from the RAM to first intermediate memory (VBP) based on addresses stored in the address sequence memory (PS-AMC/SP-AMC) – Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18 in view of Jain fig 5-7, p.251, left col].
With respect to dependent claim 13, Ansari/Jain/Azadet discloses wherein the address sequence memory and the first intermediate memory are external to the RAM [Ansari fig 1-2].
With respect to dependent claim 14, Ansari/Jain/Azadet discloses wherein the vector processor supports an instruction to access data from the first intermediate memory [Data is transferred into and out of the VBP using special vector data instructions. One set of instructions perform the transfer of data between the memory and the vector buffers. Another pair of instructions move the data between the vector buffers and the general-purpose registers (both integer and floating-point registers). The processor uses the vector data directly from the registers - Ansari col 3 lines 9-15; VTU and associated control logic function as data sequencer operable to read/access data from the RAM to first intermediate memory (VBP) based on addresses stored in the address sequence memory (PS-AMC/SP-AMC) – Ansari fig 1-2, col 4 lines 55-63; col 13 lines 12-18 in view of Jain fig 5-7, p.251, left col].
Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ansari/Jain further in view of Mowry (US Patent # 6240488).
With respect to dependent claims 7 and 20, Ansari/Jain does not explicitly disclose wherein the vector processor instruction set includes an indicator to fetch data from the first intermediate memory instead of the RAM. Nevertheless in the same field of endeavor, Mowry teaches: A processor capable of executing prefetching instructions containing hint fields is provided. The hint fields contain a first portion which enables the selection of a destination indicator for refill operations, and a second portion which identifies a destination [Mowry abstract], so that the combination of Ansari/Jain/Mowry discloses wherein the vector processor instruction set [vector data instructions – Ansari – col 3 lines 5-15] includes an indicator to fetch data from the first intermediate memory instead of the RAM [prefetch hint indicates where to fetch data from - Mowry abstract]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement an indicator for guiding where to fetch data from in the invention of Ansari/Jain as taught by Mowry because it would be advantageous for tailoring prefetch operations to accommodate certain types of data held in cache memories (Mowry col 1 lines 20-25).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Porpodas US Patent # 12164430 teaches: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss and download the updated program code for execution by the one or more EUs.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARWAN AYASH whose telephone number is (571)270-1179. The examiner can normally be reached 9a-530p M-R.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rocio del Mar Perez-Velez can be reached on 571-270-5935. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Marwan Ayash/Examiner, Art Unit 2133
/ROCIO DEL MAR PEREZ-VELEZ/Supervisory Patent Examiner, Art Unit 2133