DETAILED ACTION
Claims 1-20 are presented for examination.
This Office action is in response to the amendment of the application filed on 27-JAN-2026.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see pages 9-10, filed 27-JAN-2026, with respect to objections to the specification and claims have been fully considered and are persuasive due to amendments. The objections to the specification and claims have been withdrawn.
Applicant’s arguments, see pages 10-13, filed 27-JAN-2026, with respect to the rejection(s) of claim(s) 1-20 under 35 U.S.C. 103 have been fully considered and are persuasive due to amendments. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of newly found prior art references.
1. With respect to Applicant’s arguments that the prior art is silent on the various in-memory computation or prefetching operations being performed at particular sets of cycles, and thereby fails to teach or suggest the features related to the first set of operating cycles, Examiner respectfully disagrees. Examiner notes that claim 1 does not contain any limitation directed to clock cycles in particular, but instead recites a “first set of operating cycles” and a “second set of operating cycles.” For example, Mathuriya [0064] teaches that, contemporaneously with the computation of an nth layer, a prefetch from the system memory of at least a portion of the (n+1)st layer is caused. These teachings correspond to the performing of convolution on the first input data in the first set of operating cycles, and to the instructing of the cache circuit to prefetch, during the first set of operating cycles corresponding to the first task, a portion of second input data of the second task. While not describing every limitation, Mathuriya teaches the argued computation and prefetching operations being performed at an nth layer, which is interpreted as a first set of cycles. Nevertheless, claim 1 has been amended, and the rejection is updated below to address the changes.
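For purposes of illustration only (not part of the record, and with all names and values invented rather than drawn from Mathuriya), the compute/prefetch overlap that Examiner reads in Mathuriya [0064-0065] can be sketched as the following per-layer loop, in which the prefetch of layer n+1's weights is issued during layer n's operating cycles and the prefetched data is consumed only in the next set of cycles:

```python
# Hypothetical sketch of per-layer compute/prefetch overlap (illustrative
# only; names and values are invented, not taken from any cited reference).

class SystemMemory:
    def __init__(self, weights_by_layer):
        self.weights_by_layer = weights_by_layer  # layer index -> weights

    def fetch(self, layer_idx):
        return self.weights_by_layer[layer_idx]

class Cache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def prefetch(self, layer_idx):
        # Fill the cache ahead of use, overlapping with ongoing compute.
        self.lines[layer_idx] = self.memory.fetch(layer_idx)

    def read(self, layer_idx):
        # The prefetched data is consumed only in the later set of cycles.
        return self.lines.pop(layer_idx)

def run_network(num_layers, memory, cache):
    weights = memory.fetch(0)            # demand fetch for the first task
    for n in range(num_layers):
        if n + 1 < num_layers:
            cache.prefetch(n + 1)        # issued during layer n's cycles
        _ = [w * 2 for w in weights]     # stand-in for layer n's computation
        if n + 1 < num_layers:
            weights = cache.read(n + 1)  # next set of operating cycles

mem = SystemMemory({0: [1, 2], 1: [3, 4], 2: [5, 6]})
run_network(3, mem, Cache(mem))
```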
2. With respect to Applicant’s arguments that the prior art does not teach a fixed relationship between the sizes of the first input data, the second input data, and the bandwidth of the system memory, Examiner notes that no such fixed relationship is claimed. Amended claim 1 broadly recites that the first input data has a size less than an undefined bandwidth of the system memory and that a portion of the second input data uses any amount of the remaining bandwidth of the system memory; the broadest reasonable interpretation therefore includes any teaching in which first data in a cache is of any size below any amount of the bandwidth and second data occupies any amount of the remaining bandwidth. Claim 3, both as originally filed and as amended, by contrast requires only that fetching the first data is of any size below the full bandwidth and that prefetching the second data uses any amount of the bandwidth at all. This differs from the amendment to claim 1, so the arguments directed to claim 3 do not apply to amended claim 1. In the previous mapping for claim 3, Sztejna was relied upon merely to show an explicit case where any fetching is necessarily less than the total bandwidth, by virtue of being limited to be so, and thus reads on the claimed fetching of first data using less than a full bandwidth and the second fetching using some amount of bandwidth. To address the slight difference in amended claim 1, the rejection is updated below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7, 9, 12-15, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over
Mathuriya et al., U.S. Pub. No. 20190057300 (hereinafter “Mathuriya”) in view of
Vantrease et al., U.S. Pub. No. 20190294968 (hereinafter “Vantrease”) further in view of
SZTEJNA et al., U.S. Pub. No. 20220058062 (hereinafter “Sztejna”) further in view of
HUR et al., U.S. Pub. No. 20170091092 (hereinafter “Hur”).
Regarding claim 1: Mathuriya teaches A neural processor circuit, comprising:
a system memory access circuit coupled to a system memory, the system memory access circuit configured to ([0042], Mathuriya teaches a system memory circuitry 170 that may include a variety of types of storage devices.)
fetch, from the system memory, first input data of a first task associated with a neural network, the first input data being fetched in a first set of operating cycles; ([0031], Mathuriya teaches that the system memory may store information and/or data such as neural network layer data transferred between the memory and processors, and that the second on-chip processor memory circuitry can be used to store weights transferred from the system memory circuitry before being used in the neural network. Moreover, in [0024] Mathuriya teaches receiving weight data for the layers, as well as specifically an nth layer. The receiving of the weight data, which may be for the nth layer, is interpreted to be the claimed fetch, from system memory, first input data of a first task associated with a neural network. Furthermore, the description of the computations being done per layer is interpreted to be per set of operating cycles.)
one or more neural engine circuits coupled to the system memory access circuit, the one or more neural engine circuits configured to perform neural network computation operations on the first input data in the first set of operating cycles; and ([0024], Mathuriya teaches that after receiving data representative of weights associated with one layer, the circuitry performs an in-memory computation of the “nth” layer of the neural network. Further, in [0031], Mathuriya teaches that a neural network control circuitry on the semiconductor package 110, which receives the data from the system memory circuitry, may cause or control the execution of a neural network. The neural network control circuitry corresponds to the claimed neural engine circuit.)
a cache access circuit coupled to a cache circuit that caches data to or from the system memory, the cache access circuit configured to instruct the cache circuit to prefetch from the system memory, during the first set of operating cycles corresponding to the first task, a portion of second input data of a second task of the neural network scheduled for processing in a second set of operating cycles subsequent to the first set of operating cycles, the portion of the second input data being fetched during the first set of operating cycles… in response to the first input data being fetched in the first set of operating cycles ([0064-0065], Mathuriya teaches that contemporaneous with the execution of the computation of an nth layer, prefetching from the system memory circuitry of at least a portion of the layer weights associated with the (n+1)st layer of the neural network. Furthermore, in [0028], Mathuriya teaches that the prefetched weights of the (n+1)st layer are stored in the processor cache circuitry. Furthermore, in [0034], Mathuriya teaches that the sequence of execution of the system involves first fetching data for the (n+1)st layer during the execution of the nth layer, and that the (n+1)st layer will subsequently be executed using the prefetched data. In that same section, Mathuriya also teaches that at least one of the processor circuitry and/or the neural network control circuitry causes the prefetch, which would be to the cache circuitry; thus, the claimed cache access circuit coupled to a cache circuit that caches data to or from the system memory is taught. The weights of the (n+1)st layer are interpreted to be the claimed portion of second input data of a second task of the neural network, prefetched during the first set of operating cycles corresponding to the first task, and which is used for processing in a second set of operating cycles.)
Mathuriya does not appear to explicitly disclose perform convolution operations on input data, a first size less than a bandwidth of the system memory, a portion of the second input data being fetched… using a remaining bandwidth of the system memory, the second input data having a size larger than the bandwidth of the system memory.
However, Vantrease teaches perform convolution operations on the first input data ([0027] and [0031], Vantrease teaches that a prediction model may be a convolutional neural network, in which each processing of a layer generates convolutions between pixel values and weight values. The pixel values and weight values processed at a layer correspond to the input data of the first layer).
Mathuriya and Vantrease are analogous art because they are from the same field of endeavor, memory management for neural network operations.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya and Vantrease to achieve the claimed neural processor circuit that fetches first input data of a first task, performs convolution operations on the first input data, and prefetches second input data of a second task during a first set of operating cycles, where the second task is scheduled for processing in a second set of operating cycles subsequent to the first set of operating cycles.
One of ordinary skill in the art would have been motivated to make this modification in order to properly enable prefetching for a convolution task, by decreasing the time required to transfer layer weights when the (n+1)st layer executes as discussed in Mathuriya [0189] for neural networks of a convolutional type.
Mathuriya/Vantrease do not appear to explicitly disclose a first size less than a bandwidth of the system memory, a portion of the second input data being fetched… using a remaining bandwidth of the system memory, the second input data having a size larger than the bandwidth of the system memory.
However, Sztejna teaches a first size less than a bandwidth of the system memory ([0019], Sztejna teaches a system which modifies operations of system resources depending on the class of service, including increasing or decreasing cache and memory bandwidth. Decreasing memory bandwidth for operations would include decreasing memory bandwidth usage for fetching data from the system memory, and a situation where fetching data from a system memory uses less than a bandwidth of the system memory is obvious by the fact that a specifically decreased memory bandwidth usage for a fetch operation is necessarily less than a bandwidth of the system memory).
Mathuriya/Vantrease and Sztejna are analogous art because they are from the same field of endeavor, management of operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease and Sztejna, to achieve the claimed neural processor circuit, where a first input data for a first task is fetched, and a second input data is prefetched, to specify that fetching the first input data from the system memory uses less than a bandwidth of the system memory.
One of ordinary skill in the art would have been motivated to make this modification in order to execute requests with specific adjustments to system resource usage, to enable potential workload performance improvements as discussed in Sztejna [0007].
Although bandwidth is not explicitly mentioned in Mathuriya/Vantrease, with the bandwidths of Sztejna in mind, the teachings of Mathuriya/Vantrease/Sztejna further render obvious the portion of second input data being fetched using a remaining bandwidth of the system memory: because the portion of the second input data is taught to be fetched, and the fetching is shown to be successful, the fetching necessarily uses whatever bandwidth of the system memory is available (i.e., the remaining bandwidth).
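As a purely hypothetical numeric illustration of this bandwidth-sharing reading (the budget value and function below are invented, not taken from any cited reference), the demand fetch can be modeled as taking less than a fixed per-cycle budget, with the prefetch consuming the remainder:

```python
# Hypothetical model of the claimed bandwidth sharing (illustrative only).
BANDWIDTH = 256  # invented: bytes transferable per set of operating cycles

def plan_transfers(first_size, second_size):
    # The demand fetch of the first input uses less than the full bandwidth.
    assert first_size < BANDWIDTH
    remaining = BANDWIDTH - first_size
    # The prefetch of the second input uses the remaining bandwidth; since
    # the second input may exceed the bandwidth, only a portion fits now.
    prefetched_portion = min(second_size, remaining)
    leftover = second_size - prefetched_portion  # deferred to later cycles
    return prefetched_portion, leftover

# First input 200 bytes; second input 300 bytes (larger than the bandwidth):
print(plan_transfers(200, 300))  # -> (56, 244)
```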
Mathuriya/Vantrease/Sztejna do not appear to explicitly disclose the second input data having a size larger than the bandwidth of the system memory.
However, Hur teaches input data having a size larger than the bandwidth of the system memory ([0056], Hur teaches a case in which a cache line (input data with respect to the destination) is greater than a bandwidth between two elements, as determined during a write request. Although not identical to fetching to a cache, a write to memory is analogous to fetching, as both transmit data from one element to another for storage in a memory element.)
Mathuriya/Vantrease/Sztejna and Hur are analogous art because they are from the same field of endeavor, management of operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna and Hur, to achieve the result in which the portion of the second data has a size larger than the bandwidth of the system memory.
One of ordinary skill in the art would have been motivated to make this modification in order to identify and adapt to cases in which the write data would otherwise be too large to be supported by the bandwidth of the system, as discussed in Hur [0056].
Regarding claim 2: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 1, from which claim 2 depends.
Mathuriya/Vantrease/Sztejna/Hur further teaches the first task corresponds to a first operation in a first layer of the neural network and the second task corresponds to a second operation in a second layer of the neural network, the second layer being different from the first layer. ([0024], Mathuriya teaches that the performing of the computation/execution is of an nth layer of a multi-layer neural network, and treats the (n+1)st layer distinctly from the nth layer. The computation/execution being of an nth layer corresponds to the first task corresponding to a first operation in a first layer. As explained with respect to claim 1, the (n+1)st layer is a next layer, which has its own second task corresponding to a second operation.)
Regarding claim 3: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 1, from which claim 3 depends.
Mathuriya/Vantrease/Sztejna/Hur further teaches fetching data from the system memory uses less than a bandwidth of the system memory and prefetching data from the system memory to the cache circuit uses at least part of the bandwidth of the system memory ([0019], Sztejna teaches a system which modifies operations of system resources depending on the class of service, including increasing or decreasing cache and memory bandwidth. Decreasing memory bandwidth for operations would include decreasing memory bandwidth usage for fetching data from the system memory, and a situation where fetching data from a system memory uses less than a bandwidth of the system memory is obvious. Furthermore, in [0021-0022], Sztejna broadly teaches that there are situations where memory bandwidth is increased, and therefore, prefetching data (which is a type of fetch) from system memory would use a specified amount of memory bandwidth, and prefetching data from the system memory to the cache circuit using at least part of the bandwidth of the system memory is taught.).
One of ordinary skill in the art would have been motivated to make this modification for the same reasons as in claim 1.
Regarding claim 4: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 1, from which claim 4 depends.
Mathuriya/Vantrease/Sztejna/Hur further teaches the second input data prefetched to the cache circuit remains unconsumed in the cache circuit until the second set of operating cycles. ([0047], Vantrease teaches that performing the arithmetic operations for the next neural network layer is only done in the next time period. As explained with respect to claim 1, the next neural network layer contains the second input data prefetched, and the next time period represents the second set of operating cycles. Therefore, given that Mathuriya/Vantrease teach that the arithmetic operations are only performed for the second input data in the second set of operating cycles, the data prefetched remains unconsumed in the state buffer of Vantrease or the cache circuit of Mathuriya until the second set of operating cycles)
One of ordinary skill in the art would have been motivated to make this modification for the same reasons as claim 1.
Regarding claim 7: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 1, from which claim 7 depends.
Mathuriya/Vantrease/Sztejna/Hur further teaches a first portion of the second input data is prefetched to the cache circuit and the system memory access circuit is further configured to fetch a second portion of the second input data from the system memory during the second set of operating cycles. ([0056-0057], Hur teaches that when a cache line to be written to storage is greater than a bandwidth, the write data is divided into multiple data segments to fit the bandwidth, and each segment is sequentially transferred. In combination with the teachings of Mathuriya/Vantrease of the second input data being prefetched to the cache circuit, a situation in which, when the input data is too large for the bandwidth, a first segment of the second input data is prefetched in one set of operating cycles and the remaining segments are fetched in sequence in a subsequent set of operating cycles is obvious.)
One of ordinary skill in the art would have been motivated to make this modification for the same reasons as claim 1.
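A minimal sketch of the segmentation Examiner reads in Hur [0056-0057], assuming an invented byte-oriented representation (illustrative only): data larger than the bandwidth is divided into bandwidth-sized segments, the first of which is prefetched in one set of cycles while the remainder are fetched in a subsequent set:

```python
# Illustrative only: dividing data too large for the bandwidth into
# sequential segments, per Examiner's reading of Hur [0056-0057].
def segment(data, bandwidth):
    return [data[i:i + bandwidth] for i in range(0, len(data), bandwidth)]

second_input = bytes(range(10))          # 10 bytes of hypothetical data
segments = segment(second_input, bandwidth=4)
first_portion = segments[0]              # prefetched in the first set of cycles
later_portions = segments[1:]            # fetched in the second set of cycles
print(len(first_portion), [len(s) for s in later_portions])  # 4 [4, 2]
```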
Regarding claim 9: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 1, from which claim 9 depends.
Mathuriya/Vantrease further teaches the cache circuit is shared by one or more processing circuits external to the neural processor circuit for caching data. ([0171], Mathuriya teaches that a shared cache may be outside of processors, yet connected to processors, to store the processors’ local cache information, hence, the cache circuit being shared by one or more processing circuits external to the neural processing circuit is taught.)
Regarding claim 12: Mathuriya teaches A method comprising:
fetching, by a system memory access circuit from the system memory, first input data of a first task associated with a neural network, the first input data being fetched in a first set of operating cycles; ([0031], Mathuriya teaches that the system memory may store information and/or data such as neural network layer data transferred between the memory and processors, and that the second on-chip processor memory circuitry can be used to store weights transferred from the system memory circuitry before being used in the neural network. Moreover, in [0024] Mathuriya teaches receiving weight data for the layers, as well as specifically an nth layer. The receiving of the weight data, which may be for the nth layer, is interpreted to be the claimed fetch, from system memory, first input data of a first task associated with a neural network. Furthermore, the description of the computations being done per layer is interpreted to be per set of operating cycles.)
performing, by one or more neural engine circuits, neural network computation operations on the first input data in the first set of operating cycles; and ([0024], Mathuriya teaches that after receiving data representative of weights associated with one layer, the circuitry performs an in-memory computation of the “nth” layer of the neural network. Further, in [0031], Mathuriya teaches that a neural network control circuitry on the semiconductor package 110, which receives the data from the system memory circuitry, may cause or control the execution of a neural network. The neural network control circuitry corresponds to the claimed neural engine circuit.)
instructing, by a cache access circuit coupled to a cache circuit that caches data to or from the system memory, the cache circuit to prefetch from the system memory, during the first set of operating cycles corresponding to the first task, a portion of second input data of a second task of the neural network scheduled for processing in a second set of operating cycles subsequent to the first set of operating cycles, the portion of the second input data being fetched during the first set of operating cycles… in response to the first input data being fetched in the first set of operating cycles ([0064-0065], Mathuriya teaches that contemporaneous with the execution of the computation of an nth layer, prefetching from the system memory circuitry of at least a portion of the layer weights associated with the (n+1)st layer of the neural network. Furthermore, in [0028], Mathuriya teaches that the prefetched weights of the (n+1)st layer are stored in the processor cache circuitry. Furthermore, in [0034], Mathuriya teaches that the sequence of execution of the system involves first fetching data for the (n+1)st layer during the execution of the nth layer, and that the (n+1)st layer will subsequently be executed using the prefetched data. In that same section, Mathuriya also teaches that at least one of the processor circuitry and/or the neural network control circuitry causes the prefetch, which would be to the cache circuitry; thus, the claimed cache access circuit coupled to a cache circuit that caches data to or from the system memory is taught. The weights of the (n+1)st layer are interpreted to be the claimed portion of second input data of a second task of the neural network, prefetched during the first set of operating cycles corresponding to the first task, and which is used for processing in a second set of operating cycles.)
Mathuriya does not appear to explicitly disclose perform convolution operations on input data, a first size less than a bandwidth of the system memory, a portion of the second input data being fetched… using a remaining bandwidth of the system memory, the second input data having a size larger than the bandwidth of the system memory.
However, Vantrease teaches perform convolution operations on the first input data ([0027] and [0031], Vantrease teaches that a prediction model may be a convolutional neural network, in which each processing of a layer generates convolutions between pixel values and weight values. The pixel values and weight values processed at a layer correspond to the input data of the first layer).
Mathuriya and Vantrease are analogous art because they are from the same field of endeavor, memory management for neural network operations.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya and Vantrease to achieve the claimed method in which first input data of a first task is fetched, convolution operations are performed on the first input data, and second input data of a second task is prefetched during a first set of operating cycles, where the second task is scheduled for processing in a second set of operating cycles subsequent to the first set of operating cycles.
One of ordinary skill in the art would have been motivated to make this modification in order to properly enable prefetching for a convolution task, by decreasing the time required to transfer layer weights when the (n+1)st layer executes as discussed in Mathuriya [0189] for neural networks of a convolutional type.
Mathuriya/Vantrease do not appear to explicitly disclose a first size less than a bandwidth of the system memory, a portion of the second input data being fetched… using a remaining bandwidth of the system memory, the second input data having a size larger than the bandwidth of the system memory.
However, Sztejna teaches a first size less than a bandwidth of the system memory ([0019], Sztejna teaches a system which modifies operations of system resources depending on the class of service, including increasing or decreasing cache and memory bandwidth. Decreasing memory bandwidth for operations would include decreasing memory bandwidth usage for fetching data from the system memory, and a situation where fetching data from a system memory uses less than a bandwidth of the system memory is obvious by the fact that a specifically decreased memory bandwidth usage for a fetch operation is necessarily less than a bandwidth of the system memory).
Mathuriya/Vantrease and Sztejna are analogous art because they are from the same field of endeavor, management of operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease and Sztejna, to achieve the claimed method, where a first input data for a first task is fetched, and a second input data is prefetched, to specify that fetching the first input data from the system memory uses less than a bandwidth of the system memory.
One of ordinary skill in the art would have been motivated to make this modification in order to execute requests with specific adjustments to system resource usage, to enable potential workload performance improvements as discussed in Sztejna [0007].
Although bandwidth is not explicitly mentioned in Mathuriya/Vantrease, with the bandwidths of Sztejna in mind, the teachings of Mathuriya/Vantrease/Sztejna further render obvious the portion of second input data being fetched using a remaining bandwidth of the system memory: because the portion of the second input data is taught to be fetched, and the fetching is shown to be successful, the fetching necessarily uses whatever bandwidth of the system memory is available (i.e., the remaining bandwidth).
Mathuriya/Vantrease/Sztejna do not appear to explicitly disclose the second input data having a size larger than the bandwidth of the system memory.
However, Hur teaches input data having a size larger than the bandwidth of the system memory ([0056], Hur teaches a case in which a cache line (input data with respect to the destination) is greater than a bandwidth between two elements, as determined during a write request. Although not identical to fetching to a cache, a write to memory is analogous to fetching, as both transmit data from one element to another for storage in a memory element.)
Mathuriya/Vantrease/Sztejna and Hur are analogous art because they are from the same field of endeavor, management of operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna and Hur, to achieve the result in which the portion of the second data has a size larger than the bandwidth of the system memory.
One of ordinary skill in the art would have been motivated to make this modification in order to identify and adapt to cases in which the write data would otherwise be too large to be supported by the bandwidth of the system, as discussed in Hur [0056].
Regarding claim 13: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 12, from which claim 13 depends.
Mathuriya/Vantrease/Sztejna/Hur further teaches the first task corresponds to a first operation in a first layer of the neural network and the second task corresponds to a second operation in a second layer of the neural network, the second layer being different from the first layer. ([0024], Mathuriya teaches that the performing of the computation/execution is of an nth layer of a multi-layer neural network, and treats the (n+1)st layer distinctly from the nth layer. The computation/execution being of an nth layer corresponds to the first task corresponding to a first operation in a first layer. As explained with respect to claim 1, the (n+1)st layer is a next layer, which has its own second task corresponding to a second operation.)
Regarding claim 14: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 12, from which claim 14 depends.
Mathuriya/Vantrease/Sztejna/Hur do not appear to explicitly disclose fetching the first input data from the system memory uses less than a bandwidth of the system memory and prefetching the second input data from the system memory to the cache circuit uses at least part of the bandwidth of the system memory.
However, Sztejna teaches fetching data from the system memory uses less than a bandwidth of the system memory and prefetching data from the system memory to the cache circuit uses at least part of the bandwidth of the system memory ([0019], Sztejna teaches a system which modifies operations of system resources depending on the class of service, including increasing or decreasing cache and memory bandwidth. Decreasing memory bandwidth for operations would include decreasing memory bandwidth usage for fetching data from the system memory, and a situation where fetching data from a system memory uses less than a bandwidth of the system memory is obvious. Furthermore, in [0021-0022], Sztejna broadly teaches that there are situations where memory bandwidth is increased, and therefore, prefetching data (which is a type of fetch) from system memory would use a specified amount of memory bandwidth, and prefetching data from the system memory to the cache circuit using at least part of the bandwidth of the system memory is taught.).
Mathuriya/Vantrease/Hur and Sztejna are analogous art because they are from the same field of endeavor, management of operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Hur and Sztejna, to achieve the claimed method of claim 12, where a first input data for a first task is fetched, and a second input data is prefetched, to specify that fetching the first input data from the system memory uses less than a bandwidth of the system memory, and that prefetching the second input data from the system memory to the cache circuit uses at least part of the bandwidth of the system memory.
One of ordinary skill in the art would have been motivated to make this modification for the same reasons as in claim 12.
Regarding claim 15: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 12, from which claim 15 depends.
Mathuriya/Vantrease/Sztejna/Hur further teaches a first portion of the second input data is prefetched to the cache circuit and the system memory access circuit is further configured to fetch a second portion of the second input data from the system memory during the second set of operating cycles. ([0056-0057], Hur teaches that when a cache line to be written to storage is greater than a bandwidth, the write data is divided into multiple data segments to fit the bandwidth, and each segment is sequentially transferred. In combination with the teachings of Mathuriya/Vantrease of the second input data being prefetched to the cache circuit, a situation in which, when the input data is too large for the bandwidth, a first segment of the second input data is prefetched in one set of operating cycles and the remaining segments are fetched in sequence in a subsequent set of operating cycles is obvious.)
One of ordinary skill in the art would have been motivated to make this modification for the same reasons as claim 12.
Regarding claim 19: Mathuriya teaches An electronic device, comprising:
a system memory configured to store a neural network; and ([0022], Mathuriya teaches that layer weights of a neural network are stored in system memory circuitry)
A neural processor circuit, comprising: ([0031], Mathuriya teaches a neural network control circuitry.)
a system memory access circuit coupled to a system memory, the system memory access circuit configured to ([0042], Mathuriya teaches a system memory circuitry 170 that may include a variety of types of storage devices.)
fetch, from the system memory, first input data of a first task associated with a neural network, the first input data being fetched in a first set of operating cycles; ([0031], Mathuriya teaches that the system memory may store information and/or data such as neural network layer data transferred between the memory and processors, and that the second on-chip processor memory circuitry can be used to store weights transferred from the system memory circuitry before being used in the neural network. Moreover, in [0024] Mathuriya teaches receiving weight data for the layers, as well as specifically an nth layer. The receiving of the weight data, which may be for the nth layer, is interpreted to be the claimed fetch, from system memory, first input data of a first task associated with a neural network. Furthermore, the description of the computations being done per layer is interpreted to be per set of operating cycles.)
one or more neural engine circuits coupled to the system memory access circuit, the one or more neural engine circuits configured to perform neural network computation operations on the first input data in the first set of operating cycles; and ([0024], Mathuriya teaches that after receiving data representative of weights associated with one layer, the circuitry performs an in-memory computation of the “nth” layer of the neural network. Further, in [0031], Mathuriya teaches that a neural network control circuitry on the semiconductor package 110, which receives the data from the system memory circuitry, may cause or control the execution of a neural network. The neural network control circuitry corresponds to the claimed neural engine circuit.)
a cache access circuit coupled to a cache circuit that caches data to or from the system memory, the cache access circuit configured to instruct the cache circuit to prefetch from the system memory, during the first set of operating cycles corresponding to the first task, a portion of second input data of a second task of the neural network scheduled for processing in a second set of operating cycles subsequent to the first set of operating cycles, the portion of the second input data being fetched during the first set of operating cycles… in response to the first input data being fetched in the first set of operating cycles ([0064-0065], Mathuriya teaches that contemporaneous with the execution of the computation of an nth layer, prefetching from the system memory circuitry of at least a portion of the layer weights associated with the (n+1)st layer of the neural network. Furthermore, in [0028], Mathuriya teaches that the prefetched weights of the (n+1)st layer are stored in the processor cache circuitry. Furthermore, in [0034], Mathuriya teaches that the sequence of execution of the system involves first fetching data for the (n+1)st layer during the execution of the nth layer, and that the (n+1)st layer will subsequently be executed using the prefetched data. In that same section, Mathuriya also teaches that at least one of the processor circuitry and/or the neural network control circuitry causes the prefetch, which would be to the cache circuitry; thus, the claimed cache access circuit coupled to a cache circuit that caches data to or from the system memory is taught. The weights of the (n+1)st layer are interpreted to be the claimed portion of second input data of a second task of the neural network, prefetched during the first set of operating cycles corresponding to the first task, and which is used for processing in a second set of operating cycles.)
Mathuriya does not appear to explicitly disclose perform convolution operations on input data, a first size less than a bandwidth of the system memory, a portion of the second input data being fetched… using a remaining bandwidth of the system memory, the second input data having a size larger than the bandwidth of the system memory.
However, Vantrease teaches perform convolution operations on the first input data ([0027] and [0031], Vantrease teaches that a prediction model may be a convolutional neural network, in which each processing of a layer generates convolutions between pixel values and weight values. The pixel values and weight values processed at a layer correspond to the input data of the first layer).
Mathuriya and Vantrease are analogous art because they are from the same field of endeavor, memory management for neural network operations.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya and Vantrease to achieve the claimed electronic device with a neural processor circuit that fetches first input data of a first task, performs convolution operations on the first input data, and prefetches second input data of a second task during a first set of operating cycles, where the second task is scheduled for processing in a second set of operating cycles subsequent to the first set of operating cycles.
One of ordinary skill in the art would have been motivated to make this modification in order to properly enable prefetching for a convolution task, by decreasing the time required to transfer layer weights when the (n+1)st layer executes as discussed in Mathuriya [0189] for neural networks of a convolutional type.
Mathuriya/Vantrease do not appear to explicitly disclose a first size less than a bandwidth of the system memory, a portion of the second input data being fetched… using a remaining bandwidth of the system memory, the second input data having a size larger than the bandwidth of the system memory.
However, Sztejna teaches a first size less than a bandwidth of the system memory ([0019], Sztejna teaches a system which modifies operations of system resources depending on the class of service, including increasing or decreasing cache and memory bandwidth. Decreasing memory bandwidth for operations would include decreasing memory bandwidth usage for fetching data from the system memory, and a situation where fetching data from a system memory uses less than a bandwidth of the system memory is obvious by the fact that a specifically decreased memory bandwidth usage for a fetch operation is necessarily less than a bandwidth of the system memory).
Mathuriya/Vantrease and Sztejna are analogous art because they are from the same field of endeavor, management of operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease and Sztejna, to achieve the claimed neural processor circuit, where a first input data for a first task is fetched, and a second input data is prefetched, to specify that fetching the first input data from the system memory uses less than a bandwidth of the system memory.
One of ordinary skill in the art would have been motivated to make this modification in order to execute requests with specific adjustments to system resource usage, to enable potential workload performance improvements as discussed in Sztejna [0007].
Although bandwidth is not explicitly mentioned in Mathuriya/Vantrease, with the bandwidths of Sztejna in mind, the teachings of Mathuriya/Vantrease/Sztejna further render obvious the portion of second input data being fetched using a remaining bandwidth of the system memory: because the portion of the second input data is taught to be fetched, and the fetching is shown to be successful, the fetching necessarily uses whatever bandwidth of the system memory is available (i.e., the remaining bandwidth).
Mathuriya/Vantrease/Sztejna do not appear to explicitly disclose the second input data having a size larger than the bandwidth of the system memory.
However, Hur teaches input data having a size larger than the bandwidth of the system memory ([0056], Hur teaches a case in which a cache line (input data with respect to the destination) is greater than a bandwidth between two elements, as determined during a write request. Although not identical to fetching to a cache, a write to memory is analogous to fetching, as both transmit data from one element to another for storage in a memory element.)
Mathuriya/Vantrease/Sztejna and Hur are analogous art because they are from the same field of endeavor, management of operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna and Hur, to achieve the result in which the portion of the second data has a size larger than the bandwidth of the system memory.
One of ordinary skill in the art would have been motivated to make this modification in order to identify and adapt to cases in which the write data would otherwise be too large to be supported by the bandwidth of the system, as discussed in Hur [0056].
Regarding claim 20: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 19, from which claim 20 depends.
Mathuriya/Vantrease/Sztejna/Hur further teaches the first task corresponds to a first operation in a first layer of the neural network and the second task corresponds to a second operation in a second layer of the neural network, the second layer being different from the first layer. ([0024], Mathuriya teaches that the performing of the computation/execution is of an nth layer of a multi-layer neural network, and treats the (n+1)st layer distinctly from the nth layer. The computation/execution being of an nth layer corresponds to the first task corresponding to a first operation in a first layer. As explained with respect to claim 1, the (n+1)st layer is a next layer, which has its own second task corresponding to a second operation.)
Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over
Mathuriya et al., U.S. Pub. No. 20190057300 (hereinafter “Mathuriya”) in view of
Vantrease et al., U.S. Pub. No. 20190294968 (hereinafter “Vantrease”) further in view of
SZTEJNA et al., U.S. Pub. No. 20220058062 (hereinafter “Sztejna”) further in view of
HUR et al., U.S. Pub. No. 20170091092 (hereinafter “Hur”) further in view of
Fishel et al., U.S. Pub. No. 20190340490 (hereinafter “Fishel”).
Regarding Claim 5: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 1, from which claim 5 depends.
While Vantrease broadly teaches memory descriptors of stored data, including the address information that would be needed to perform prefetching, Mathuriya/Vantrease/Sztejna/Hur do not appear to explicitly disclose the second task is associated with a task descriptor indicating that the second input data is to be prefetched to the cache circuit, wherein the one or more neural engine circuits of the neural processor circuit are configured to perform an operation according to the task descriptor.
However, Fishel more explicitly teaches the second task is associated with a task descriptor indicating that the second input data is to be prefetched to the cache circuit, wherein the one or more neural engine circuits of the neural processor circuit are configured to perform an operation according to the task descriptor. ([0084], Fishel teaches that a neural network may be converted to a list of tasks, each task associated with a task descriptor that defines a configuration of the neural processor circuit to execute the task, and where each task may correspond with a single network layer or a portion of the network layer. Further, in [0105], Fishel teaches that when a task is selected, the task descriptor is placed into a configuration queue, and the neural processor circuit then performs a prefetch operation for input data. Therefore, the task descriptor being placed into the configuration queue to signal that the neural processor circuit may then initiate a prefetch operation teaches the claimed task descriptor indicating that the second input data is to be prefetched to the cache circuit. The task descriptor also defining a configuration of the neural processor circuit to execute the task, in addition to the task being the operations associated with a layer of the neural network as discussed with respect to claim 1, teaches the claimed performing of an operation according to the task descriptor.)
Mathuriya/Vantrease/Sztejna/Hur and Fishel are analogous art because they are from the same field of endeavor, management of neural network operations with memory.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna/Hur and Fishel to achieve the result of the neural processor of claim 1, to also provide task descriptors for each task, indicating that selected future input data is to be prefetched, and where the neural processor circuit performs an operation according to the task descriptor.
One of ordinary skill in the art would have been motivated to make this modification in order to facilitate the programming of the neural processor circuit, by providing specific data in a task descriptor defining the configuration of the neural processor circuit for the task as discussed in Fishel [0093].
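For illustration of the task-descriptor mechanism read from Fishel [0084] and [0105] (a hypothetical sketch; the field names and queue below are invented, not Fishel's):

```python
# Hypothetical task descriptor carrying a prefetch indication (illustrative
# only; field names are invented, not taken from Fishel).
from dataclasses import dataclass

@dataclass
class TaskDescriptor:
    layer: int
    input_addr: int
    input_size: int
    prefetch_to_cache: bool  # indicates the task's input should be prefetched

config_queue = []

def select_task(desc: TaskDescriptor):
    # Placing the descriptor in the configuration queue is the signal for
    # the circuit to initiate the prefetch of that task's input data.
    config_queue.append(desc)
    if desc.prefetch_to_cache:
        print(f"prefetch layer {desc.layer} input at {desc.input_addr:#x}")

select_task(TaskDescriptor(layer=2, input_addr=0x4000,
                           input_size=4096, prefetch_to_cache=True))
```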
Regarding claim 6: The combination of Mathuriya, Vantrease, Sztejna, Hur, and Fishel teaches all limitations of claim 5, from which claim 6 depends.
Mathuriya/Vantrease/Sztejna/Hur/Fishel further teaches the task descriptor is generated by a compiler that is configured to analyze the neural network and determine that a size of the second input data exceeds a threshold ([0094], Fishel teaches that a task descriptor may be generated at compile time. Furthermore, in [0063], Fishel teaches that a compiler analyzes the hierarchy of the neural network to determine how input data is split into smaller data units. Furthermore, in [0073], Fishel teaches that part of determining how input data is segmented may involve a constraint that determines whether tile width is over a threshold. The compiler operating at compile time teaches the claimed task descriptor generated by a compiler. The compiler analyzing the hierarchy to determine how input data is split, and the segmentation of input data involving determining whether input data is over a threshold, teaches the claimed compiler determining that a size of the input data exceeds a threshold).
One of ordinary skill in the art would have been motivated to make this modification in order to adjust to hardware constraints of the neural processor circuit as discussed in Fishel [0063].
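Continuing the same hypothetical illustration (the threshold value and dictionary fields below are invented), the compile-time rule read from Fishel [0063], [0073], and [0094] amounts to setting the prefetch indication when the analyzed input size exceeds a threshold:

```python
# Illustrative only: compile-time generation of a descriptor whose prefetch
# flag depends on an invented size threshold (per Examiner's reading of
# Fishel [0063], [0073], [0094]).
SIZE_THRESHOLD = 2048  # hypothetical hardware constraint

def make_descriptor(layer, input_addr, input_size):
    return {
        "layer": layer,
        "input_addr": input_addr,
        "input_size": input_size,
        # Set by the compiler after analyzing the network hierarchy.
        "prefetch_to_cache": input_size > SIZE_THRESHOLD,
    }

print(make_descriptor(3, 0x8000, 4096)["prefetch_to_cache"])  # True
```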
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over
Mathuriya et al., U.S. Pub. No. 20190057300 (hereinafter “Mathuriya”) in view of
Vantrease et al., U.S. Pub. No. 20190294968 (hereinafter “Vantrease”) further in view of
SZTEJNA et al., U.S. Pub. No. 20220058062 (hereinafter “Sztejna”) further in view of
HUR et al., U.S. Pub. No. 20170091092 (hereinafter “Hur”) further in view of
Sasanka, U.S. Pub. No. 20170147496 (hereinafter “Sasanka”).
Regarding claim 8: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 7, from which claim 8 depends.
While Mathuriya/Vantrease/Sztejna/Hur teaches the prefetching of the first portion to the cache circuit and the fetching of the second portion from the system memory, as shown with respect to claim 7, Mathuriya/Vantrease/Sztejna/Hur do not appear to disclose that the prefetching of the first portion and the fetching of the second portion are controlled by one or more sieve factors in a task descriptor associated with the second task.
However, Sasanka teaches that the prefetching and the fetching may be controlled by one or more sieve factors in a task descriptor associated with the second task ([0037], Sasanka teaches a filtering technique that enables only particular portions of a memory address space of an application to be cached in a cache memory.)
Mathuriya/Vantrease/Sztejna/Hur and Sasanka are analogous art because they are from the same field of endeavor, controlling cache circuits.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna/Hur and Sasanka, to achieve the result of a system which prefetches data for tasks, where if the input data of a second task may be split into parts which fetch and execute separately, with fetching for the second part occurring during a second set of operating cycles, to also control such fetching according to a sieve factor associated with the address space of a task.
One of ordinary skill in the art would have been motivated to make this modification in order to respond to situations where execution of a workload may result in many misses, by preserving cache lines for the execution of an application, which may lead to performance improvements as discussed in Sasanka [0043-0044].
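A hypothetical sketch of the address-space filtering read from Sasanka [0037] (the range representation below is invented for illustration): only addresses within designated cacheable ranges are allowed to be prefetched:

```python
# Illustrative only: filtering which address ranges may be cached or
# prefetched, per Examiner's reading of Sasanka [0037].
def build_sieve(cacheable_ranges):
    def allowed(addr):
        return any(lo <= addr < hi for lo, hi in cacheable_ranges)
    return allowed

sieve = build_sieve([(0x1000, 0x2000)])  # invented cacheable window
requests = [0x1800, 0x3000]
to_prefetch = [a for a in requests if sieve(a)]
print([hex(a) for a in to_prefetch])     # ['0x1800']
```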
Regarding claim 16: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 15, from which claim 16 depends.
While Mathuriya/Vantrease/Sztejna/Hur teaches the prefetching of the first portion to the cache circuit and the fetching of the second portion from the system memory, as shown with respect to claim 15, Mathuriya/Vantrease/Sztejna/Hur do not appear to disclose that the prefetching of the first portion and the fetching of the second portion are controlled by one or more sieve factors in a task descriptor associated with the second task.
However, Sasanka teaches that the prefetching and the fetching may be controlled by one or more sieve factors in a task descriptor associated with the second task ([0037], Sasanka teaches a filtering technique that enables only particular portions of a memory address space of an application to be cached in a cache memory.)
Mathuriya/Vantrease/Sztejna/Hur and Sasanka are analogous art because they are from the same field of endeavor, controlling cache circuits.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna/Hur and Sasanka, to achieve the result of a system which prefetches data for tasks, where if the input data of a second task may be split into parts which fetch and execute separately, with fetching for the second part occurring during a second set of operating cycles, to also control such fetching according to a sieve factor associated with the address space of a task.
One of ordinary skill in the art would have been motivated to make this modification in order to respond to situations where execution of a workload may result in many misses, by preserving cache lines for the execution of an application, which may lead to performance improvements as discussed in Sasanka [0043-0044].
Claims 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over
Mathuriya et al., U.S. Pub. No. 20190057300 (hereinafter “Mathuriya”) in view of
Vantrease et al., U.S. Pub. No. 20190294968 (hereinafter “Vantrease”) further in view of
SZTEJNA et al., U.S. Pub. No. 20220058062 (hereinafter “Sztejna”) further in view of
HUR et al., U.S. Pub. No. 20170091092 (hereinafter “Hur”) further in view of
Pinho et al., U.S. Pub. No. 20210149805 (hereinafter “Pinho”).
Regarding claim 10: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 1, from which claim 10 depends.
Mathuriya/Vantrease/Sztejna/Hur do not appear to explicitly disclose the cache access circuit is further configured to receive telemetry data indicating whether the cache circuit is available.
However, Pinho teaches the cache access circuit is further configured to receive telemetry data indicating whether the cache circuit is available. ([0067-0068], Pinho teaches that a cache policy adjustment process may turn prefetching OFF for the following period, and that the selected prefetch policy is then applied to the cache. In this case, the selected prefetch policy, which indicates that prefetching is off, is interpreted to be the telemetry data indicating whether the cache circuit is available, and the cache access circuit receiving it is taught.)
Mathuriya/Vantrease/Sztejna/Hur and Pinho are analogous art because they are from the same field of endeavor, cache prefetching techniques.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna/Hur and Pinho, to achieve the result of the neural processor circuit of claim 1, where the cache access circuit receives telemetry data indicating whether the cache circuit is available.
One of ordinary skill in the art would have been motivated to make this modification in order to prevent cache pollution according to how much pollution will occur in the cache as discussed in Pinho [0034].
Regarding claim 11: The combination of Mathuriya, Vantrease, Sztejna, Hur, and Pinho teaches all limitations of claim 10, from which claim 11 depends.
Mathuriya/Vantrease/Sztejna/Hur/Pinho further teaches responsive to receiving the telemetry data indicating that the cache circuit is unavailable, backing off from instructing the cache circuit to perform a prefetching operation until a period of time has elapsed. ([0068], Pinho teaches that the on-off decision for enabling or disabling prefetching may last for a predetermined period t, which may be a time period. Thus, when the prefetch policy applied to the cache indicates that prefetching is to be turned off, prefetching remains off until a period of time has elapsed.)
One of ordinary skill in the art would have been motivated to make this modification for the same reason as in claim 10.
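A hypothetical sketch of the telemetry-driven backoff read from Pinho [0067-0068] (the class, names, and period value below are invented): upon telemetry reporting the cache unavailable, prefetch instructions are suppressed until a period elapses:

```python
# Illustrative only: back off from issuing prefetch instructions for a
# fixed period after telemetry reports the cache unavailable (per
# Examiner's reading of Pinho [0067-0068]; all names/values invented).
import time

class PrefetchGate:
    BACKOFF_SECONDS = 0.5  # hypothetical predetermined period "t"

    def __init__(self):
        self.blocked_until = 0.0

    def on_telemetry(self, cache_available: bool):
        if not cache_available:
            self.blocked_until = time.monotonic() + self.BACKOFF_SECONDS

    def maybe_prefetch(self, issue):
        # Skip instructing the cache to prefetch until the period elapses.
        if time.monotonic() >= self.blocked_until:
            issue()

gate = PrefetchGate()
gate.on_telemetry(cache_available=False)
gate.maybe_prefetch(lambda: print("prefetch issued"))  # suppressed
time.sleep(0.6)
gate.maybe_prefetch(lambda: print("prefetch issued"))  # now issued
```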
Regarding claim 17: The combination of Mathuriya, Vantrease, Sztejna, and Hur teaches all limitations of claim 12, from which claim 17 depends.
Mathuriya/Vantrease/Sztejna/Hur do not appear to explicitly disclose receiving telemetry data indicating whether the cache circuit is available.
However, Pinho teaches receiving telemetry data indicating whether the cache circuit is available. ([0067-0068], Pinho teaches that a cache policy adjustment process may turn prefetching OFF for the following period, and that the selected prefetch policy is then applied to the cache. In this case, the selected prefetch policy, which indicates that prefetching is off, is interpreted to be the telemetry data indicating whether the cache circuit is available.)
Mathuriya/Vantrease/Sztejna/Hur and Pinho are analogous art because they are from the same field of endeavor, cache prefetching techniques.
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Mathuriya/Vantrease/Sztejna/Hur and Pinho, to achieve the result of the method of claim 12, where telemetry data indicating whether the cache circuit is available is received.
One of ordinary skill in the art would have been motivated to make this modification in order to prevent cache pollution according to how much pollution will occur in the cache as discussed in Pinho [0034].
Regarding claim 18: The combination of Mathuriya, Vantrease, Sztejna, Hur, and Pinho teaches all limitations of claim 17, from which claim 18 depends.
Mathuriya/Vantrease/Sztejna/Hur/Pinho further teaches responsive to receiving the telemetry data indicating that the cache circuit is unavailable, backing off from instructing the cache circuit to perform a prefetching operation until a period of time has elapsed. ([0068], Pinho teaches that the on-off decision for enabling or disabling prefetching may last for a predetermined period t, which may be a time period. Thus, when the prefetch policy applied to the cache indicates that prefetching is to be turned off, prefetching remains off until a period of time has elapsed.)
One of ordinary skill in the art would have been motivated to make this modification for the same reason as in claim 17.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KAITLYN HUNG PHAM whose telephone number is (571)272-6333. The examiner can normally be reached Mon-Thurs 8:00-6:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rocio Del Mar Perez-Velez can be reached at 571-270-5935. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.H.P./Examiner, Art Unit 2133
/ROCIO DEL MAR PEREZ-VELEZ/Supervisory Patent Examiner, Art Unit 2133