Prosecution Insights
Last updated: April 19, 2026
Application No. 17/514,735

TECHNIQUE FOR AUTONOMOUSLY MANAGING CACHE USING MACHINE LEARNING

Status: Non-Final OA (§103)
Filed: Oct 29, 2021
Examiner: HOANG, AMY P
Art Unit: 2143
Tech Center: 2100 — Computer Architecture & Software
Assignee: Nvidia Corporation
OA Round: 3 (Non-Final)
Predictions
Grant Probability: 70% (Favorable)
Expected OA Rounds: 3-4
Expected Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 70% (163 granted / 232 resolved), +15.3% vs TC avg (above average)
Interview Lift: +64.2% among resolved cases with an interview (strong)
Avg Prosecution: 3y 3m typical timeline; 31 applications currently pending
Total Applications: 263 career filings across all art units
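For readers checking the tiles above, the arithmetic is simple ratios; here is a minimal sketch, assuming the "vs TC avg" figure is a percentage-point gap (the dashboard does not say, so the implied average is only an estimate):

```python
# Arithmetic behind the examiner tiles above (figures from this dashboard).
granted, resolved = 163, 232
allow_rate = granted / resolved     # 0.7026... -> displayed as 70%

# Assumption: "+15.3% vs TC avg" is a percentage-point gap, which would
# put the Tech Center's average allow rate near 55%.
implied_tc_avg = allow_rate - 0.153

print(f"career allow rate: {allow_rate:.1%}")      # 70.3%
print(f"implied TC average: {implied_tc_avg:.1%}")  # ~55.0%
```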

Statute-Specific Performance

§101: 15.9% (-24.1% vs TC avg)
§103: 46.0% (+6.0% vs TC avg)
§102: 17.0% (-23.0% vs TC avg)
§112: 13.4% (-26.6% vs TC avg)
Deltas are measured against Tech Center average estimates. Based on career data from 232 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/02/2026 has been entered.

Response to Amendment

The Amendment filed on 01/02/2026 has been entered. Claim 46 is canceled. Claims 1-45 and 47-50 remain pending in the application.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-45 and 47-50 are rejected under 35 U.S.C. 103 as being unpatentable over Gottin et al. (hereinafter Gottin), US 20210374523 A1, in view of GUPTA et al. (hereinafter GUPTA), US 20200371950 A1.

Regarding independent claim 1, Gottin teaches a method of managing a cache located on a processor of a computing system ([0001] This disclosure relates to computing systems and related devices and methods, and, more particularly, to using reinforcement learning to dynamically tune cache policy parameters; Fig. 1, 118; [0015] the storage system 100 has physical resources including a number of CPU processor cores 114, operating system 116, cache 118, and other physical resources), comprising: training a machine learning (ML) agent to autonomously learn a cache management policy of the cache for executing a particular application on the computing system ([0024] As shown in FIG. 2, in some embodiments the storage system 100 includes a cache management system 200 configured to dynamically adjust operational parameters of the cache 118 based on reinforcement learning. The cache management system 200 includes a cache composite state generator 210 and a reinforcement learning process 220. The reinforcement learning process 220 uses information from the composite state generator 210 about the current operational conditions of the cache 118, and uses that information to set operational parameters of the cache 118 by a cache parameter adjustment module 250, such as a cache prefetch policy 230 and a cache segmentation policy 240; [0025] a method of dynamically optimizing cache policy parameters is implemented using reinforcement learning; Fig. 6, 600; [0027] One aspect of the reinforcement learning process enables the reinforcement learning process to account for the changes in the disk request patterns, e.g., due to non-stationary behavior. When a new application (i.e.
a particular application) emerges, for instance, access patterns on the cache 118 change and the reinforcement learning process 220 must eventually determine that it is worth changing the parameterization of the caching policy to cope with the new patterns; [0077] a software agent 600 (i.e. a machine learning (ML) agent); [0078] the deep neural network 620 is incrementally trained using reinforcement learning to learn which action should be taken in a given observed environment state to achieve the highest reward), and said training includes using the ML agent to continuously make an incremental change to cache management policy until the particular application is executed at a stable level ([0077] FIG. 6 is a functional block diagram of a reinforcement learning process 600 connected to an environment 610, according to some embodiments. As shown in FIG. 6, in some embodiments a software agent 600 receives the observed state 605 of an environment 610 and applies the observed state as input 615 to a deep neural network 620. The deep neural network provides, as output, cache policy parameters 625 of the cache policies 230, 240 under the control of the cache parameter adjustment module 250; [0078] The cache policy parameter associated with greatest anticipated reward 630 is selected by the software agent 600 and applied as input to the environment 635. This process is described in greater detail above in connection with FIG. 5. This process iterates periodically to enable the software agent 600 to control operation of the environment, observe changes to the state of the environment, determine reward values correlating with the state of the environment and action, and take additional actions until the episode ends (a determination of YES at block 525 in FIG. 5). When the episode ends, the episode is provided as training input 640 to deep neural network 630, to enable deep neural network 630 learn the relationship between the environment state, selected action, and reward. In this manner, the deep neural network 620 is incrementally trained using reinforcement learning to learn which action should be taken in a given observed environment state to achieve the highest reward (i.e. a stable level); [0084]-[0086] FIG. 7 is a flow chart of an example method of training a DQN network to learn to dynamically tune cache policy parameters); and deploying the policy to manage the cache ([0087] FIG. 8 is a flow chart of an example method of using a trained DQN network to dynamically tune cache policy parameters, according to some embodiments. The blocks shown in FIG. 8 are the same as the blocks shown in FIG. 5, with the exception that in FIG. 8 the software agent used in block 835 is a Deep Q Network (DQN) software agent of FIG. 6, that has been trained using the process shown in FIG. 7). Gottin does not explicitly disclose wherein locations in a memory address space is associated with a workload of the particular application, and said training includes using the ML agent to continuously make an incremental change to current cache-residency statuses of the locations until the particular application is executed at a stable level; rewriting the cache to promote and demote locations within the cache using the incremental changes to current cache-residency statuses of the locations which allow for the particular application to be executed at the stable level. 
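As a technical aside, the limitations recited above (and elaborated in claims 14-23 below) describe a conventional reinforcement-learning loop over cache-residency bits. The following is a minimal runnable sketch of that structure only, with random exploration standing in for a trained DQN; every name, size, and threshold is invented for illustration and none of this code comes from the application or either reference:

```python
import random
import statistics

N = 8            # locations tracked in the memory address space (toy size)
K = 20           # number of past rewards in the stability window
THRESHOLD = 0.01 # stability threshold on the reward standard deviation

def apply_action(state, action):
    """One incremental change to the residency bits: actions 0..N-1 promote
    location i, N..2N-1 demote location i-N, and 2N is the no-action slot
    (the 2N+1 action space of claims 18-19)."""
    state = state[:]
    if action < N:
        state[action] = 1        # promote to cache-resident
    elif action < 2 * N:
        state[action - N] = 0    # demote to non-resident
    return state

def measure_reward(state):
    """Stand-in execution metric (claims 20-21 name execution time, DRAM
    traffic, or perf/watt); here we simply reward keeping a hypothetical
    hot set of locations resident."""
    hot = {0, 1, 2}
    return sum(state[i] for i in hot) / len(hot)

state = [0] * N                  # claim 14: every residency bit starts at zero
rewards = []
for _ in range(10_000):          # training budget
    action = random.randrange(2 * N + 1)  # exploration; a trained agent's MLP
                                          # (claims 2, 36) would pick argmax-Q
    state = apply_action(state, action)
    rewards.append(measure_reward(state))
    # Claims 23/47: "stable" once the std-dev of the last K rewards is small.
    if len(rewards) >= K and statistics.stdev(rewards[-K:]) < THRESHOLD:
        break

print("residency bits:", state, "after", len(rewards), "steps")
```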
However, in the same field of endeavor, GUPTA teaches wherein locations in a memory address space is associated with a workload of the particular application (Fig. 2; [0027] FIG. 2 illustrates an embodiment of the local cache 200 i, such as one of the local caches 200 1, 200 2 . . . 200 n, for a CPU 114 i. A local cache 200 i may include one or more tasks 202 (i.e. a workload of the particular application) being executed by the CPU 114 i, a local queue 204 of cache segments 108 i (i.e. locations in a memory address space) obtained from the global queue 110 that are available to allocate for use by the tasks 202), and said training includes using the ML agent to continuously make an incremental change to current cache-residency statuses of the locations until the particular application is executed at a stable level ([0028] FIG. 3 illustrates an embodiment of the global queue manager cache 300 that includes a global queue manager 302 to manage access to the global queue 110; global queue management information 500 having information on management of cache segments across all local queues 204 and accesses by all of the CPUs 114 i of the global queue 110 to allocate or return cache segments 108 i; a machine learning module 304 that receives as input 306 some or all of the global queue management information 500 for all the CPUs 114 i and computes an optimum number parameter vector 308 that includes an optimum number parameter 210 for every CPU 114 i and a transfer number parameter vector 310 that includes a transfer number parameter 212 for every CPU 114 i. An allocate/demote counter 312 that indicates, for every CPU 114 i, a number of allocate/demote operations with respect to the global queue 110; [0029]-[0030] The local cache managers 208 may then use the outputted optimum number parameter 210 and transfer number parameter 212 in the vectors 308 and 310, respectively, to determine when to request more cache segments 108 i from the global queue manager 302 or when to return/demote the transfer number parameter 212 of cache segments from the local queue 204 to the global queue 110); rewriting the cache to promote and demote locations within the cache using the incremental changes to current cache-residency statuses of the locations which allow for the particular application to be executed at the stable level ([0029]-[0030] The local cache managers 208 may then use the outputted optimum number parameter 210 and transfer number parameter 212 in the vectors 308 and 310, respectively, to determine when to request more cache segments 108 i from the global queue manager 302 or when to return/demote the transfer number parameter 212 of cache segments from the local queue 204 to the global queue 110; [0041] current global queue management information 500 is used to determine the parameters the CPUs 114 i use to determine when to allocate more cache segments from the global queue 110 and to demote and return cache segments 108 i to the global queue 110. Each CPU 114 i is provided operational parameters based on that CPUs 114 i specific operations and performance and the operations of all the CPUs 114 i with respect to the global queue 110; [0017] if the local queue has a relatively low number of cache segments needed to allocate to I/O operations, then the processing unit must obtain a lock to a global queue from which it can allocate more cache segments to the local queue. 
Further, if the local queue has a number of cache segments exceeding an optimum number, then the processing unit must obtain a lock on the global queue to demote cache segments from the local queue to the global queue. Because multiple processing units may be accessing the global queue to obtain and return cache segments, other processing units will experience latency delays to obtain the lock, which will introduce latency for their task processing as they wait to obtain a lock for the global queue to allocate or demote cache segments (i.e. not a stable level); [0018] Described embodiments control the number of lock requests to reduce latency in obtaining a lock to the global queue by adjusting the number of cache segments transferred between the local queue and the global queue. Increasing the number of cache segments to transfer reduces lock contention by reducing the frequency at which the processing units need to request the lock to access the global queue; [0019] In described embodiments, cache segment management information related to management of segments in the local queues and accesses to the global queue to transfer cache segments between the local queues and the global queue is provided to a machine learning module to output an optimum number parameter comprising an optimum number of segments to maintain in a local queue and a transfer number parameter comprising a number of cache segments to move between a local queue and the global queue. The optimum number parameters and the transfer number parameters are sent to the processing units to use to transfer the transfer number parameter of cache segments from the local queue to the global queue in response to determining that a number of segments in the local queue exceeds the optimum number parameter and to transfer the transfer number parameter of cache segments from the global queue to the local queue in response to determining that a number of segments in the local queue is less than the optimum number parameter). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of managing a global queue of cache segments for processing units by optimizing processing units operations with respect to their local queues in a manner that maintains a sufficient number of cache segments in the local queue to minimize or reduce the need for the processing unit to access the global queue to access or return resources by using a machine learning module as suggested in GUPTA into Gottin’s system because both of these systems are addressing training a machine learning module to manage cache policy. This modification would have been motivated by the desire to provide improved techniques to manage the provisioning of cache segments from a global queue to the local queues of processors to use for I/O operations (GUPTA, [0004]).

Regarding dependent claim 2, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein the ML agent is implemented as a multi-layer perceptron (MLP) network with two hidden layers ([0077] FIG. 6 is a functional block diagram of a reinforcement learning process 600 connected to an environment 610, according to some embodiments. As shown in FIG.
6, in some embodiments a software agent 600 receives the observed state 605 of an environment 610 and applies the observed state as input 615 to a deep neural network 620; [0041] the term “DNN (Deep Neural Network)” is used to refer to an artificial neural network with multiple layers between the input and output layers). Regarding dependent claim 3, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein said training and said deploying are performed using an application programming interface ([0051] Reinforcement learning algorithms aim at controlling software agents to perform actions in an environment to maximize some notion of cumulative reward. Reinforcement learning is also called approximate dynamic programming, or neuro-dynamic programming). Regarding dependent claim 4, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein said training is performed before executing the particular application ([0004] In some embodiments, a method of dynamically tuning cache policy parameters includes parameterizing caching systems using reinforcement learning. Baseline metrics are precomputed and various approaches at cache policy parameterization are compared against the baseline metrics. For example, if the baseline metric is based on a cache hit rate, the various alternative cache parameterization approaches are compared against the baseline hit rate rather than learning the hit rate directly). Regarding dependent claim 5, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein said training is performed while executing the particular application ([0027] One aspect of the reinforcement learning process enables the reinforcement learning process to account for the changes in the disk request patterns, e.g., due to non-stationary behavior. When a new application emerges, for instance, access patterns on the cache 118 change and the reinforcement learning process 220 must eventually determine that it is worth changing the parameterization of the caching policy to cope with the new patterns. As an illustrative example, such patterns may involve more sequential accesses to contiguous disk addresses than the previous observations, requiring larger prefetches). Regarding dependent claim 6, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 5 that is incorporated. GUPTA further teaches wherein said training is performed by the processor (FIG. 3 illustrates an embodiment of the global queue manager cache 300 that includes a global queue manager 302 to manage access to the global queue 110; global queue management information 500 having information on management of cache segments across all local queues 204 and accesses by all of the CPUs 114 i of the global queue 110 to allocate or return cache segments 108 i; a machine learning module 304 that receives as input 306 some or all of the global queue management information 500 for all the CPUs 114 i and computes an optimum number parameter vector 308 that includes an optimum number parameter 210 for every CPU 114 i and a transfer number parameter vector 310 that includes a transfer number parameter 212 for every CPU 114 I; Fig. 
7, 710; The global queue management information 500, with information on allocate/deallocate operations and access to the global queue 110 for all CPUs, is provided (at block 710) as input 306 to the machine learning module 304. The global queue manager 302 receives (at block 712), for each CPU 114 i, an optimum number parameter vector 308 of an optimum number parameter 210 of cache segments to maintain in a local queue 204 and a transfer number parameter vector 310 of cache segments to move between a local queue and the global queue 110) and said deploying is performed by other processors in the computing system (Fig. 2; [0027] an optimum number parameter 210 comprising an optimum number of cache segments to maintain in the local queue 204 as determined by a machine learning module; and a transfer number parameter 212 comprising a number of cache segments 108 i to move between the local queue 204 and the global queue 110; [0040] The global queue manager 302 sends (at block 714) to each CPU 114 i the optimum number parameter 308 and the transfer number parameter 310 calculated specifically for that CPU 114 i). Regarding dependent claim 7, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 5 that is incorporated. Gottin further teaches wherein said training is performed using multiple processors in the computing system including the processor, and said deploying is performed by the multiple processors ([0015] FIG. 1 is a functional block diagram of an example storage system 100, in which data clients 110 have access to storage resources provided by a storage array 112. As shown in FIG. 1, in some embodiments the storage system 100 has physical resources including a number of CPU processor cores 114). Regarding dependent claim 8, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 5 that is incorporated. Gottin further teaches wherein said training and said deploying are performed by the processor ([0001] This disclosure relates to computing systems and related devices and methods, and, more particularly, to using reinforcement learning to dynamically tune cache policy parameters; Fig. 1, 118; [0015] the storage system 100 has physical resources including a number of CPU processor cores 114, operating system 116, cache 118, and other physical resources). Regarding dependent claim 9, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein the ML agent is a reinforcement learning (RL) agent in an RL environment ([0077] FIG. 6 is a functional block diagram of a reinforcement learning process 600 connected to an environment 610, according to some embodiments. As shown in FIG. 6, in some embodiments a software agent 600 receives the observed state 605 of an environment 610 and applies the observed state as input 615 to a deep neural network 620. The deep neural network provides, as output, cache policy parameters 625 of the cache policies 230, 240 under the control of the cache parameter adjustment module 250). Regarding dependent claim 10, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein the particular application is a deep learning (DL) application for training or inferencing, and the locations in the memory address space represent virtual address ranges for the workload (Fig. 
1; [0018] In some embodiments, data clients 110 execute in emulations 120 such as a virtual machine instantiated in the context of the storage system 100. In some embodiments, a hypervisor 122 abstracts the physical resources of the storage system 100 from emulations 120, and allocates physical resources of storage system 100 for use by the emulations 120. Each emulation 120 has an emulation operating system 124 and one or more application processes running in the context of the emulation operating system 124; [0027] One aspect of the reinforcement learning process enables the reinforcement learning process to account for the changes in the disk request patterns, e.g., due to non-stationary behavior. When a new application emerges, for instance, access patterns on the cache 118 change and the reinforcement learning process 220 must eventually determine that it is worth changing the parameterization of the caching policy to cope with the new patterns. As an illustrative example, such patterns may involve more sequential accesses to contiguous disk addresses than the previous observations, requiring larger prefetches). Regarding dependent claim 11, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 10 that is incorporated. GUPTA further teaches wherein some of the locations that are associated with activation data of the DL application ([0030] In one embodiment, the machine learning modules 304 may comprise artificial neural network programs. Each neural network may be trained using backward propagation to adjust weights and biases (i.e. activation data) at nodes in a hidden layer to produce the computed optimum number parameter vector 308 and transfer number parameter vector 310. In backward propagation used to train a neural network machine learning module, margin of errors are determined based on operational parameters, such a margin of error of an adjusted transfer number parameter for each processing unit and a current transfer number parameter calculated for each processing unit to adjust weights and biases at nodes in a hidden layer of the machine learning module to produce the adjusted transfer number parameter). Regarding dependent claim 12, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 10 that is incorporated. GUPTA further teaches wherein some of the locations are associated with weight data of the DL application ([0030] In one embodiment, the machine learning modules 304 may comprise artificial neural network programs. Each neural network may be trained using backward propagation to adjust weights (i.e. weight data) and biases at nodes in a hidden layer to produce the computed optimum number parameter vector 308 and transfer number parameter vector 310. In backward propagation used to train a neural network machine learning module, margin of errors are determined based on operational parameters, such a margin of error of an adjusted transfer number parameter for each processing unit and a current transfer number parameter calculated for each processing unit to adjust weights and biases at nodes in a hidden layer of the machine learning module to produce the adjusted transfer number parameter). Regarding dependent claim 13, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. 
Gottin further teaches wherein the processor is a graphics processing unit (GPU) ([0097] The methods described herein may be implemented as software configured to be executed in control logic such as contained in a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) of an electronic device such as a computer). Regarding dependent claim 14, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. GUPTA further teaches further comprising: before said training, preparing an ML environment for the ML agent by dividing the memory address space into the locations, each location being represented by a single bit that indicates a cache-residency status of a corresponding location, and setting each single bit to zero (FIG. 4; [0035] a role 404 of the CPU 114 i as a demoter assigned to demote cache segments 108 i from the local queue 204 to the global queue 110 and/or an allocator assigned to allocate cache segments 108 i from the global queue 110 to the local queue 204). Regarding dependent claim 15, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 14 that is incorporated. GUPTA further teaches wherein said training includes receiving from the ML environment a state that corresponds to the current cache-residency statuses of the locations ([0040] FIG. 7 illustrates an embodiment of operations performed by the global queue manager 302 upon receiving local queue management information 400 from one of the CPUs 114 i upon performing a demoting or allocation operation with respect to the global queue 110. Upon receiving (at block 700) local queue management information 400, the global queue manager 302 increments the allocate/demote counter 312 for the CPU 114 i that sent the local queue management information 400. The global queue management information 500 is updated with the received local queue management information 400 to make current). Regarding dependent claim 16, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 15 that is incorporated. GUPTA further teaches wherein the state is represented by an N-dimensional binary vector ([0037] a CPU roles vector 502 indicating the roles, allocator and/or demoter, for each of the CPUs 114 i). Regarding dependent claim 17, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 15 that is incorporated. GUPTA further teaches wherein said training includes choosing an action that corresponds to the incremental change based on the state (Fig. 7; [0040] The global queue management information 500, with information on allocate/deallocate operations and access to the global queue 110 for all CPUs, is provided (at block 710) as input 306 to the machine learning module 304. The global queue manager 302 receives (at block 712), for each CPU 114 i, an optimum number parameter vector 308 of an optimum number parameter 210 of cache segments to maintain in a local queue 204 and a transfer number parameter vector 310 of cache segments to move between a local queue and the global queue 110. The global queue manager 302 sends (at block 714) to each CPU 114 i the optimum number parameter 308 and the transfer number parameter 310 calculated specifically for that CPU 114 i). Regarding dependent claim 18, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 15 that is incorporated. 
GUPTA further teaches wherein the action corresponds to a promotion of one of the locations to a cache resident, a demotion of the one location to a non-cache cache resident or a no-action ([0041] current global queue management information 500 is used to determine the parameters the CPUs 114 i use to determine when to allocate more cache segments from the global queue 110 and to demote and return cache segments 108 i to the global queue 110. Each CPU 114 i is provided operational parameters based on that CPUs 114 i specific operations and performance and the operations of all the CPUs 114 i with respect to the global queue 110). Regarding dependent claim 19, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 18 that is incorporated. GUPTA further teaches wherein the action is represented by a 2N+1 dimensional one-hot vector ([0040] The global queue manager 302 receives (at block 712), for each CPU 114 i, an optimum number parameter vector 308 of an optimum number parameter 210 of cache segments to maintain in a local queue 204 and a transfer number parameter vector 310 of cache segments to move between a local queue and the global queue 110). Regarding dependent claim 20, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein said training includes receiving a reward that corresponds to an execution metric indicating whether the stable level has been achieved ([0051] Reinforcement learning algorithms aim at controlling software agents to perform actions in an environment to maximize some notion of cumulative reward; [0066] In the context of cache parameter optimization, an example reward can be based on the cache hit rate (i.e. an execution metric). Because the cache hit rate is heteroscedastic, in some embodiments the reward is based on a combination of both the cache hit rate and a baseline reward). Regarding dependent claim 21, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 20 that is incorporated. Gottin further teaches wherein the execution metric corresponds to an execution time of the particular application on the computing system, an amount of traffic to a DRAM while executing the particular application on the computing system or a performance per watt of the computing system executing the particular application ([0020] When an IO request is received, the storage system 110 first tries to service the IO request from the cache 118. If the data associated with the request is stored in cache 118, the storage system 110 will be able to service the request much faster than if the data needs to be retrieved from managed drives of storage array 112. Accordingly, correctly placing data with a high probability of being requested on fast memory media implementing cache 118 can substantially reduce the response times of input/output (I/O) requests). Regarding dependent claim 22, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches further comprising learning the cache management policy using the trained agent ([0024] As shown in FIG. 2, in some embodiments the storage system 100 includes a cache management system 200 configured to dynamically adjust operational parameters of the cache 118 based on reinforcement learning. 
The cache management system 200 includes a cache composite state generator 210 and a reinforcement learning process 220. The reinforcement learning process 220 uses information from the composite state generator 210 about the current operational conditions of the cache 118, and uses that information to set operational parameters of the cache 118 by a cache parameter adjustment module 250, such as a cache prefetch policy 230 and a cache segmentation policy 240; [0077] a software agent 600), GUPTA further teaches wherein the current cache-residency statuses of the locations become final cache-residency statuses of the location when predefined time for said training and said learning runs out ([0043] The machine learning module 304 is retrained (at block 808), such as using backward propagation, with input comprising the global queue management information 500 to produce the adjusted transfer number parameter 310 i for each CPU 114 i, by using the margin of error for each CPU 114 i of the difference of the adjusted transfer number parameter 310 i and the current transfer number parameter 212; [0044] With the embodiment of FIG. 8, the transfer number parameter 310 for a CPU 114 i is reduced by a difference, or margin of error, of the optimum global lock contention and the current lock contention time for a CPU 114 i if the current lock contention time for a CPU exceeds the optimum global lock contention).

Regarding dependent claim 23, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 1 that is incorporated. Gottin further teaches wherein said training includes determining that the stable level has been achieved when a standard deviation between a predefined number of past rewards is less than a predefined threshold ([0070] in some embodiments a baseline regularized instantaneous reward r is used, which shows how much better the selected parameters performed relative to a baseline b, where the baseline b is a static value selected for the algorithm. In other embodiments, the instantaneous reward r is a function of both a baseline hit rate and an instantaneous cache hit rate).

Regarding independent claim 24, it is a product claim that corresponds to the method of claim 1. Therefore, it is rejected for the same reason as claim 1 above. Gottin further teaches a computer program product having a series of operating instructions stored on a non-transitory computer-readable medium that directs a processor of a computing system when executed thereby to perform operations ([0097] The methods described herein may be implemented as software configured to be executed in control logic such as contained in a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) of an electronic device such as a computer. In particular, the functions described herein may be implemented as sets of program instructions stored on a non-transitory tangible computer readable storage medium).

Regarding independent claim 25, it is a system claim that corresponds to the method of claim 1. Therefore, it is rejected for the same reason as claim 1 above.

Regarding dependent claim 26, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. GUPTA further teaches wherein the incremental change corresponds to an action (Fig.
7; [0040] The global queue management information 500, with information on allocate/deallocate operations and access to the global queue 110 for all CPUs, is provided (at block 710) as input 306 to the machine learning module 304. The global queue manager 302 receives (at block 712), for each CPU 114 i, an optimum number parameter vector 308 of an optimum number parameter 210 of cache segments to maintain in a local queue 204 and a transfer number parameter vector 310 of cache segments to move between a local queue and the global queue 110. The global queue manager 302 sends (at block 714) to each CPU 114 i the optimum number parameter 308 and the transfer number parameter 310 calculated specifically for that CPU 114 i), and the current cache-residency statuses correspond to a state of an ML environment that the ML agent is in ([0040] FIG. 7 illustrates an embodiment of operations performed by the global queue manager 302 upon receiving local queue management information 400 from one of the CPUs 114 i upon performing a demoting or allocation operation with respect to the global queue 110. Upon receiving (at block 700) local queue management information 400, the global queue manager 302 increments the allocate/demote counter 312 for the CPU 114 i that sent the local queue management information 400. The global queue management information 500 is updated with the received local queue management information 400 to make current)). Regarding dependent claim 27, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 26 that is incorporated. GUPTA further teaches wherein the state is represented by an N-dimensional binary vector ([0037] a CPU roles vector 502 indicating the roles, allocator and/or demoter, for each of the CPUs 114 i). Regarding dependent claim 28, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 26 that is incorporated. GUPTA further teaches wherein the action corresponds to a promotion of one of the locations to a cache resident, a demotion of the one location to a non-cache resident or a no-action ([0041] current global queue management information 500 is used to determine the parameters the CPUs 114 i use to determine when to allocate more cache segments from the global queue 110 and to demote and return cache segments 108 i to the global queue 110. Each CPU 114 i is provided operational parameters based on that CPUs 114 i specific operations and performance and the operations of all the CPUs 114 i with respect to the global queue 110). Regarding dependent claim 29, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 26 that is incorporated. GUPTA further teaches wherein the action is represented by a 2N+1 dimensional one-hot vector ([0040] The global queue manager 302 receives (at block 712), for each CPU 114 i, an optimum number parameter vector 308 of an optimum number parameter 210 of cache segments to maintain in a local queue 204 and a transfer number parameter vector 310 of cache segments to move between a local queue and the global queue 110). Regarding dependent claim 30, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. 
Gottin further teaches wherein the ML agent receives a reward that corresponds to an execution metric indicating whether the stable level has been achieved ([0051] Reinforcement learning algorithms aim at controlling software agents to perform actions in an environment to maximize some notion of cumulative reward; [0066] In the context of cache parameter optimization, an example reward can be based on the cache hit rate (i.e. an execution metric). Because the cache hit rate is heteroscedastic, in some embodiments the reward is based on a combination of both the cache hit rate and a baseline reward). Regarding dependent claim 31, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 30 that is incorporated. Gottin further teaches wherein the execution metric corresponds to an execution time of the particular application on the computing system, an amount of traffic to a DRAM while executing the particular application on the computing system or a performance per watt of the computing system executing the particular application ([0020] When an IO request is received, the storage system 110 first tries to service the IO request from the cache 118. If the data associated with the request is stored in cache 118, the storage system 110 will be able to service the request much faster than if the data needs to be retrieved from managed drives of storage array 112. Accordingly, correctly placing data with a high probability of being requested on fast memory media implementing cache 118 can substantially reduce the response times of input/output (I/O) requests). Regarding dependent claim 32, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the trained agent learns the cache management policy ([0024] As shown in FIG. 2, in some embodiments the storage system 100 includes a cache management system 200 configured to dynamically adjust operational parameters of the cache 118 based on reinforcement learning. The cache management system 200 includes a cache composite state generator 210 and a reinforcement learning process 220. The reinforcement learning process 220 uses information from the composite state generator 210 about the current operational conditions of the cache 118, and uses that information to set operational parameters of the cache 118 by a cache parameter adjustment module 250, such as a cache prefetch policy 230 and a cache segmentation policy 240; [0077] a software agent 600), GUPTA further teaches wherein the current cache-residency statuses of the locations become final cache-residency statuses of the location when predefined time for said training and said learning runs out ([0043] The machine learning module 304 is retrained (at block 808), such as using backward propagation, with input comprising the global queue management information 500 to produce the adjusted transfer number parameter 310 i for each CPU 114 i, by using the margin of error for each CPU 114 i of the difference of the adjusted transfer number parameter 310 i and the current transfer number parameter 212; [0044] With the embodiment of FIG. 8, the transfer number parameter 310 for a CPU 114 i is reduced by a difference, or margin of error, of the optimum global lock contention and the current lock contention time for a CPU 114 i if the current lock contention time for a CPU exceeds the optimum global lock contention). 
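The vector encodings recited in claims 26-29 (and their method counterparts 16-19) are concrete enough to sketch: an N-dimensional binary state and a (2N+1)-dimensional one-hot action. A minimal illustration follows; the helper name and the sizes are hypothetical, not drawn from either reference:

```python
import numpy as np

N = 8  # locations in the memory address space (illustrative size)

# Claims 16/27: the state is an N-dimensional binary vector of current
# cache-residency statuses (1 = resident, 0 = not resident).
state = np.zeros(N, dtype=np.int8)
state[[1, 4]] = 1

# Claims 19/29: an action is a (2N+1)-dimensional one-hot vector with
# N promotion slots, N demotion slots, and one no-action slot.
def encode_action(kind, location=None):
    onehot = np.zeros(2 * N + 1, dtype=np.int8)
    if kind == "promote":
        onehot[location] = 1
    elif kind == "demote":
        onehot[N + location] = 1
    else:                      # "noop"
        onehot[2 * N] = 1      # the +1 no-action component
    return onehot

print(encode_action("demote", location=4))  # hot index lands at N + 4
```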
Regarding dependent claim 33, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the ML agent determines that the stable level has been achieved when a standard deviation between a predefined number of past rewards is less than a predefined threshold ([0070] in some embodiments a baseline regularized instantaneous reward r is used, which shows how much better the selected parameters performed relative to a baseline b, where the baseline b is a static value selected for the algorithm. In other embodiments, the instantaneous reward r is a function of both a baseline hit rate and an instantaneous cache hit rate). Regarding dependent claim 34, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the processor deploys the policy using an application programming interface ([0051] Reinforcement learning algorithms aim at controlling software agents to perform actions in an environment to maximize some notion of cumulative reward. Reinforcement learning is also called approximate dynamic programming, or neuro-dynamic programming). Regarding dependent claim 35, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the ML agent is trained using an application programming interface ([0051] Reinforcement learning algorithms aim at controlling software agents to perform actions in an environment to maximize some notion of cumulative reward. Reinforcement learning is also called approximate dynamic programming, or neuro-dynamic programming). Regarding dependent claim 36, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the ML agent is a multi-layer perceptron (MLP) network with two hidden layers ([0077] FIG. 6 is a functional block diagram of a reinforcement learning process 600 connected to an environment 610, according to some embodiments. As shown in FIG. 6, in some embodiments a software agent 600 receives the observed state 605 of an environment 610 and applies the observed state as input 615 to a deep neural network 620; [0041] the term “DNN (Deep Neural Network)” is used to refer to an artificial neural network with multiple layers between the input and output layers). Regarding dependent claim 37, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the ML agent is a reinforcement learning (RL) agent in an RL environment ([0077] FIG. 6 is a functional block diagram of a reinforcement learning process 600 connected to an environment 610, according to some embodiments. As shown in FIG. 6, in some embodiments a software agent 600 receives the observed state 605 of an environment 610 and applies the observed state as input 615 to a deep neural network 620. The deep neural network provides, as output, cache policy parameters 625 of the cache policies 230, 240 under the control of the cache parameter adjustment module 250). Regarding dependent claim 38, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. 
Gottin further teaches wherein the particular application is a deep learning (DL) application for training or inferencing, and the locations in the memory address space represent virtual address ranges for the workload (Fig. 1; [0018] In some embodiments, data clients 110 execute in emulations 120 such as a virtual machine instantiated in the context of the storage system 100. In some embodiments, a hypervisor 122 abstracts the physical resources of the storage system 100 from emulations 120, and allocates physical resources of storage system 100 for use by the emulations 120. Each emulation 120 has an emulation operating system 124 and one or more application processes running in the context of the emulation operating system 124; [0027] One aspect of the reinforcement learning process enables the reinforcement learning process to account for the changes in the disk request patterns, e.g., due to non-stationary behavior. When a new application emerges, for instance, access patterns on the cache 118 change and the reinforcement learning process 220 must eventually determine that it is worth changing the parameterization of the caching policy to cope with the new patterns. As an illustrative example, such patterns may involve more sequential accesses to contiguous disk addresses than the previous observations, requiring larger prefetches). Regarding dependent claim 39, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 38 that is incorporated. GUPTA further teaches wherein some of the locations are associated with activation data of the DL application ([0030] In one embodiment, the machine learning modules 304 may comprise artificial neural network programs. Each neural network may be trained using backward propagation to adjust weights and biases (i.e. activation data) at nodes in a hidden layer to produce the computed optimum number parameter vector 308 and transfer number parameter vector 310. In backward propagation used to train a neural network machine learning module, margin of errors are determined based on operational parameters, such a margin of error of an adjusted transfer number parameter for each processing unit and a current transfer number parameter calculated for each processing unit to adjust weights and biases at nodes in a hidden layer of the machine learning module to produce the adjusted transfer number parameter). Regarding dependent claim 40, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 38 that is incorporated. GUPTA further teaches wherein some of the locations are associated with weight data of the DL application ([0030] In one embodiment, the machine learning modules 304 may comprise artificial neural network programs. Each neural network may be trained using backward propagation to adjust weights (i.e. weight data) and biases at nodes in a hidden layer to produce the computed optimum number parameter vector 308 and transfer number parameter vector 310. In backward propagation used to train a neural network machine learning module, margin of errors are determined based on operational parameters, such a margin of error of an adjusted transfer number parameter for each processing unit and a current transfer number parameter calculated for each processing unit to adjust weights and biases at nodes in a hidden layer of the machine learning module to produce the adjusted transfer number parameter). 
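Claims 2 and 36 pin the agent to a multi-layer perceptron with two hidden layers, which is simple to sketch as a forward pass mapping the N-bit residency state to one Q-value per action. The widths, seed, and initialization below are arbitrary illustration, not from the application or the cited references:

```python
import numpy as np

N = 8                                       # residency bits (illustrative)
H1 = H2 = 32                                # two hidden layers (claims 2, 36)
OUT = 2 * N + 1                             # one Q-value per one-hot action

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (H1, N)), np.zeros(H1)
W2, b2 = rng.normal(0, 0.1, (H2, H1)), np.zeros(H2)
W3, b3 = rng.normal(0, 0.1, (OUT, H2)), np.zeros(OUT)

def q_values(state_bits):
    """MLP forward pass: N-dimensional binary state in, 2N+1 Q-values out."""
    h1 = np.maximum(0.0, W1 @ state_bits + b1)   # hidden layer 1 (ReLU)
    h2 = np.maximum(0.0, W2 @ h1 + b2)           # hidden layer 2 (ReLU)
    return W3 @ h2 + b3

state = np.zeros(N)
state[[0, 3]] = 1.0
best_action = int(np.argmax(q_values(state)))    # greedy pick at deployment
print("greedy action index:", best_action)
```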
Regarding dependent claim 41, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the particular application is a high-performance computing (HPC) application ([0028] Learning the parametrization of cache policies poses several problems. For example, changes in the access patterns should be reflected in changes in caching policies. When a new application begins using the storage system, the past historical data on disk accesses will not reflect the patterns of the new application. Frequently retraining the model takes time and resources and must be done with parsimony. Additionally, changes in cache policy parameters have both short term and long-term impacts. Short term impacts are easier to predict through a model, but the long-term impacts are more challenging to predict and account for, and would potentially leverage higher gains. For example, increasing the length of the prefetch can lead to an instantaneous increase in hit rate, but can cause the eviction of a content which, in the long term, will be frequently requested and costly to retrieve again). Regarding dependent claim 42, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the ML agent is trained to learn the cache management policy before an execution of the particular application ([0004] In some embodiments, a method of dynamically tuning cache policy parameters includes parameterizing caching systems using reinforcement learning. Baseline metrics are precomputed and various approaches at cache policy parameterization are compared against the baseline metrics. For example, if the baseline metric is based on a cache hit rate, the various alternative cache parameterization approaches are compared against the baseline hit rate rather than learning the hit rate directly). Regarding dependent claim 43, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. Gottin further teaches wherein the processor is a GPU ([0097] The methods described herein may be implemented as software configured to be executed in control logic such as contained in a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) of an electronic device such as a computer). Regarding dependent claim 44, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 25 that is incorporated. GUPTA further teaches wherein the computing system is one of DL computing systems located in a data center ([0060] The computational components of FIG. 1 may be implemented in one or more computer systems, such as the computer system 1102 shown in FIG. 11. Computer system/server 1102 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 1102 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. 
In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices). Regarding independent claim 45, Gottin teaches a method of managing a cache located on a processor of a computing system ([0001] This disclosure relates to computing systems and related devices and methods, and, more particularly, to using reinforcement learning to dynamically tune cache policy parameters; Fig. 1, 118; [0015] the storage system 100 has physical resources including a number of CPU processor cores 114, operating system 116, cache 118, and other physical resources), comprising: executing an application on the processor, the application having a workload which utilizes a memory address space ([0017] Data clients 110 act as hosts and provide access to the storage resources provided by storage array 112. Examples of data clients 110 may include but are not limited to file servers, email servers, block servers, and databases. The storage system 100 maintains data for the data clients 110 in storage array 112. For example, data client 110 may write data to the storage system 100 and read data from the storage system 100 in order to perform various functions); during said executing, allowing a machine learning agent to autonomously learn a cache management policy for the cache ([0024] As shown in FIG. 2, in some embodiments the storage system 100 includes a cache management system 200 configured to dynamically adjust operational parameters of the cache 118 based on reinforcement learning. The cache management system 200 includes a cache composite state generator 210 and a reinforcement learning process 220. The reinforcement learning process 220 uses information from the composite state generator 210 about the current operational conditions of the cache 118, and uses that information to set operational parameters of the cache 118 by a cache parameter adjustment module 250, such as a cache prefetch policy 230 and a cache segmentation policy 240; [0025] a method of dynamically optimizing cache policy parameters is implemented using reinforcement learning; Fig. 6, 600; [0027] One aspect of the reinforcement learning process enables the reinforcement learning process to account for the changes in the disk request patterns, e.g., due to non-stationary behavior. When a new application (i.e. a particular application) emerges, for instance, access patterns on the cache 118 change and the reinforcement learning process 220 must eventually determine that it is worth changing the parameterization of the caching policy to cope with the new patterns; [0077] a software agent 600 (i.e. a machine learning (ML) agent); [0078] the deep neural network 620 is incrementally trained using reinforcement learning to learn which action should be taken in a given observed environment state to achieve the highest reward) by repeatedly making incremental changes to cache management policy ([0077] FIG. 6 is a functional block diagram of a reinforcement learning process 600 connected to an environment 610, according to some embodiments. As shown in FIG. 6, in some embodiments a software agent 600 receives the observed state 605 of an environment 610 and applies the observed state as input 615 to a deep neural network 620. 
The deep neural network provides, as output, cache policy parameters 625 of the cache policies 230, 240 under the control of the cache parameter adjustment module 250; [0078] The cache policy parameter associated with greatest anticipated reward 630 is selected by the software agent 600 and applied as input to the environment 635. This process is described in greater detail above in connection with FIG. 5. This process iterates periodically to enable the software agent 600 to control operation of the environment, observe changes to the state of the environment, determine reward values correlating with the state of the environment and action, and take additional actions until the episode ends (a determination of YES at block 525 in FIG. 5). When the episode ends, the episode is provided as training input 640 to deep neural network 630, to enable deep neural network 630 learn the relationship between the environment state, selected action, and reward. In this manner, the deep neural network 620 is incrementally trained using reinforcement learning to learn which action should be taken in a given observed environment state to achieve the highest reward (i.e. a stable level); [0084]-[0086] FIG. 7 is a flow chart of an example method of training a DQN network to learn to dynamically tune cache policy parameters); deploying the policy to manage the cache ([0087] FIG. 8 is a flow chart of an example method of using a trained DQN network to dynamically tune cache policy parameters, according to some embodiments. The blocks shown in FIG. 8 are the same as the blocks shown in FIG. 5, with the exception that in FIG. 8 the software agent used in block 835 is a Deep Q Network (DQN) software agent of FIG. 6, that has been trained using the process shown in FIG. 7). Gottin does not explicitly disclose during said executing, allowing a machine learning agent to autonomously learn a cache management policy for the cache by repeatedly making incremental changes to current cache-residency statuses of locations in the memory address space; rewriting the cache to promote and demote locations within the cache using the incremental changes to current cache-residency statuses of the locations which allow for the application to be executed at a stable level. However, in the same field of endeavor, GUPTA teaches during said executing, allowing a machine learning agent to autonomously learn a cache management policy for the cache by repeatedly making incremental changes to current cache-residency statuses of locations in the memory address space (Fig. 2; [0027] FIG. 2 illustrates an embodiment of the local cache 200 i, such as one of the local caches 200 1, 200 2 . . . 200 n, for a CPU 114 i. A local cache 200 i may include one or more tasks 202 (i.e. a workload of the particular application) being executed by the CPU 114 i, a local queue 204 of cache segments 108 i (i.e. locations in a memory address space) obtained from the global queue 110 that are available to allocate for use by the tasks 202; [0028] FIG. 
Gottin does not explicitly disclose during said executing, allowing a machine learning agent to autonomously learn a cache management policy for the cache by repeatedly making incremental changes to current cache-residency statuses of locations in the memory address space; rewriting the cache to promote and demote locations within the cache using the incremental changes to current cache-residency statuses of the locations which allow for the application to be executed at a stable level.

However, in the same field of endeavor, GUPTA teaches during said executing, allowing a machine learning agent to autonomously learn a cache management policy for the cache by repeatedly making incremental changes to current cache-residency statuses of locations in the memory address space (Fig. 2; [0027] FIG. 2 illustrates an embodiment of the local cache 200ᵢ, such as one of the local caches 200₁, 200₂ . . . 200ₙ, for a CPU 114ᵢ. A local cache 200ᵢ may include one or more tasks 202 (i.e. a workload of the particular application) being executed by the CPU 114ᵢ, a local queue 204 of cache segments 108ᵢ (i.e. locations in a memory address space) obtained from the global queue 110 that are available to allocate for use by the tasks 202; [0028] FIG. 3 illustrates an embodiment of the global queue manager cache 300 that includes a global queue manager 302 to manage access to the global queue 110; global queue management information 500 having information on management of cache segments across all local queues 204 and accesses by all of the CPUs 114ᵢ of the global queue 110 to allocate or return cache segments 108ᵢ; a machine learning module 304 that receives as input 306 some or all of the global queue management information 500 for all the CPUs 114ᵢ and computes an optimum number parameter vector 308 that includes an optimum number parameter 210 for every CPU 114ᵢ and a transfer number parameter vector 310 that includes a transfer number parameter 212 for every CPU 114ᵢ. An allocate/demote counter 312 that indicates, for every CPU 114ᵢ, a number of allocate/demote operations with respect to the global queue 110; [0029]-[0030] The local cache managers 208 may then use the outputted optimum number parameter 210 and transfer number parameter 212 in the vectors 308 and 310, respectively, to determine when to request more cache segments 108ᵢ from the global queue manager 302 or when to return/demote the transfer number parameter 212 of cache segments from the local queue 204 to the global queue 110);

rewriting the cache to promote and demote locations within the cache using the incremental changes to current cache-residency statuses of the locations which allow for the application to be executed at a stable level ([0029]-[0030] The local cache managers 208 may then use the outputted optimum number parameter 210 and transfer number parameter 212 in the vectors 308 and 310, respectively, to determine when to request more cache segments 108ᵢ from the global queue manager 302 or when to return/demote the transfer number parameter 212 of cache segments from the local queue 204 to the global queue 110; [0041] current global queue management information 500 is used to determine the parameters the CPUs 114ᵢ use to determine when to allocate more cache segments from the global queue 110 and to demote and return cache segments 108ᵢ to the global queue 110. Each CPU 114ᵢ is provided operational parameters based on that CPU's 114ᵢ specific operations and performance and the operations of all the CPUs 114ᵢ with respect to the global queue 110; [0017] if the local queue has a relatively low number of cache segments needed to allocate to I/O operations, then the processing unit must obtain a lock to a global queue from which it can allocate more cache segments to the local queue. Further, if the local queue has a number of cache segments exceeding an optimum number, then the processing unit must obtain a lock on the global queue to demote cache segments from the local queue to the global queue. Because multiple processing units may be accessing the global queue to obtain and return cache segments, other processing units will experience latency delays to obtain the lock, which will introduce latency for their task processing as they wait to obtain a lock for the global queue to allocate or demote cache segments (i.e., not a stable level); [0018] Described embodiments control the number of lock requests to reduce latency in obtaining a lock to the global queue by adjusting the number of cache segments transferred between the local queue and the global queue. Increasing the number of cache segments to transfer reduces lock contention by reducing the frequency at which the processing units need to request the lock to access the global queue; [0019] In described embodiments, cache segment management information related to management of segments in the local queues and accesses to the global queue to transfer cache segments between the local queues and the global queue is provided to a machine learning module to output an optimum number parameter comprising an optimum number of segments to maintain in a local queue and a transfer number parameter comprising a number of cache segments to move between a local queue and the global queue. The optimum number parameters and the transfer number parameters are sent to the processing units to use to transfer the transfer number parameter of cache segments from the local queue to the global queue in response to determining that a number of segments in the local queue exceeds the optimum number parameter and to transfer the transfer number parameter of cache segments from the global queue to the local queue in response to determining that a number of segments in the local queue is less than the optimum number parameter).
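The allocate/demote rule in GUPTA's paragraphs [0017]-[0019] reduces to a threshold comparison driven by the two ML-predicted parameters. A hedged sketch follows; the queue objects, their `push`/`pop`/`lock` interface, and the batching details are assumptions for illustration, not GUPTA's actual code.

```python
def rebalance_local_queue(local_queue, global_queue, optimum, transfer):
    """GUPTA-style thresholding: hold the local queue near the ML-predicted
    optimum number, moving `transfer` segments per lock acquisition."""
    if len(local_queue) > optimum:
        # Too many free segments locally: demote a batch to the global queue.
        with global_queue.lock:
            for _ in range(transfer):
                global_queue.push(local_queue.pop())
    elif len(local_queue) < optimum:
        # Running low: allocate a batch from the global queue.
        with global_queue.lock:
            for _ in range(min(transfer, len(global_queue))):
                local_queue.push(global_queue.pop())
```

Moving `transfer` segments per lock acquisition, rather than one at a time, is the lock-contention lever paragraph [0018] describes: a larger batch means fewer trips to the global queue's lock.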
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate into Gottin's system GUPTA's teaching of managing a global queue of cache segments for processing units, by using a machine learning module to optimize each processing unit's operations with respect to its local queue in a manner that maintains a sufficient number of cache segments in the local queue and minimizes or reduces the need for the processing unit to access the global queue to access or return resources, because both systems address training a machine learning module to manage cache policy. This modification would have been motivated by the desire to provide improved techniques to manage the provisioning of cache segments from a global queue to the local queues of processors to use for I/O operations (GUPTA, [0004]).

Regarding dependent claim 47, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 45, which is incorporated. Gottin further teaches wherein the stable level has been achieved when a standard deviation between a predefined number of past rewards is less than a predefined threshold ([0070] in some embodiments a baseline regularized instantaneous reward r is used, which shows how much better the selected parameters performed relative to a baseline b, where the baseline b is a static value selected for the algorithm. In other embodiments, the instantaneous reward r is a function of both a baseline hit rate and an instantaneous cache hit rate).
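Claim 47's stability test is a rolling statistic over recent rewards. Below is a worked sketch of that check, with the window size and threshold as hypothetical placeholders (the claim leaves both predefined numbers open):

```python
from collections import deque
from statistics import pstdev

WINDOW, THRESHOLD = 20, 0.01   # hypothetical: last 20 rewards, 0.01 spread

recent_rewards = deque(maxlen=WINDOW)

def at_stable_level(new_reward):
    # Stable once the standard deviation of the last WINDOW rewards
    # drops below THRESHOLD, per the claim 47 language.
    recent_rewards.append(new_reward)
    return len(recent_rewards) == WINDOW and pstdev(recent_rewards) < THRESHOLD
```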
Regarding dependent claim 48, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 45, which is incorporated. Gottin further teaches wherein the ML agent is a reinforcement learning (RL) agent in an RL environment ([0077] FIG. 6 is a functional block diagram of a reinforcement learning process 600 connected to an environment 610, according to some embodiments. As shown in FIG. 6, in some embodiments a software agent 600 receives the observed state 605 of an environment 610 and applies the observed state as input 615 to a deep neural network 620. The deep neural network provides, as output, cache policy parameters 625 of the cache policies 230, 240 under the control of the cache parameter adjustment module 250).

Regarding dependent claim 49, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 45, which is incorporated. Gottin further teaches wherein the application is a deep learning (DL) application for training or inferencing, and the locations in the memory address space represent virtual address ranges for the workload (Fig. 1; [0018] In some embodiments, data clients 110 execute in emulations 120 such as a virtual machine instantiated in the context of the storage system 100. In some embodiments, a hypervisor 122 abstracts the physical resources of the storage system 100 from emulations 120, and allocates physical resources of storage system 100 for use by the emulations 120. Each emulation 120 has an emulation operating system 124 and one or more application processes running in the context of the emulation operating system 124; [0027] One aspect of the reinforcement learning process enables the reinforcement learning process to account for the changes in the disk request patterns, e.g., due to non-stationary behavior. When a new application emerges, for instance, access patterns on the cache 118 change and the reinforcement learning process 220 must eventually determine that it is worth changing the parameterization of the caching policy to cope with the new patterns. As an illustrative example, such patterns may involve more sequential accesses to contiguous disk addresses than the previous observations, requiring larger prefetches).

Regarding dependent claim 50, the combination of Gottin and GUPTA teaches all the limitations as set forth in the rejection of claim 45, which is incorporated. GUPTA further teaches wherein some of the locations are associated with activation data of the DL application ([0030] In one embodiment, the machine learning modules 304 may comprise artificial neural network programs. Each neural network may be trained using backward propagation to adjust weights and biases (i.e. activation data) at nodes in a hidden layer to produce the computed optimum number parameter vector 308 and transfer number parameter vector 310. In backward propagation used to train a neural network machine learning module, margins of error are determined based on operational parameters, such as a margin of error between an adjusted transfer number parameter for each processing unit and a current transfer number parameter calculated for each processing unit, to adjust weights and biases at nodes in a hidden layer of the machine learning module to produce the adjusted transfer number parameter).
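The backward-propagation step cited for claim 50 is ordinary supervised training: the margin of error between an adjusted and a current transfer number parameter nudges the model's weights. A deliberately tiny stand-in follows (a linear model instead of GUPTA's hidden-layer network; every feature, weight, and learning rate is hypothetical):

```python
# Minimal stand-in for GUPTA's machine learning module 304.
WEIGHTS = [0.0, 0.0]   # e.g. [lock-wait feature, queue-depth feature]
BIAS = 0.0
LEARNING_RATE = 0.01

def predict_transfer_number(features):
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS

def train_step(features, adjusted_transfer_number):
    # Margin of error between the adjusted (target) and current prediction.
    global BIAS
    error = adjusted_transfer_number - predict_transfer_number(features)
    # Gradient step: move weights and bias toward the corrected target.
    for i, x in enumerate(features):
        WEIGHTS[i] += LEARNING_RATE * error * x
    BIAS += LEARNING_RATE * error
```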
Response to Arguments

Applicant's arguments filed 01/02/2026 have been fully considered. Each of applicant's remarks is set forth below, followed by the examiner's response.

(1) Regarding the 35 U.S.C. § 101 rejections, Applicant's amendments to the claims have overcome the rejections. The rejections of claims 1-50 under 35 U.S.C. § 101 are withdrawn.

(2) Regarding the rejection of claims 1-50 under 35 U.S.C. § 103, Applicant alleges that pending Claims 1, 24-25 and 45 recite that the claimed cache itself is rewritten to promote and demote locations therein using incremental changes to current cache residency statuses of locations (in the cache). Since the cited portion of Gupta discloses, as established above, moving a cache segment from one cache (Gupta's global queue 110) to another cache (Gupta's local caches 200), the cited portion of Gupta does not disclose rewriting a single cache to promote and demote locations therein (i.e., within the cache) as recited in pending Claims 1, 24-25, and 45. Further, since the cited portion of Gupta discloses, as established above, moving cache segments based on lock contention on one of Gupta's caches (Gupta's global queue 110), the cited portion of Gupta does not disclose rewriting a cache using incremental changes to current cache residency statuses of locations within a cache, as recited in pending Claims 1, 24-25, and 45. There is no discussion of current cache residency statuses in this cited portion of Gupta.

As to point (2), Examiner respectfully disagrees. Gupta illustrates in Fig. 1 that the processor complex 102 may include a plurality of processing cores 112₁ . . . 112ₘ, where each core 112ᵢ, as shown with respect to core 112₁, includes a plurality of central processing units (CPUs) 114₁, 114₂ . . . 114ₙ, also referred to herein as processors or processing units. Each of the CPUs 114₁, 114₂ . . . 114ₙ includes a local cache 200₁, 200₂ . . . 200ₙ, such as an L1 cache ([0022]). Local cache memory address space is allocated in cache segments 108ᵢ of the cache 108. Fig. 2 shows a Least Recently Used (LRU) list 206 of cache segments allocated from the local queue 204 for use by the tasks 202, and a local cache manager 208 to manage allocation of cache segments 108ᵢ indicated in the local queue 204 to the LRU list 206 and to demote cache segments 108ᵢ from the LRU list 206 to the local queue 204 ([0027]); together, these maintain current cache residency statuses. The local cache managers 208 may then use the outputted optimum number parameter 210 and transfer number parameter 212 to determine when to request more cache segments 108ᵢ from the global queue manager 302 or when to return/demote the transfer number parameter 212 of cache segments from the local queue 204 to the global queue 110 ([0029]-[0030]). Thus, GUPTA is considered to teach rewriting a cache using incremental changes to current cache residency statuses of locations within a cache, as recited in pending Claims 1, 24-25, and 45. Therefore, the combination of Gottin and GUPTA is considered to teach claims 1, 24, 25 and 45, and consequently dependent claims 2-23, 26-44 and 47-50 are rejected.
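The promote/demote mechanics the examiner points to in Gupta's Fig. 2, with segments moving between a free local queue and an LRU list, can be pictured with a short sketch. The class and method names here are hypothetical illustrations, not Gupta's implementation:

```python
from collections import OrderedDict

class LocalCacheManager:
    """Illustrative Gupta-style local cache manager: a segment's residency
    status is tracked by whether it sits in the LRU list or the free queue."""
    def __init__(self, free_segments):
        self.local_queue = list(free_segments)  # free segments (queue 204)
        self.lru = OrderedDict()                # resident segments (list 206)

    def promote(self, segment_id, data):
        # Allocate a free segment to a task: residency status -> in LRU list.
        # (Assumes a free segment is available; otherwise rebalance first.)
        self.local_queue.pop()
        self.lru[segment_id] = data
        self.lru.move_to_end(segment_id)        # mark most recently used

    def demote_oldest(self):
        # Evict the least recently used segment: residency -> free queue.
        segment_id, _ = self.lru.popitem(last=False)
        self.local_queue.append(segment_id)
        return segment_id
```

In this picture, a segment's "cache residency status" is simply which structure it currently occupies, the LRU list (resident) or the local queue (free), which is the reading the examiner's response adopts.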
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action. Zmora et al. (US 20210349835 A1) discloses detecting a cache line conflict in a last-level cache (LLC) communicatively coupled to the plurality of compute engines and implementing a context-based eviction policy to determine a cache way in the cache to evict in order to resolve the cache line conflict.

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY P HOANG, whose telephone number is (469) 295-9134. The examiner can normally be reached M-Th, 8:30 AM-5:00 PM.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, JENNIFER WELCH, can be reached at 571-272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AMY P HOANG/
Examiner, Art Unit 2143

/JENNIFER N WELCH/
Supervisory Patent Examiner, Art Unit 2143

Prosecution Timeline

Oct 29, 2021
Application Filed
Jan 14, 2022
Response after Non-Final Action
Mar 09, 2022
Response after Non-Final Action
Apr 04, 2025
Non-Final Rejection — §103
Jul 10, 2025
Response Filed
Sep 29, 2025
Final Rejection — §103
Jan 02, 2026
Request for Continued Examination
Jan 20, 2026
Response after Non-Final Action
Feb 10, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602596
APPARATUS AND METHOD FOR VALIDATING DATASET BASED ON FEATURE COVERAGE
2y 5m to grant · Granted Apr 14, 2026
Patent 12572263
ACCESS CARD WITH CONFIGURABLE RULES
2y 5m to grant · Granted Mar 10, 2026
Patent 12536432
PRE-TRAINING METHOD OF NEURAL NETWORK MODEL, ELECTRONIC DEVICE AND MEDIUM
2y 5m to grant · Granted Jan 27, 2026
Patent 12475669
METHOD AND APPARATUS WITH NEURAL NETWORK OPERATION FOR DATA NORMALIZATION
2y 5m to grant · Granted Nov 18, 2025
Patent 12461595
SYSTEM AND METHOD FOR EMBEDDED COGNITIVE STATE METRIC SYSTEM
2y 5m to grant · Granted Nov 04, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
70%
Grant Probability
99%
With Interview (+64.2%)
3y 3m
Median Time to Grant
High
PTA Risk
Based on 232 resolved cases by this examiner. Grant probability derived from career allow rate.
