Prosecution Insights
Last updated: April 19, 2026
Application No. 17/032,372

CONFIGURING A POWER MANAGEMENT SYSTEM USING REINFORCEMENT LEARNING

Status: Non-Final OA (§103)
Filed: Sep 25, 2020
Examiner: NGUYEN, HENRY K
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: ATI Technologies ULC
OA Round: 5 (Non-Final)
Grant Probability: 57% (Moderate)
Expected OA Rounds: 5-6
Time to Grant: 4y 7m
Grant Probability with Interview: 88%

Examiner Intelligence

Career Allow Rate: 57% (grants 57% of resolved cases: 90 granted / 158 resolved; +2.0% vs TC avg)
Interview Lift: +31.4% (strong), comparing resolved cases with an interview vs without
Typical Timeline: 4y 7m avg prosecution; 26 currently pending
Career History: 184 total applications across all art units

Statute-Specific Performance

§101: 21.6% (-18.4% vs TC avg)
§103: 51.4% (+11.4% vs TC avg)
§102: 7.7% (-32.3% vs TC avg)
§112: 14.0% (-26.0% vs TC avg)
Baseline = Tech Center average estimate • Based on career data from 158 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/30/2025 has been entered.

Response to Arguments

Applicant's arguments filed 09/30/2025 have been fully considered but they are not persuasive.

Applicant argues: Regarding claims 1, 11, and 21, Applicant argues the cited references do not disclose an “adjusted performance score”.

Examiner response: Examiner respectfully disagrees. Zhang discloses (Zhang para [0068]-[0069] “In an embodiment, the value of the reward signal is dependent on key performance indicators such as packet loss, power (lower core frequency) and resource utilization (Last Level Cache occupancy). For example, the reward signal value can be computed using the three key performance indicators discussed above as shown below: R_t+1 = −Ø_0*pkt_loss_rate − Ø_1*core_freq − Ø_2*cache_occupancy, where Ø_0, Ø_1 and Ø_2 are coefficients that can be fine-tuned.”) wherein the packet loss, power, and resource utilization are used to calculate the reward and can be fine-tuned (i.e., adjusted performance score). The packet loss, power, and resource utilization are performance scores.
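For readers following the technical dispute, the reward signal quoted from Zhang can be illustrated with a short sketch. The coefficient and input values below are hypothetical placeholders chosen only to show the shape of the computation, not values from the reference:

```python
# Illustrative sketch of Zhang's reward signal (para [0068]-[0069]):
#   R_t+1 = -phi0 * pkt_loss_rate - phi1 * core_freq - phi2 * cache_occupancy
# The coefficient defaults below are hypothetical placeholders.

def reward(pkt_loss_rate, core_freq, cache_occupancy,
           phi0=1.0, phi1=0.5, phi2=0.25):
    """Higher packet loss, core frequency, or cache occupancy lowers the reward."""
    return -phi0 * pkt_loss_rate - phi1 * core_freq - phi2 * cache_occupancy

# A lower core frequency with no packet loss yields a higher (less negative) reward.
print(round(reward(pkt_loss_rate=0.0, core_freq=1.0, cache_occupancy=0.2), 2))  # -0.55
print(round(reward(pkt_loss_rate=0.1, core_freq=2.3, cache_occupancy=0.8), 2))  # -1.45
```

The fine-tunable coefficients correspond to Ø_0, Ø_1, and Ø_2 in the quoted passage.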
Zhang further teaches that the packet loss, power, and cache occupancy may be normalized during preprocessing (Zhang para [0063] “For the platform hardware set 608, a subset of telemetry data that highly correlate to key performance indicator (KPI) packet loss are selected using the recursive feature elimination (RFE) algorithm from Scikit-Learn machine learning library. Every 10 seconds a core clock frequency between 1 GigaHertz (GHz) and 2.3 GHz is selected and platform telemetry data and packet loss information is sampled every second. The telemetry data includes status such as a count of cache_misses, instructions, and instruction cache load misses (cache_load_misses) and CPU cycle percentage status such as percentage of CPU idle cycles, percentage of CPU system cycles, and percentage of software interrupt requests (soft_irq). Data pre-processing (timestamp alignment, interpolation, missing data handling, normalization, etc.) is performed on the data set, for example, formed as a classification problem with each training sample with 34 features and target label being 0—no packet loss and 1—packet loss.”). Zhang teaches applying a reward function to adjust the normalized performance score but does not disclose that the reward function is nonlinear. However, Yao (US 20190220744 A1) teaches (Yao para [0072] “The reward function used to calculate the reward value may be linear or non-linear.”).

Applicant argues: Regarding claims 5 and 15, Applicant argues the performance counters, power consumption, and frequency modifications are not determined at intervals.

Examiner response: Examiner respectfully disagrees. Gupta discloses outputting performance indicators, processing frequency, and power consumption used to determine the state (Gupta pg. 15, section 3.1; “State.
We define the state as the core utilizations, sum of little core utilizations, big and little core frequencies, number of big and little cores, total power consumption, and five normalized performance counters listed below… Performance Counters. The behavior of the workload is captured through the following hardware performance counters: CPU Cycles, Branch Miss Prediction, L2$-misses, Data Memory Access, and Noncache External Memory Requests, all normalized by the number of Instructions Retired.”). The state is further determined at intervals (pg. 15 section 3.1; “At each control interval, the policy chooses the action that leads to the maximum Q-value. Finally, the weights of the DQN are updated using current state St, current action At, next state St+1, and the reward”). Arguments are not persuasive.

Applicant argues: Regarding claims 32, 34, and 36, Applicant argues the frequency modification is not selected from a probability distribution.

Examiner response: Examiner respectfully disagrees. Gupta discloses determining actions such as modifying the frequencies of the core (Gupta pg. 15, section 3.1; “Actions. We control four knobs: the number of active little cores nL, operating frequency of active little cores fL, number of active big cores nB, and operating frequency of active big cores fB. Each action can take three values, which are encoded as 0 (decrease), 1 (no change), and 2 (increase).”). Gupta further discloses finding an optimal configuration for the actions (pg. 15 section 3.3; “For example, if configuration {1 L, 0.8 GHz, 2 B, 1 GHz} is seen more often than others, we balance the representation of both types of data samples.”).
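The quoted action space (four knobs, each encoded as 0 decrease / 1 no change / 2 increase) and the per-control-interval argmax-Q selection can be sketched in a few lines. The Q-function below is a deliberately simple stand-in, since the trained DQN itself is not reproduced in the record:

```python
import itertools

# Sketch of Gupta's action space: four knobs (little-core count, little-core
# frequency, big-core count, big-core frequency), each action encoded as
# 0 (decrease), 1 (no change), or 2 (increase).
ACTIONS = list(itertools.product(range(3), repeat=4))  # 3^4 = 81 joint actions

def q_value(state, action):
    # Stand-in for the trained DQN's Q(s, a); any scoring function fits here.
    return -sum((a - s) ** 2 for a, s in zip(action, state))

def choose_action(state):
    """At each control interval, pick the action with the maximum Q-value."""
    return max(ACTIONS, key=lambda a: q_value(state, a))

print(len(ACTIONS))              # 81
print(choose_action((1, 1, 1, 1)))
```

With this placeholder Q, the greedy policy simply returns the action closest to the state vector; a real DQN would score actions by learned expected return.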
The optimal frequency is determined from a probability distribution (“Our technique achieves this by inserting new data points to the experience buffer by using the following priority function: [equation image] where nT is the total number of configurations in the optimal PPW set, and ni is the number of occurrences of configuration i. k is a gain parameter that changes the shape of the distribution and N is the total number of configurations supported by the platform. This function enables us to increase the priority of data points that have less occurrence in the dataset and vice versa. After computing the priorities, we find the probability of inserting configuration i to the experience replay buffer as: [equation image]”).

Applicant argues: Regarding claim 8, Applicant argues the cited references do not disclose a normalized performance score computed as a metric derived from the workload.

Examiner response: Examiner respectfully disagrees. As explained above, Zhang discloses using the packet loss, power (lower core frequency) and resource utilization (Last Level Cache occupancy) to calculate the reward. Zhang further discloses these values may be normalized. Zhang further teaches (para [0019] “Typically, a server can monitor performance metrics that include key performance indicators to understand the state of server. Performance metrics that can be monitored include Central Processing Unit (CPU) utilization, memory utilization and network throughput.”). Arguments are not persuasive.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C.
102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-6, 8, 10-11, 15-16, 20-21, 25-26, 28, and 30-36 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US-20190199602-A1) in view of Gupta et al. (“A Deep Q-Learning Approach for Dynamic Management of Heterogeneous Processors”) and Yao et al. (US-20190220744-A1).

Regarding Claim 1, Zhang (US 20190199602 A1) teaches a method of configuring a power management system using reinforcement learning, the method comprising: executing, by a first processor, one or more workloads (para [0060]); generating, by a neural network, a plurality of processing frequency modification determinations during execution of the one or more workloads based on at least an output of one or more performance counters (para [0019], [0042] Key performance indicators (i.e., performance counters).) during execution of the one or more workloads (para [0060]-[0061] “In an embodiment of a system that includes vCMTS 514 as shown in FIG.
5, the reinforcement learning algorithm 504 can adaptively scale clock frequency for the processor core 102 in the CPU module 108 to maximize power saving opportunities during a 24 hour period based on the workload.”); modifying, by a system management unit, a frequency of the first processor based on the plurality of processing frequency modification determinations generated by the neural network (para [0060]-[0063]); training the neural network (para [0061] “The state of key performance indicators (KPIs) 606 is monitored and used to train the reinforcement learning algorithm 504 to adaptively optimize hardware resources (for example, clock frequency for a processor core 102 in the CPU module 108) based on the forecasted workload and current sampled telemetry data.”), the training including iteratively: receiving a plurality of performance characteristics for a workload (para [0068]), wherein the plurality of performance characteristics includes the plurality of processing frequency modification determinations generated by the neural network (para [0060]-[0063]), the output of one or more performance counters during execution of the workload (para [0055] “An embodiment has been described for a key performance indicator for predicting packet loss rate, in other embodiments other types of virtual network functions (VNFs) and their key performance indicators (KPIs) can be monitored in the system shown in FIG. 
3.” Key performance indicator (i.e., performance counter).), and a plurality of power consumptions during execution of the workload (para [0055] “Other types of virtual network functions (VNFs) can include average latency, maximum latency, jitter, power consumption based on monitoring hardware functions and can include connection failure percentage, or miss-classification error rate based on monitoring software applications executing in the system.”); determining an adjusted performance score for the execution of the workload by applying a function (para [0068]-[0069] “In an embodiment, the value of the reward signal is dependent on key performance indicators such as packet loss, power (lower core frequency) and resource utilization (Last Level Cache occupancy). For example, the reward signal value can be computed using the three key performance indicators discussed above as shown below: R_t+1 = −Ø_0*pkt_loss_rate − Ø_1*core_freq − Ø_2*cache_occupancy, where Ø_0, Ø_1 and Ø_2 are coefficients that can be fine-tuned.” packet loss, power, resource utilization (i.e., adjusted performance score).) to a normalized performance score (para [0063] “For the platform hardware set 608, a subset of telemetry data that highly correlate to key performance indicator (KPI) packet loss are selected using the recursive feature elimination (RFE) algorithm from Scikit-Learn machine learning library. Every 10 seconds a core clock frequency between 1 GigaHertz (GHz) and 2.3 GHz is selected and platform telemetry data and packet loss information is sampled every second. The telemetry data includes status such as a count of cache_misses, instructions, and instruction cache load misses (cache_load_misses) and CPU cycle percentage status such as percentage of CPU idle cycles, percentage of CPU system cycles, and percentage of software interrupt requests (soft_irq).
Data pre-processing (timestamp alignment, interpolation, missing data handling, normalization, etc.) is performed on the data set, for example, formed as a classification problem with each training sample with 34 features and target label being 0—no packet loss and 1—packet loss.” Preprocessing normalization may be applied to the packet loss, cache occupancy, and clock frequency collected in the telemetry data.); calculating, based on the adjusted performance score and the plurality of power consumptions, a reward value for the execution of the workload (para [0068]-[0069] “For example, the reward signal value can be computed using the three key performance indicators discussed above as shown below: R_t+1 = −Ø_0*pkt_loss_rate − Ø_1*core_freq − Ø_2*cache_occupancy, where Ø_0, Ø_1 and Ø_2 are coefficients that can be fine-tuned.”); and detecting a satisfaction of a convergence condition indicating that the neural network is trained (para [0070] “The process shown in FIG. 6 is repeated during the learning phase until the reinforcement learning algorithm 504 converges to an optimal resource allocation.”).

While Zhang discloses an adjusted performance score, Zhang does not explicitly disclose applying a nonlinear function to a normalized performance score. Zhang does not explicitly disclose determining an adjusted performance score for the execution of the workload by applying a non-linear performance function to a performance score; calculating, based on the adjusted performance score and the plurality of power consumptions, a reward value for the execution of the workload; modifying one or more weights of the neural network based on the reward value. However, Gupta teaches determining an adjusted performance score for the execution of the workload by applying a non-linear performance function to a normalized performance score (pg.
15, section 2; “IL approximates the policy using linear function and nonlinear functions in the form of trees, whereas we employ a nonlinear function in the form of a neural network.” pg. 15, section 3.1; “State: We define the state as the core utilizations, sum of little core utilizations, big and little core frequencies, number of big and little cores, total power consumption, and five normalized performance counters listed below… The policy is modeled with a DQN where the Q-values are a function of state S, action A and neural network weights w: [equation image] Finally, the weights of the DQN are updated using current state St, current action At, next state St+1, and the reward Rt+1:” performance counters (i.e., normalized performance score).); modifying one or more weights of the neural network based on the reward value (pg. 15, section 3.1; “Finally, the weights of the DQN are updated using current state St, current action At, next state St+1, and the reward Rt+1: [equation image]”); and

Both Zhang and Gupta are analogous art since they both teach methods and techniques that use neural networks and reinforcement learning for power management systems. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the reinforcement learning method and techniques taught in Zhang and enhance them to incorporate the reinforcement learning method and techniques taught in Gupta as a way to define the state of a performance management system that is not application-dependent as well as a replay buffer used for training that uses a small memory footprint.
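The DQN weight update Gupta describes (whose equation images are omitted from the record) is, on the standard reading, a temporal-difference update toward the reward plus the discounted best next-state Q-value. The linear Q-function, features, and step sizes below are illustrative assumptions, not Gupta's actual network:

```python
import numpy as np

# Sketch of a semi-gradient TD update for a linear Q-function, an assumed
# stand-in for the DQN update quoted from Gupta:
#   w <- w + alpha * (R + gamma * max_a' Q(s', a'; w) - Q(s, a; w)) * grad_w Q

def feat(s, a):
    # Hypothetical feature map for a scalar state and scalar action.
    return np.array([1.0, s, a, s * a])

def q(w, s, a):
    return w @ feat(s, a)

def td_update(w, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    target = r + gamma * max(q(w, s_next, a2) for a2 in actions)
    td_error = target - q(w, s, a)
    return w + alpha * td_error * feat(s, a)  # gradient of a linear Q is its features

w = np.zeros(4)
w = td_update(w, s=0.5, a=1.0, r=-0.2, s_next=0.4, actions=[0.0, 1.0, 2.0])
print(w.round(3))  # [-0.02 -0.01 -0.02 -0.01]
```

One update from zero weights moves w in the direction of the (negative) TD error times the features, matching the quoted use of St, At, St+1, and Rt+1.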
The motivation to combine is taught in Gupta, because the computer system data/information that includes these performance counters captures the workload behavior in an application-independent fashion, as well as allowing the construction of a replay buffer used for training the reinforcement learning system that stores positive and negative samples obtained from a representative set of configurations that includes these performance counter data with minimal impact to accuracy, thus improving the robustness and memory efficiency of the system (Gupta p.14 col.2 2nd-3rd paragraphs, p.15 col.1 Section 3.1 Overview and Preliminaries 4th paragraph, p.16 col.1 1st paragraph, and p.16 col.2 Section 4.2 Overhead Analysis and Comparison to Q-Table).

Zhang discloses applying a reward function to the adjusted performance score. Yao (US 20190220744 A1) teaches determining an adjusted performance score for the execution of the workload by applying a non-linear performance function (para [0072] “The reward function used to calculate the reward value may be linear or non-linear.”). Zhang and Yao are analogous because they are directed to the field of reinforcement learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the reinforcement learning model of Zhang with the reward function of Yao. Doing so would allow for implementing a reward function based on the requirements of the neural network designer (Yao para [0072]).

Regarding Claim 5, Zhang, Gupta, and Yao teach the method of claim 1. Gupta further teaches wherein each output of the one or more performance counters (pg. 15, section 3.1; “Performance Counters.
The behavior of the workload is captured through the following hardware performance counters: CPU Cycles, Branch Miss Prediction, L2$-misses, Data Memory Access, and Noncache External Memory Requests, all normalized by the number of Instructions Retired.”), each of the plurality of power consumptions, and each of the plurality of processing frequency modification determinations (pg. 15, section 3.1; “We define the state as the core utilizations, sum of little core utilizations, big and little core frequencies, number of big and little cores, total power consumption, and five normalized performance counters listed below.”) corresponds to an interval of a plurality of intervals of execution of the workload (pg. 15 section 3.1; “At each control interval, the policy chooses the action that leads to the maximum Q-value. Finally, the weights of the DQN are updated using current state St, current action At, next state St+1, and the reward”).

Both Zhang and Gupta are analogous art since they both teach methods and techniques that use neural networks and reinforcement learning for power management systems. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the reinforcement learning method and techniques taught in Zhang and enhance them to incorporate the reinforcement learning method and techniques taught in Gupta as a way to define the state of a performance management system that is not application-dependent as well as a replay buffer used for training that uses a small memory footprint.
The motivation to combine is taught in Gupta, because the computer system data/information that includes these performance counters captures the workload behavior in an application-independent fashion, as well as allowing the construction of a replay buffer used for training the reinforcement learning system that stores positive and negative samples obtained from a representative set of configurations that includes these performance counter data with minimal impact to accuracy, thus improving the robustness and memory efficiency of the system (Gupta p.14 col.2 2nd-3rd paragraphs, p.15 col.1 Section 3.1 Overview and Preliminaries 4th paragraph, p.16 col.1 1st paragraph, and p.16 col.2 Section 4.2 Overhead Analysis and Comparison to Q-Table). Regarding Claim 6, Zhang, Gupta, and Yao teach the method of claim 1. Zhang further teaches wherein the output of the one or more performance counters comprises one or more of: a percentage of time a component is processing, a data throughput counter, a cache miss counter, and/or a counter indicating that a particular calculation is performed (para [0057] “Telemetry data (for example, CPU metrics such as, cache misses; memory utilization; CPU cycles and network throughput), incoming traffic and vCMTS specific statistics such as packet scheduling loss are periodically sampled using Collectd 510 at a configurable interval (for example, every second).”). Regarding Claim 8, Zhang, Gupta, and Yao teach the method of claim 1. Zhang further teaches wherein the normalized performance score is a computed metric derived from execution of the workload (para [0019] “Typically, a server can monitor performance metrics that include key performance indicators to understand the state of server. Performance metrics that can be monitored include Central Processing Unit (CPU) utilization, memory utilization and network throughput.”). Regarding Claim 10, Zhang, Gupta, and Yao teach the method of claim 1. 
Gupta further teaches wherein the neural network is configured to accept, as input, one or more normalized performance counters (pg. 15, section 3.1; “We define the state as the core utilizations, sum of little core utilizations, big and little core frequencies, number of big and little cores, total power consumption, and five normalized performance counters listed below.”). Both Zhang and Gupta are analogous art since they both teach methods and techniques that use neural networks and reinforcement learning for power management systems. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the reinforcement learning method and techniques taught in Zhang and enhance them to incorporate the reinforcement learning method and techniques taught in Gupta as a way to define the state of a performance management system that is not application-dependent as well as a replay buffer used for training that uses a small memory footprint. The motivation to combine is taught in Gupta, because the computer system data/information that includes these performance counters captures the workload behavior in an application-independent fashion, as well as allowing the construction of a replay buffer used for training the reinforcement learning system that stores positive and negative samples obtained from a representative set of configurations that includes these performance counter data with minimal impact to accuracy, thus improving the robustness and memory efficiency of the system (Gupta p.14 col.2 2nd-3rd paragraphs, p.15 col.1 Section 3.1 Overview and Preliminaries 4th paragraph, p.16 col.1 1st paragraph, and p.16 col.2 Section 4.2 Overhead Analysis and Comparison to Q-Table). Regarding Claim 11, Claim 11 is the apparatus corresponding to the method of claim 1. Claim 11 is substantially similar to claim 1 and is rejected on the same grounds. 
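The normalization Gupta applies to the counters that feed the network (each hardware counter divided by the number of Instructions Retired) can be sketched in a few lines. The counter names follow the quoted passage; the numeric values are hypothetical:

```python
# Sketch of Gupta's counter normalization: each hardware performance counter
# is expressed per instruction retired before being used as a state feature.
# The raw values below are hypothetical.

def normalize_counters(raw, instructions_retired):
    """Return each counter divided by the number of instructions retired."""
    return {name: count / instructions_retired for name, count in raw.items()}

raw = {"cpu_cycles": 2_000_000, "branch_miss": 15_000,
       "l2_misses": 40_000, "data_mem_access": 600_000,
       "noncache_ext_mem_req": 8_000}
state_features = normalize_counters(raw, instructions_retired=1_000_000)
print(state_features["cpu_cycles"])  # 2.0 cycles per instruction
```

Normalizing per retired instruction is what makes the state description application-independent, which is the motivation-to-combine rationale cited above.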
Regarding Claim 15, Claim 15 is the apparatus corresponding to the method of claim 5. Claim 15 is substantially similar to claim 5 and is rejected on the same grounds.

Regarding Claim 16, Claim 16 is the apparatus corresponding to the method of claim 6. Claim 16 is substantially similar to claim 6 and is rejected on the same grounds.

Regarding Claim 20, Claim 20 is the apparatus corresponding to the method of claim 10. Claim 20 is substantially similar to claim 10 and is rejected on the same grounds.

Regarding Claim 21, Claim 21 is the computer program product corresponding to the method of claim 1. Claim 21 is substantially similar to claim 1 and is rejected on the same grounds.

Regarding Claim 25, Claim 25 is the computer program product corresponding to the method of claim 5. Claim 25 is substantially similar to claim 5 and is rejected on the same grounds.

Regarding Claim 26, Claim 26 is the computer program product corresponding to the method of claim 6. Claim 26 is substantially similar to claim 6 and is rejected on the same grounds.

Regarding Claim 28, Claim 28 is the computer program product corresponding to the method of claim 8. Claim 28 is substantially similar to claim 8 and is rejected on the same grounds.

Regarding Claim 30, Claim 30 is the computer program product corresponding to the method of claim 10. Claim 30 is substantially similar to claim 10 and is rejected on the same grounds.

Regarding Claim 31, Zhang, Gupta, and Yao teach the method of claim 1. Zhang further teaches wherein the neural network is deployed in a test system (para [0046]-[0047]).

Regarding Claim 32, Zhang, Gupta, and Yao teach the method of claim 1. Gupta further teaches wherein, during training of the neural network, a particular processing frequency modification determination (pg. 15, section 3.1; “Actions.
We control four knobs: the number of active little cores nL, operating frequency of active little cores fL, number of active big cores nB, and operating frequency of active big cores fB. Each action can take three values, which are encoded as 0 (decrease), 1 (no change), and 2 (increase).”) is generated by the neural network by selecting a processing frequency modification from a probability distribution (pg. 16 section 3.3; “k is a gain parameter that changes the shape of the distribution and N is the total number of configurations supported by the platform. This function enables us to increase the priority of data points that have less occurrence in the dataset and vice versa. After computing the priorities, we find the probability of inserting configuration i to the experience replay buffer as: [equation image]”).

Regarding Claim 33, Claim 33 is the apparatus corresponding to the method of claim 31. Claim 33 is substantially similar to claim 31 and is rejected on the same grounds.

Regarding Claim 34, Claim 34 is the apparatus corresponding to the method of claim 32. Claim 34 is substantially similar to claim 32 and is rejected on the same grounds.

Regarding Claim 35, Claim 35 is the computer program product corresponding to the method of claim 31. Claim 35 is substantially similar to claim 31 and is rejected on the same grounds.

Regarding Claim 36, Claim 36 is the computer program product corresponding to the method of claim 32. Claim 36 is substantially similar to claim 32 and is rejected on the same grounds.

Claims 3, 13 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Zhang/Gupta/Yao, as applied above, and further in view of Sutton et al. (“Reinforcement Learning: An Introduction, Chapter 2, MIT Press 1998” [hereafter referred to as Sutton Chapter 2]).

Regarding Claim 3, Zhang, Gupta, and Yao teach the method of claim 1.
Zhang, Gupta, and Yao do not explicitly disclose wherein the convergence condition comprises one or more of the reward value satisfying a threshold or a degree of variance across a plurality of reward values falling below a threshold. Sutton Chapter 2 teaches … wherein the convergence condition comprises one or more of the reward value satisfying a threshold or a degree of variance across a plurality of reward values falling below a threshold (Examiner’s note: This limitation broadly indicates that the convergence condition includes a condition that includes having a reward value satisfying a threshold value or having a reward value satisfying a degree of variance across a plurality of reward values not satisfying a threshold. Sutton Chapter 2 teaches a reinforcement comparison method that involves comparing an average of previously received rewards representing a reference reward against the current reward to determine a difference that is applied to the preference for selecting the action to be performed, where high rewards increase the probability of reselecting the action taken, and low rewards decrease the probability of reselecting the action. 
Sutton Chapter 2 additionally teaches that this method leads to quicker accumulation of rewards, resulting in a larger Q(a) that approaches closer to the mean accumulated reward received Q*(a) in a shorter amount of time, where Sutton Chapter 2 further indicates this method is a form of an ϵ-greedy algorithm that approaches the asymptotic guarantee that the probability of selecting the optimal action converges to greater than 1−ϵ to near certainty, where reaching this asymptotic level of 1−ϵ represents a convergence condition of the algorithm, thus corresponding to a convergence condition that includes a condition that includes having a reward value satisfying a threshold value or having a reward value satisfying a degree of variance across a plurality of reward values not satisfying a threshold (Sutton pp.27-28 Section 2.2 Action-Value Methods 1st-3rd paragraphs, and pp.41-43 Section 2.8 Reinforcement Comparison, including p.43 Figure 2.5).

Zhang, Gupta, Yao, and Sutton Chapter 2 are analogous art since they all teach reinforcement learning methods and techniques. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the reinforcement learning methods and techniques taught in Zhang and Gupta and enhance them to incorporate the reinforcement learning method and techniques taught in Sutton Chapter 2 as a way to improve the performance of the system. The motivation to combine is taught in Sutton Chapter 2, because reinforcement learning methods that apply reinforcement comparison methods can perform better than reinforcement learning methods that apply ϵ-greedy methods, thus improving the performance of the system (Sutton p.42 3rd paragraph and p.43 Figure 2.5).

Regarding Claim 13, Claim 13 is the apparatus corresponding to the method of claim 3. Claim 13 is substantially similar to claim 3 and is rejected on the same grounds.
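The reinforcement-comparison scheme summarized from Sutton Chapter 2 (compare each reward to a running reference reward, then raise or lower the preference for the action taken) can be sketched as follows. The two-action setup and step sizes are illustrative choices, not values from Sutton:

```python
import math

# Sketch of reinforcement comparison (Sutton Ch. 2): the current reward is
# compared against an incrementally updated reference reward, and the
# preference for the taken action rises or falls by that difference.

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

prefs = [0.0, 0.0]   # action preferences p(a)
ref_reward = 0.0     # reference reward (running average of received rewards)
alpha, beta = 0.1, 0.1

def step(action, reward):
    global ref_reward
    prefs[action] += beta * (reward - ref_reward)  # above-reference reward raises preference
    ref_reward += alpha * (reward - ref_reward)    # reference tracks the running average

step(action=0, reward=1.0)   # action 0 beat the reference...
step(action=1, reward=-1.0)  # ...action 1 fell below it
probs = softmax(prefs)
print(probs[0] > probs[1])   # True: action 0 is now preferred
```

High rewards increase the probability of reselecting the action taken and low rewards decrease it, which is exactly the behavior the rejection attributes to the reference.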
Regarding Claim 23, Claim 23 is the computer program product corresponding to the method of claim 3. Claim 23 is substantially similar to claim 3 and is rejected on the same grounds.

Claims 9, 19, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Zhang/Gupta/Yao, as applied above, and further in view of Rong et al. (US-20150124765-A1).

Regarding Claim 9, Zhang, Gupta, and Yao teach the method of claim 1. Zhang, Gupta, and Yao do not explicitly disclose wherein the non-linear performance function is based on a ratio of the adjusted performance score to the normalized performance score and a performance loss threshold. However, Rong (US 20150124765 A1) teaches wherein the non-linear performance function is based on a ratio of the adjusted performance score to the normalized performance score and a performance loss threshold (para [0055]). Zhang, Gupta and Rong are analogous because they are directed to the same field of endeavor of device power management. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the power management system of Zhang and Gupta with the loss threshold of Rong. Doing so would allow for adjusting the stream of power without the unnecessary loss of performance of coverage and capacity (Rong para [0016]).

Regarding Claim 19, Claim 19 is the apparatus corresponding to the method of claim 9. Claim 19 is substantially similar to claim 9 and is rejected on the same grounds.

Regarding Claim 29, Claim 29 is the computer program product corresponding to the method of claim 9. Claim 29 is substantially similar to claim 9 and is rejected on the same grounds.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Zhang/Gupta/Yao, as applied above, and further in view of Leung et al. (US-20210303045-A1).

Regarding Claim 18, Zhang, Gupta, and Yao teach the apparatus of claim 11.
Zhang, Gupta, and Yao do not explicitly disclose wherein the normalized performance score is a score for a benchmark or stress test performed by the workload. However, Leung teaches wherein the normalized performance score is a score for a benchmark or stress test performed by the workload (para [0149] “Although embodiments are not limited in this regard, as examples this power metric information may include performance scores of benchmark, frequency (P-state) residency, CPU on/off (C-state) residency, among other such information. Thereafter the workload in execution may be stopped (block 2140).”). Zhang, Gupta, Yao, and Leung are analogous because they are directed toward optimizing power consumption for CPU processing using machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning models of Zhang, Gupta, and Yao with the benchmark scores of Leung. Doing so would allow for testing the different types of workloads to collect power metric information (Leung para [0149]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN, whose telephone number is (571) 272-0217. The examiner can normally be reached Mon-Fri, 7:00 am-4:30 pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li B Zhen, can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center.
Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /HENRY NGUYEN/Examiner, Art Unit 2121
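The claim 9 and claim 18 discussions above turn on a non-linear performance function built from the ratio of an adjusted performance score to a normalized (benchmark) score and a performance loss threshold. The Office Action excerpt does not give the claimed function's exact form, so the following is a purely hypothetical sketch of one such function; the function name, the 5% default threshold, and the quadratic penalty shape are all assumptions for illustration.

```python
def performance_penalty(adjusted_score, normalized_score, loss_threshold=0.05):
    """Hypothetical non-linear performance function.

    Derives the relative performance loss from the ratio of the adjusted
    score to the normalized score (e.g., a benchmark baseline), then
    applies a quadratic penalty only once the loss exceeds the threshold.
    The dead zone below the threshold plus the quadratic growth above it
    make the function non-linear in the ratio.
    """
    ratio = adjusted_score / normalized_score
    loss = 1.0 - ratio                   # fraction of performance given up
    if loss <= loss_threshold:
        return 0.0                       # within tolerated loss: no penalty
    return (loss - loss_threshold) ** 2  # quadratic penalty beyond threshold

# e.g., a workload scoring 90 against a benchmark baseline of 100 has a
# 10% loss; with a 5% threshold, only the 5% excess is penalized.
```

A shape like this would let a power manager trade frequency for energy freely while performance stays within the tolerated loss band, with a rapidly growing cost once it does not.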

Prosecution Timeline

Sep 25, 2020: Application Filed
Jun 01, 2023: Non-Final Rejection (§103)
Jul 24, 2023: Applicant Interview (Telephonic)
Jul 25, 2023: Examiner Interview Summary
Sep 14, 2023: Response Filed
Nov 13, 2023: Final Rejection (§103)
Jan 05, 2024: Applicant Interview (Telephonic)
Jan 05, 2024: Examiner Interview Summary
Apr 17, 2024: Request for Continued Examination
Apr 18, 2024: Response after Non-Final Action
Aug 30, 2024: Non-Final Rejection (§103)
Dec 03, 2024: Applicant Interview (Telephonic)
Dec 03, 2024: Examiner Interview Summary
Dec 27, 2024: Response Filed
Jun 26, 2025: Final Rejection (§103)
Sep 30, 2025: Request for Continued Examination
Oct 08, 2025: Response after Non-Final Action
Jan 10, 2026: Non-Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585933: TRANSFER LEARNING WITH AUGMENTED NEURAL NETWORKS (granted Mar 24, 2026; 2y 5m to grant)
Patent 12572776: Method, System, and Computer Program Product for Universal Depth Graph Neural Networks (granted Mar 10, 2026; 2y 5m to grant)
Patent 12547484: Methods and Systems for Modifying Diagnostic Flowcharts Based on Flowchart Performances (granted Feb 10, 2026; 2y 5m to grant)
Patent 12541676: NEUROMETRIC AUTHENTICATION SYSTEM (granted Feb 03, 2026; 2y 5m to grant)
Patent 12505470: SYSTEMS, METHODS, AND STORAGE MEDIA FOR TRAINING A MACHINE LEARNING MODEL (granted Dec 23, 2025; 2y 5m to grant)
Based on this examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 57%
With Interview: 88% (+31.4%)
Median Time to Grant: 4y 7m
PTA Risk: High

Based on 158 resolved cases by this examiner. Grant probability derived from career allow rate.
