Prosecution Insights
Last updated: April 19, 2026
Application No. 17/629,133

DEVICE AND METHOD FOR DATA-BASED REINFORCEMENT LEARNING

Non-Final OA: §101, §103
Filed: Jan 21, 2022
Examiner: ZECHER, CORDELIA P K
Art Unit: 2100
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Agilesoda Inc.
OA Round: 1 (Non-Final)

Grant Probability: 50% (Moderate)
OA Rounds: 1-2
Time to Grant: 3y 8m
Grant Probability with Interview: 76%

Examiner Intelligence

Career Allow Rate: 50% (grants 50% of resolved cases; 253 granted / 509 resolved; -5.3% vs TC avg)
Interview Lift: +25.8% (strong lift in resolved cases with interview)
Avg Prosecution: 3y 8m (287 applications currently pending)
Total Applications: 796 (across all art units)

Statute-Specific Performance

§101: 19.0% (-21.0% vs TC avg)
§103: 46.8% (+6.8% vs TC avg)
§102: 13.1% (-26.9% vs TC avg)
§112: 16.0% (-24.0% vs TC avg)

Based on career data from 509 resolved cases; Tech Center averages are estimates.

Office Action

Rejections: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is responsive to the Application filed on 01/21/2022. Claims 1 and 7 are independent claims.

Claim Objections

Claims 1-12 are objected to because of the following informalities: please remove the reference numerals and parentheses, such as "(400, 400, 400), (520, 520a, 520b), and (300)", in claim 1 and the following claims. Appropriate correction is required.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. - An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and

(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. That presumption is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. That presumption is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: in claims 1 and 7, "an agent configured to distinguish case 1," "configured to determine an action," and "reward control unit configured to calculate a difference value"; in claims 2 and 8, "reinforcement learning metric configured as a rate of return"; in claims 3 and 9, "reinforcement learning metric configured as a limit exhaustion rate"; in claims 4 and 10, "reinforcement learning metric is configured as a loss rate"; in claims 5 and 11, "individual reinforcement learning metric is configured with a predetermined weight value"; and in claims 6 and 12, "reinforcement learning metric is configured to determine a final reward."

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification, Paragraph 0048: "As shown in FIG. 2, a data-based reinforcement learning device according to an embodiment of the disclosure includes an agent 100 and a reward control unit 300, and is configured to allow the agent 100 to learn a reinforcement learning model to maximize a reward for an action selectable according to a current state in a random environment 200, and to allow the reward control unit 300 to provide a difference between a total variation rate and an individual variation rate for each action as a reward for the agent 100," as performing the claimed function, and equivalents thereof.

If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being so interpreted (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid interpretation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1-6 are drawn to a device and claims 7-12 are drawn to a method; each of these claim groups therefore falls within one of the four categories of statutory subject matter (machines/apparatus, processes/methods, manufactures, and compositions of matter; Step 1). Nonetheless, the claims are directed to a judicially recognized exception, an abstract idea, without significantly more.
Independent claims 1 and 7 are not verbatim identical but are similar in claim construction, and hence share the same rationale for being directed to non-statutory subject matter, as follows:

Regarding Claim 1:

Subject Matter Eligibility Analysis Step 1: Claim 1 recites "A data-based reinforcement learning device comprising:" and is a device, thus a machine, one of the four statutory categories of patentable subject matter.

Subject Matter Eligibility Analysis Step 2A Prong 1: "determine an action such that the reinforcement learning metric (520, 520a, 520b) is maximized with regard to individual piece of data corresponding to stay with regard to a current limit" (determining an action is a mental process, or judgment, that can be performed in the human mind, and thus an abstract idea; see MPEP § 2106.04(a)(2)(III)). "calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), …, and a total variation rate of a rate of return" (calculating the difference between two values is an abstract idea of a mathematical relationship: "a mathematical relationship is a relationship between variables or numbers. A mathematical relationship may be expressed in words or using mathematical symbols." See MPEP § 2106.04(a)(2)(I)(A)).

The limitations identified above, alone or in combination, recite an abstract idea, as mental processes that can be performed in the human mind or a mathematical relationship.
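The difference-value limitation that the examiner maps to a mathematical relationship reduces to a single subtraction. A minimal sketch of that relationship (the function name and sample values are illustrative, not taken from the application):

```python
def reward_difference(individual_variation_rate: float,
                      total_variation_rate: float) -> float:
    """Difference between the variation rate of one piece of data's
    metric and the total variation rate, per the claim language."""
    return individual_variation_rate - total_variation_rate

# Hypothetical values: an individual metric rose 3% while the total
# metric rose 1%, so the action earns a positive reward signal.
delta = reward_difference(0.03, 0.01)
```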
Subject Matter Eligibility Analysis Step 2A Prong 2: "and provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b)" (providing a reward can be seen as outputting information, an additional element that amounts to adding insignificant extra-solution activity to the judicial exception; see MPEP §§ 2106.04(d), 2106.05(g)). "calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), …, and a total variation rate of a rate of return" (calculating a difference value can be seen as outputting information, an additional element that amounts to adding insignificant extra-solution activity to the judicial exception; see MPEP §§ 2106.04(d), 2106.05(g)).

The additional elements identified above, alone or in combination, do not integrate the abstract idea into a practical application, as they are mere insignificant extra-solution activity combined with generic computer functions implemented on generic computer elements performing the abstract idea identified above.

Subject Matter Eligibility Analysis Step 2B: "and provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b)" (providing a reward can be seen as outputting information, an additional element that amounts to adding insignificant extra-solution activity to the judicial exception; see MPEP §§ 2106.04(d), 2106.05(g). Furthermore, the additional element is directed to receiving information, which the courts have recognized as well-understood, routine, and conventional when claimed in a generic manner; see MPEP § 2106.05(d)(II)).

The additional elements identified above, alone or in combination, do not integrate the judicial exception into a practical application, as they are mere instructions to apply the exception. The limitations of updating neural network information and adjusting ratios are well-understood, routine, and conventional activity. Therefore, Claim 1 is subject matter ineligible.

Regarding Claim 2:

Subject Matter Eligibility Analysis Step 1: Claim 2 recites "The data-based reinforcement learning device of claim 1" and is a device, thus a machine, one of the four statutory categories of patentable subject matter.

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 1, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 2 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 3:

Subject Matter Eligibility Analysis Step 1: Claim 3 recites "The data-based reinforcement learning device of claim 2" and is a device, thus a machine, one of the four statutory categories of patentable subject matter.

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 1, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 3 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 4:

Subject Matter Eligibility Analysis Step 1: Claim 4 recites "The data-based reinforcement learning device of claim 3" and is a device, thus a machine, one of the four statutory categories of patentable subject matter.

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 1, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 4 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 5:

Subject Matter Eligibility Analysis Step 1: Claim 5 recites "The data-based reinforcement learning device of claim 4" and is a device, thus a machine, one of the four statutory categories of patentable subject matter.

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 1, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 5 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 6:

Subject Matter Eligibility Analysis Step 1: Claim 6 recites "The data-based reinforcement learning device of claim 5" and is a device, thus a machine, one of the four statutory categories of patentable subject matter.

Subject Matter Eligibility Analysis Step 2A Prong 1: "wherein the reinforcement learning metric (520, 520a, 520b) is configured to determine a final reward by the calculation of the configured weight value of the individual reinforcement learning metric with a standardized variation value" (determining a final reward is a mental process, or judgment, that can be performed in the human mind, and thus an abstract idea; see MPEP § 2106.04(a)(2)(III)). "wherein the final reward is determined based on the following formula (weight 1 * variation value of standardized rate of return) + (weight 2 * variation value of standardized limit exhaustion rate) - (weight 3 * variation value of standardized loss rate)." (determining a final reward is a mental process, or judgment, that can be performed in the human mind, and thus an abstract idea; see MPEP § 2106.04(a)(2)(III)).

The limitations identified above, alone or in combination, recite an abstract idea, as mental processes that can be performed in the human mind or a mathematical relationship.
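The formula recited in claims 6 and 12 is a weighted sum of three standardized variation values. A minimal sketch, with hypothetical weights and inputs (none of the numbers below come from the application):

```python
def final_reward(w1: float, w2: float, w3: float,
                 d_return: float, d_exhaustion: float, d_loss: float) -> float:
    """Final reward per the recited formula:
    (weight 1 * variation value of standardized rate of return)
    + (weight 2 * variation value of standardized limit exhaustion rate)
    - (weight 3 * variation value of standardized loss rate)."""
    return w1 * d_return + w2 * d_exhaustion - w3 * d_loss

# Hypothetical weights and standardized variation values:
# 0.5*0.8 + 0.3*0.6 - 0.2*0.4 = 0.40 + 0.18 - 0.08 = 0.50
r = final_reward(0.5, 0.3, 0.2, d_return=0.8, d_exhaustion=0.6, d_loss=0.4)
```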
Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The additional elements identified above, alone or in combination, do not integrate the abstract idea into a practical application, nor does the claim recite additional elements that amount to significantly more than the judicial exception; the claim is therefore ineligible.

Regarding Claim 7:

Subject Matter Eligibility Analysis Step 2A Prong 1: "determine an action such that the reinforcement learning metric (520, 520a, 520b) is maximized with regard to individual piece of data corresponding to stay with regard to a current limit" (determining an action is a mental process, or judgment, that can be performed in the human mind, and thus an abstract idea; see MPEP § 2106.04(a)(2)(III)). "calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), …, and a total variation rate of a rate of return" (calculating the difference between two values is an abstract idea of a mathematical relationship: "a mathematical relationship is a relationship between variables or numbers. A mathematical relationship may be expressed in words or using mathematical symbols." See MPEP § 2106.04(a)(2)(I)(A)).

The limitations identified above, alone or in combination, recite an abstract idea, as mental processes that can be performed in the human mind or a mathematical relationship.

Subject Matter Eligibility Analysis Step 2A Prong 2: "to provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b)" (providing a reward can be seen as outputting information, an additional element that amounts to adding insignificant extra-solution activity to the judicial exception; see MPEP §§ 2106.04(d), 2106.05(g)). "calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), …, and a total variation rate of a rate of return" (calculating a difference value can be seen as outputting information, an additional element that amounts to adding insignificant extra-solution activity to the judicial exception; see MPEP §§ 2106.04(d), 2106.05(g)).

The additional elements identified above, alone or in combination, do not integrate the abstract idea into a practical application, as they are mere insignificant extra-solution activity combined with generic computer functions implemented on generic computer elements performing the abstract idea identified above.

Subject Matter Eligibility Analysis Step 2B: "and provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b)" (providing a reward can be seen as outputting information, an additional element that amounts to adding insignificant extra-solution activity to the judicial exception; see MPEP §§ 2106.04(d), 2106.05(g). Furthermore, the additional element is directed to receiving information, which the courts have recognized as well-understood, routine, and conventional when claimed in a generic manner; see MPEP § 2106.05(d)(II)).

The additional elements identified above, alone or in combination, do not integrate the judicial exception into a practical application, as they are mere instructions to apply the exception. The limitations of updating neural network information and adjusting ratios are well-understood, routine, and conventional activity. Therefore, Claim 7 is subject matter ineligible.
Regarding Claim 8:

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 7, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 8 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 9:

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 7, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 9 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 10:

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 7, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 10 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 11:

Subject Matter Eligibility Analysis Step 2A Prong 1: The claim does not itself recite any additional limitation that can be identified as an abstract idea.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The claim incorporates the rejection of independent claim 7, and all elements are part of the abstract idea as shown above. The additional limitation recited in dependent claim 11 does not integrate the judicial exception into a practical application, nor does it add any element beyond what is well-understood, routine, and conventional; the claim is therefore ineligible per the analysis of the parent claim.

Regarding Claim 12:

Subject Matter Eligibility Analysis Step 2A Prong 1: "wherein the reinforcement learning metric (520, 520a, 520b) is configured to determine a final reward by the calculation of the configured weight value of the individual reinforcement learning metric with a standardized variation value" (determining a final reward is a mental process, or judgment, that can be performed in the human mind, and thus an abstract idea; see MPEP § 2106.04(a)(2)(III)). "the final reward is determined based on the following formula (weight 1 * variation value of standardized rate of return) + (weight 2 * variation value of standardized limit exhaustion rate) - (weight 3 * variation value of standardized loss rate)." (determining a final reward is a mental process, or judgment, that can be performed in the human mind, and thus an abstract idea; see MPEP § 2106.04(a)(2)(III)).

The limitations identified above, alone or in combination, recite an abstract idea, as mental processes that can be performed in the human mind or a mathematical relationship.

Subject Matter Eligibility Analysis Step 2A Prong 2 and Step 2B: The additional elements identified above, alone or in combination, do not integrate the abstract idea into a practical application, nor does the claim recite additional elements that amount to significantly more than the judicial exception; the claim is therefore ineligible.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 7, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Buscema (US 20060230006 A1) in view of Lee (US 20130144937 A1).

Regarding claim 1, Buscema teaches A data-based reinforcement learning device comprising: an agent (100) configured to distinguish case 1 (400, 400, 400) in which a reinforcement learning metric (520, 520a, 520b) is higher than an overall average, case 2 (400a, 400a, 400a) in which the reinforcement learning metric (520, 520a, 520b) has no variation compared with the overall average, and case 3 (400b, 400b, 400b) in which the reinforcement learning metric (520, 520a, 520b) is lower than the overall average (Paragraph 0104 of Buscema: "The health of the "father" and "mother" individuals are greater than the average health of the entire population.
In this case, the crossover is a classical crossover as shown in FIG. 5." Paragraph 0105: "The health of the "father" and "mother" individuals are lower than the average health of the entire population. In this case the offspring are formed through rejection of the parents genes that they would receive by the crossover process." Paragraph 0106: "3. The health of one of the parents is less than the average health of the entire population while the health of the other parent is greater than the average health of the entire population."

Buscema thus teaches three cases in which the individuals' health serves as the learning metric. The first case is when both individuals have greater than average health compared to the population. The second case is when one individual is above average health and one individual is below average health, thus no change compared to the overall average of the population. The third case is when both individuals are below average health compared to the population.)

Buscema further teaches and configured to determine an action such that the reinforcement learning metric (520, 520a, 520b) is maximized with regard to individual piece of data corresponding to stay with regard to a current limit, up by a predetermined value compared with the current limit, and down by a predetermined value compared with the current limit, in each case (Paragraph 0043: "As is usual for a genetic algorithm, as a first step, GenD calculates the fitness score of each individual of a population, depending on the function that requires optimization. … for example, an average health score of the entire population can be computed." Paragraph 0067: "Another criterion of selection can be the so called R2 index, i.e. the linear correlation index of Pearson considering only the data of the dataset which variables has a R2 index value greater than a predetermined threshold value." Paragraph 0098: "An average health of the population is defined as a function, taking into account the fitness scores of all the prediction algorithms forming the individuals of the parent population."

Buscema teaches that a genetic algorithm is used to calculate and optimize the fitness score of individuals and the average health score of the population. The individuals' overall health is being maximized and compared to the average health of the population. When evaluating fitness scores produced from the prediction algorithms, some values may be up from a predetermined threshold value, or down from a predetermined threshold value.)

However, Buscema does not teach and a reward control unit (300) configured to calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), calculated for the action of individual piece of data determined by the agent (100), and a total variation rate of the reinforcement learning metric (520, 520a, 520b), and provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b), wherein the calculated difference value is converted into a standardized value between "0" and "1" and provided as a reward.
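The wherein clause requires converting the difference value into a standardized value between "0" and "1," but does not specify the conversion. The min-max scaling below is therefore only one plausible reading, sketched under the assumption that the difference is bounded:

```python
def standardize(difference: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Map a difference value into [0, 1] by min-max scaling, assuming
    the difference is bounded in [lo, hi] (an assumption made here;
    the claim does not name the conversion)."""
    clipped = max(lo, min(hi, difference))
    return (clipped - lo) / (hi - lo)

# A zero difference maps to the midpoint of the standardized range.
mid = standardize(0.0)   # 0.5
```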
Lee though, teaches and a reward control unit (300) configured to calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), calculated for the action of individual piece of data determined by the agent (100), and a total variation rate of the reinforcement learning metric (520, 520a, 520b), and provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b), wherein the calculated difference value is converted into a standardized value between "0" and "1" and provided as a reward (Paragraph 0063 of Lee, "For example, if a user who used to be in a neutral emotion has often experience the feeling of irritation or anger recently, the emotion analysis unit 202 calculates the total emotion rate as 50%:50% (positive versus negative) and the recent emotion rate as 30%:70% (positive versus negative)." Paragraph 0064, "After the total emotion rate and the recent emotion rate are calculated, the change-in-emotion rate calculator 203 (see FIG. 2) calculates a change in emotion rate based on a difference between the recent emotion rate and the total emotion rate. For example, the change-in-emotion rate calculator 203 calculates -20 (from 30%-50%) as a change in emotion rate regarding the positive emotional state and +20 (from 70%-50%) as a change in emotion rate regarding the negative emotional state. The changes in emotion rates may indicate that the corresponding user's emotional state has changed from a positive or neutral mood to a negative mood." Lee teaches that the emotional analysis unit, which is comparable to the reward control unit, will calculate the difference between an individual emotion rate, or recent emotion rate, and the total emotion rate. 
The present application defines the "reward" as the difference between the total rate and the individual rate. Thus, Lee teaches that the difference between the recent emotion rate and the total emotion rate is the reward. Once the difference is calculated, the change in emotion is calculated and can be converted into a percentage. For example, -20 can represent a -20% change, or -0.2, which is a standardized value between 0 and 1).

Buscema and Lee are analogous to the claimed invention because they are both in the same field of determining values of individuals' health and emotional state to generate scores. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Buscema to incorporate the teachings of Lee. This is because Lee teaches a more accurate way to compare the individual's state to the overall state: Lee teaches a method of calculating the difference in rate of change between the individual and the total, while Buscema only teaches that an individual is above or below the overall average.

Regarding claim 2, the combination of Buscema and Lee teaches the subject matter of claim 1. Lee further teaches The data-based reinforcement learning device of claim 1, wherein the reinforcement learning metric (520) is configured as a rate of return (Paragraph 0064, "After the total emotion rate and the recent emotion rate are calculated, the change-in-emotion rate calculator 203 (see FIG. 2) calculates a change in emotion rate based on a difference between the recent emotion rate and the total emotion rate. For example, the change-in-emotion rate calculator 203 calculates -20 (from 30%-50%) as a change in emotion rate regarding the positive emotional state and +20 (from 70%-50%) as a change in emotion rate regarding the negative emotional state.
The changes in emotion rates may indicate that the corresponding user's emotional state has changed from a positive or neutral mood to a negative mood." Lee teaches that the learning metric has a rate of return, which can be calculated by subtracting the total rate from the individual rate and then multiplying by 0.01. This gives a percentage, or a rate of return.)

Regarding claim 7, Buscema teaches A data-based reinforcement learning method comprising: a) allowing an agent (100) to distinguish case 1 (400, 400, 400) in which a reinforcement learning metric (520, 520a, 520b) is higher than an overall average, case 2 (400a, 400a, 400a) in which the reinforcement learning metric (520, 520a, 520b) has no variation compared with the overall average, and case 3 (400b, 400b, 400b) in which the reinforcement learning metric (520, 520a, 520b) is lower than the overall average, (Paragraph 0104, "The health of "father" and "mother" individuals are greater than the average health of the entire population. In this case, the crossover is a classical crossover as shown in FIG. 5." Paragraph 0105, "The health of the "father" and "mother" individuals are lower than the average health of the entire population. In this case the offspring are formed through rejection of the parents genes that they would receive by the crossover process." Paragraph 0106, "3. The health of one of the parents is less than the average health of the entire population while the health of the other parent is greater than the average health of the entire population." Buscema teaches that there are three cases in which the individuals' health is the learning metric. The first case is when the individuals have above-average health compared to the population. The second case is when one individual has above-average health and the other has below-average health, thus no change compared to the population. The third case is when both individuals have below-average health compared to the population.)
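The three-case distinction recited in step a) amounts to comparing a metric against the overall average. A minimal sketch, with a hypothetical function name and a numeric tolerance that the claim itself does not recite:

```python
def classify_case(metric: float, overall_average: float, tolerance: float = 1e-9) -> int:
    """Classify a reinforcement learning metric against the overall average.

    Returns 1 (higher than the overall average), 2 (no variation compared
    with the overall average), or 3 (lower than the overall average)."""
    if metric > overall_average + tolerance:
        return 1  # case 1: metric is higher than the overall average
    if metric < overall_average - tolerance:
        return 3  # case 3: metric is lower than the overall average
    return 2      # case 2: no variation compared with the overall average
```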
and to determine an action such that the reinforcement learning metric (520, 520a, 520b) is maximized with regard to individual piece of data corresponding to stay with regard to a current limit, up by a predetermined value compared with the current limit, and down by a predetermined value compared with the current limit, in each case; (Paragraph 0067, "Another criterion of selection can be the so called R2 index, i.e. the linear correlation index of Pearson considering only the data of the dataset which variables has a R2 index value greater than a predetermined threshold value." Paragraph 0098, "An average health of the population is defined as a function, taking into account the fitness scores of all the prediction algorithms forming the individuals of the parent population." Buscema teaches that the individuals' overall health is maximized and compared to the average health of the population. When evaluating fitness scores produced by the prediction algorithms, some values may be above a predetermined threshold value and others below it).

However, Buscema does not teach b) allowing a reward control unit (300) to calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), calculated for the action of the individual piece of data determined by the agent (100), and a total variation rate of a rate of return; and c) allowing the reward control unit (300) to provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b), wherein the calculated difference value is converted into a standardized value between "0" and "1" and provided as a reward.
Lee, though, teaches b) allowing a reward control unit (300) to calculate a difference value between an individual variation rate of the reinforcement learning metric (520, 520a, 520b), calculated for the action of the individual piece of data determined by the agent (100), and a total variation rate of a rate of return; and c) allowing the reward control unit (300) to provide, as a reward for each action of the agent (100), the calculated difference value between the individual variation rate of the reinforcement learning metric (520, 520a, 520b) and the total variation rate of the reinforcement learning metric (520, 520a, 520b), wherein the calculated difference value is converted into a standardized value between "0" and "1" and provided as a reward (Paragraph 0063 of Lee, "For example, if a user who used to be in a neutral emotion has often experience the feeling of irritation or anger recently, the emotion analysis unit 202 calculates the total emotion rate as 50%:50% (positive versus negative) and the recent emotion rate as 30%:70% (positive versus negative)." Paragraph 0064, "After the total emotion rate and the recent emotion rate are calculated, the change-in-emotion rate calculator 203 (see FIG. 2) calculates a change in emotion rate based on a difference between the recent emotion rate and the total emotion rate. For example, the change-in-emotion rate calculator 203 calculates -20 (from 30%-50%) as a change in emotion rate regarding the positive emotional state and +20 (from 70%-50%) as a change in emotion rate regarding the negative emotional state. The changes in emotion rates may indicate that the corresponding user's emotional state has changed from a positive or neutral mood to a negative mood." Lee teaches that the emotion analysis unit, which is comparable to the reward control unit, will calculate the difference between an individual emotion rate, or recent emotion rate, and the total emotion rate.
The present application defines the "reward" as the difference between the total rate and the individual rate. Thus, Lee teaches that the difference between the recent emotion rate and the total emotion rate is the reward. Once the difference is calculated, the change in emotion is calculated and can be converted into a percentage. For example, -20 can represent a -20% change, or -0.2, which is a standardized value between 0 and 1).

Regarding claim 8, Lee further teaches the data-based reinforcement learning method of claim 7, wherein the reinforcement learning metric (520) is configured as a rate of return (Paragraph 0064, "After the total emotion rate and the recent emotion rate are calculated, the change-in-emotion rate calculator 203 (see FIG. 2) calculates a change in emotion rate based on a difference between the recent emotion rate and the total emotion rate. For example, the change-in-emotion rate calculator 203 calculates -20 (from 30%-50%) as a change in emotion rate regarding the positive emotional state and +20 (from 70%-50%) as a change in emotion rate regarding the negative emotional state. The changes in emotion rates may indicate that the corresponding user's emotional state has changed from a positive or neutral mood to a negative mood." Lee teaches that the learning metric has a rate of return, which can be calculated by subtracting the total rate from the individual rate and then multiplying by 0.01. This gives a percentage, or a rate of return.)

Claim(s) 3 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Buscema (US 20060230006 A1) in view of Lee (US 20130144937 A1) and Goldberg (US 10102056 B1).

Regarding claim 3, the combination of Buscema and Lee teaches the subject matter of claim 2. However, the combination does not teach the data-based reinforcement learning device of claim 2, wherein the reinforcement learning metric (520a) is configured as a limit exhaustion rate.
Goldberg, though, teaches the data-based reinforcement learning device of claim 2, wherein the reinforcement learning metric (520a) is configured as a limit exhaustion rate (Col. 11, lines 37-46, "At 412, the baseline data module 206 may analyze the first baseline data to determine baseline metrics using the custom parameters. For example, the baseline data module may generate baseline values, bounds, thresholds, and/or rates of change. In some embodiments, the baseline data module 206 may determine peak values for intervals of data in the first baseline data, such as a peak rate of change, which may be used as an upper limit, a lower limit, and so forth. In other embodiments, an average value may be determined, which may be modified by a safety factor (e.g., a multiplier)." A limit exhaustion rate can be defined as the speed at which something reaches its upper limit, where performance stagnates. Goldberg teaches that the application will analyze data and determine their parameters: the application can determine threshold values, rates of change, and upper and lower bounds. If the data have a high peak rate of change, that peak can be used as the upper bound; the rate of change at which the upper bound is reached is thus considered the limit exhaustion rate.)

Buscema, Lee, and Goldberg are analogous to the claimed invention because they are all in the same field of determining values of a system or individual to generate metrics. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Buscema and Lee to incorporate the teachings of Goldberg. This is because Goldberg teaches a more accurate way to track a system's changes: Goldberg incorporates tracking the upper and lower bounds along with the rate of change. This combination makes it possible to track the limit exhaustion rate, another metric that leads to a more accurate comparison of metrics.
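Under the working definition used above (the speed at which a value reaches its upper limit), a limit exhaustion rate might be sketched as follows. The function names and the constant-rate assumption are illustrative and do not come from Goldberg:

```python
def exhaustion_fraction(used: float, limit: float) -> float:
    """Fraction of the limit already exhausted, clamped to [0, 1]."""
    return min(max(used / limit, 0.0), 1.0)

def time_to_exhaustion(used: float, limit: float, rate_of_change: float) -> float:
    """Time until the upper limit is reached, assuming a constant rate of change.

    Returns infinity when the value is not approaching the limit."""
    if rate_of_change <= 0.0:
        return float("inf")
    return max(limit - used, 0.0) / rate_of_change
```

For example, a value at 50 with an upper limit of 100 and a rate of change of 10 per period has exhausted half the limit and would reach the upper bound in 5 periods.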
Regarding claim 9, the combination of Buscema and Lee teaches the subject matter of claim 8. However, the combination does not teach the data-based reinforcement learning method of claim 8, wherein the reinforcement learning metric (520a) is configured as a limit exhaustion rate.

Goldberg, though, teaches the data-based reinforcement learning method of claim 8, wherein the reinforcement learning metric (520a) is configured as a limit exhaustion rate (Col. 11, lines 37-46, "At 412, the baseline data module 206 may analyze the first baseline data to determine baseline metrics using the custom parameters. For example, the baseline data module may generate baseline values, bounds, thresholds, and/or rates of change. In some embodiments, the baseline data module 206 may determine peak values for intervals of data in the first baseline data, such as a peak rate of change, which may be used as an upper limit, a lower limit, and so forth. In other embodiments, an average value may be determined, which may be modified by a safety factor (e.g., a multiplier)." A limit exhaustion rate can be defined as the speed at which something reaches its upper limit, where performance stagnates. Goldberg teaches that the application will analyze data and determine their parameters: the application can determine threshold values, rates of change, and upper and lower bounds. If the data have a high peak rate of change, that peak can be used as the upper bound; the rate of change at which the upper bound is reached is thus considered the limit exhaustion rate.)

Claim(s) 4, 5, 10, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Buscema (US 20060230006 A1) in view of Lee (US 20130144937 A1), Goldberg (US 10102056 B1), and Cormier (US 20180082208 A1).

Regarding claim 4, the combination of Buscema, Lee, and Goldberg teaches the subject matter of claim 3.
However, the combination does not teach the data-based reinforcement learning device of claim 3, wherein the reinforcement learning metric (520b) is configured as a loss rate.

Cormier, though, teaches the data-based reinforcement learning device of claim 3, wherein the reinforcement learning metric (520b) is configured as a loss rate (Paragraph 0288, "Volatility, as the name implies, is a time-varying property of the data that reflects the rapidity of change in the fundamental statistical properties of the data. In terms of trustworthiness, the System measures both the rate of change (the first derivative over time) and the rate of the rate of change (the second derivative over time). The higher the volatility of the data the higher the potential noise (as measured by a standard error of estimate from one time period to the next)." Cormier teaches that, along with the rate of change, the standard error of estimate is also tracked. The higher the volatility, the higher the standard error of estimate, and thus the more errors. The loss rate is equivalent to the standard error of estimate because, as the number of errors increases, both the loss rate and the standard error increase as well.)

Buscema, Lee, Goldberg, and Cormier are analogous to the claimed invention because they are all in the same field of determining values of a system or individual to generate metrics. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Buscema, Lee, and Goldberg to incorporate the teachings of Cormier. This is because Cormier teaches a more accurate way to track a system's changes: Cormier incorporates tracking the rate of change and the volatility of the data. This volatility can be measured by the standard error of estimate, which tracks how common errors are and is equivalent to the loss rate.
This additional metric allows a more accurate way to compare data.

Regarding claim 5, the combination of Buscema, Lee, Goldberg, and Cormier teaches the subject matter of claim 4. Lee further teaches the data-based reinforcement learning device of claim 4, wherein the reinforcement learning metric (520, 520a, 520b) is obtained such that the individual reinforcement learning metric is configured with a predetermined weight value or different weight values (Paragraph 0066, "According to another aspect, the change-in-emotion rate calculator 203 may allocate weights to the changes in emotion rate in consideration of the corresponding user's tendencies. For example, if the user has an outgoing or optimistic personality, the change-in-emotion rate calculator 203 may allocate a weight greater than 1 to the change in emotion rate regarding the positive emotional state and a weight smaller than 1 to the change in emotion rate regarding the negative emotional state. On the contrary, if the user has an introspective or pessimistic personality, the change-in-emotion rate calculator 203 may allocate a weight smaller than 1 to the change in emotion rate regarding the positive emotional state and a weight greater than 1 to the change in emotion rate regarding the negative emotional state. The change-in-emotion rate calculator 203 may then multiply the change in emotion rate regarding the positive emotional state and the change in emotion rate regarding the negative emotional state by their respective weights." Lee teaches that the learning metric can be configured with different weights that correspond to the individual's tendencies. These weights consider whether the individual is optimistic or pessimistic.)

Regarding claim 10, the combination of Buscema, Lee, and Goldberg teaches the subject matter of claim 9.
However, the combination does not teach the data-based reinforcement learning method of claim 9, wherein the reinforcement learning metric (520b) is configured as a loss rate.

Cormier, though, teaches the data-based reinforcement learning method of claim 9, wherein the reinforcement learning metric (520b) is configured as a loss rate (Paragraph 0288, "Volatility, as the name implies, is a time-varying property of the data that reflects the rapidity of change in the fundamental statistical properties of the data. In terms of trustworthiness, the System measures both the rate of change (the first derivative over time) and the rate of the rate of change (the second derivative over time). The higher the volatility of the data the higher the potential noise (as measured by a standard error of estimate from one time period to the next)." Cormier teaches that, along with the rate of change, the standard error of estimate is also tracked. The higher the volatility, the higher the standard error of estimate, and thus the more errors. The loss rate is equivalent to the standard error of estimate because, as the number of errors increases, both the loss rate and the standard error increase as well.)

Regarding claim 11, the combination of Buscema, Lee, Goldberg, and Cormier teaches the subject matter of claim 10. Lee

Prosecution Timeline

Jan 21, 2022
Application Filed
Sep 16, 2025
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12583466
VEHICLE CONTROL MODULES INCLUDING CONTAINERIZED ORCHESTRATION AND RESOURCE MANAGEMENT FOR MIXED CRITICALITY SYSTEMS
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12578751
DATA PROCESSING CIRCUITRY AND METHOD, AND SEMICONDUCTOR MEMORY
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561162
AUTOMATED INFORMATION TECHNOLOGY INFRASTRUCTURE MANAGEMENT
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12536291
PLATFORM BOOT PATH FAULT DETECTION ISOLATION AND REMEDIATION PROTOCOL
Granted Jan 27, 2026 (2y 5m to grant)
Patent 12393641
METHODS FOR UTILIZING SOLVER HARDWARE FOR SOLVING PARTIAL DIFFERENTIAL EQUATIONS
Granted Aug 19, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 50%
With Interview: 76% (+25.8%)
Median Time to Grant: 3y 8m
PTA Risk: Low
Based on 509 resolved cases by this examiner. Grant probability derived from career allow rate.
