Prosecution Insights
Last updated: April 19, 2026
Application No. 17/535,474

METHOD OF DETERMINING CONTINUOUS DRUG DOSE USING REINFORCEMENT LEARNING AND PHARMACOKINETIC-PHARMACODYNAMIC MODELS

Non-Final OA (§102, §112, §Other)
Filed
Nov 24, 2021
Examiner
ZEMAN, MARY K
Art Unit
1686
Tech Center
1600 — Biotechnology & Organic Chemistry
Assignee
Postech Research And Business Development Foundation
OA Round
3 (Non-Final)
- Grant Probability: 59% (Moderate)
- Expected OA Rounds: 3-4
- Time to Grant: 4y 1m
- With Interview: 93%

Examiner Intelligence

- Career Allow Rate: 59% of resolved cases granted (315 granted / 532 resolved; -0.8% vs TC avg)
- Interview Lift: strong, +33.9% on resolved cases with an interview
- Typical Timeline: 4y 1m average prosecution; 28 applications currently pending
- Career History: 560 total applications across all art units
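The headline figures above can be cross-checked with simple arithmetic. This is a sketch: the variable names are ours, and the assumption that the interview lift is a plain difference of grant rates is ours, not the dashboard vendor's.

```python
# Counts shown on this page for this examiner.
granted, resolved = 315, 532

career_allow_rate = granted / resolved
print(f"{career_allow_rate:.1%}")   # 59.2%, displayed above as "59%"

with_interview = 0.93               # grant rate shown for interviewed cases
lift = with_interview - career_allow_rate
print(f"{lift:+.1%}")               # +33.8%; the page shows +33.9%, so the
                                    # dashboard likely computes the lift from
                                    # an unrounded interviewed-case rate
```

The small mismatch between +33.8% here and the displayed +33.9% is consistent with the dashboard rounding only at display time.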

Statute-Specific Performance

- §101: 33.7% (-6.3% vs TC avg)
- §103: 12.4% (-27.6% vs TC avg)
- §102: 18.8% (-21.2% vs TC avg)
- §112: 23.4% (-16.6% vs TC avg)
Tech Center averages are estimates. Based on career data from 532 resolved cases.
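One sanity check on the per-statute figures: subtracting each displayed "vs TC avg" delta from the examiner's rate recovers the same implied Tech Center baseline for every statute. The sketch below just restates the numbers shown above.

```python
# Examiner's per-statute figure and the displayed "vs TC avg" delta,
# both in percent, copied from the table above.
stats = {"§101": (33.7, -6.3), "§103": (12.4, -27.6),
         "§102": (18.8, -21.2), "§112": (23.4, -16.6)}

for statute, (rate, delta) in stats.items():
    implied_tc_avg = round(rate - delta, 1)
    print(statute, implied_tc_avg)   # 40.0 for every statute
```

All four deltas are measured against a single ~40% Tech Center average estimate, which matches the note that the averages are estimates rather than per-statute measurements.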

Office Action

Grounds: §102, §112, §Other
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/31/2025 has been entered.

Claims 1-20 are under examination. The substitute drawings filed 12/31/2025 have been entered. The amendments to the specification filed 12/31/2025 have been entered. The objections to the claims have been withdrawn.

The rejection of claims 1-20 under 35 USC 101 has been withdrawn in view of amendments made to the claims, which provide a practical application of a judicial exception under step 2A-2: the continuous dosage of insulin is calculated and administered to the diabetic patient to achieve a specific patient state. The rejection of claims 1-20 under 35 USC 112(a) or (b) has been withdrawn in view of Applicant's amendments and arguments. The rejection of claims 1-20 under 35 USC 102(a)(1) has been withdrawn in view of Applicant's amendments and arguments; however, new grounds of rejection are set forth below.
Claim Objections

Claims 1, 7 and 14 are objected to because of the following informalities: in the limitation that begins "while infusing insulin to the diabetic patient by an insulin infusion pump, in real time, automatically determining a continuous insulin infusion rate by the reinforcement learning algorithm, including:" it appears this limitation should read "while infusing insulin to the diabetic patient by an insulin infusion pump, in real time, automatically determining a continuous insulin infusion rate by the trained reinforcement learning algorithm, including:" to maintain antecedent basis. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.
The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. This is a NEW MATTER rejection.

Claims 1, 7 and 14 have been amended to recite "in real time" with respect to controlling the infusion to the patient. The specification does not clearly support this limitation, and Applicant has not pointed out a specific basis for this term in the original disclosure. It is suggested the limitation be amended to recite: "while infusing insulin to the diabetic patient by an insulin infusion pump, trained reinforcement learning algorithm, including:"

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lee et al. (2020).

Applicant's effective filing date is that of the instant application, 11/24/2021. Applicant cannot rely upon the certified copy of the foreign priority application to overcome this rejection because a translation of said application has not been made of record in accordance with 37 CFR 1.55. When an English language translation of a non-English language foreign application is required, the translation must be that of the certified copy (of the foreign application as filed), submitted together with a statement that the translation of the certified copy is accurate. See MPEP §§ 215 and 216.

Lee et al. Toward a Fully Automated Artificial Pancreas System Using a Bioinspired Reinforcement Learning Design: In Silico Validation. IEEE J. of Biomed. and Health Informatics, vol. 25, no. 2, pp. 536-546. Earliest date of publication: June 12, 2020. (Previously cited on PTO-892.) This NPL has differing authorship from the inventorship of this application.

Lee is directed to utilizing RL to direct infusion doses of insulin for a diabetic patient: "A bioinspired RL designing method was developed for automated insulin infusion. This method has reward functions that imply the temporal homeostatic objective and discount factors that reflect an individual specific pharmacological characteristic.
The proposed method was applied to a training method using an RL algorithm and was evaluated in virtual patients from the FDA approved UVA/Padova simulator with unannounced meal intakes."

With respect to the independent claims 1, 7 and 14 (method, system and computer program product): Lee provides computer-implemented methods of calculating and administering insulin dosages by infusion, as well as the system and computer program product, throughout, and particularly at page 537, "Environment: Virtual T1D patients":

"We developed the AP algorithm in OpenAI Gym [19], which is a Python toolkit for RL research, and allowed it to interact with the 10 adult and 10 adolescent in silico T1D patients from the 2013 version of the FDA-approved UVA/Padova simulator [20] as an MDP environment. For real-time interaction, dynamic equations and parameters of the UVA/Padova simulator were imported into the OpenAI Gym framework with the modification of Simglucose [21], which is the Python implementation of the 2008 version of the UVA/Padova simulator [22]. This Python-based simulating environment is subsequently referred to as 'virtual patients.'"

With respect to claim 1 and "A method for administrating a continuous drug dose determined using reinforcement learning and a pharmacokinetic-pharmacodynamic model, comprising: determining a diabetic patient's pharmacokinetic-pharmacodynamic (PK-PD) model;": Lee provides patient PK-PD models, both from virtual diabetic patient data (p537) and using real-life data of a particular diabetic patient (p542).
With respect to claim 1 and "training a reinforcement learning algorithm for the diabetic patient using insulin infusion data and patient state data based on the pharmacokinetic-pharmacodynamic model, including: obtaining the diabetic patient's treatment record including the insulin infusion data and the patient state data associated with a plurality of infusion events, wherein each infusion event corresponds to a respective one of a plurality of predefined insulin doses, a prior patient state, and a post patient state;": Lee obtains the patient's data, including infusion data and patient state data, at pages 537 and 542.

With respect to claim 1 and "obtaining one or more reward criteria corresponding to a plurality of patient states including an underinfusion state, an overinfusion state, and a normal state;": Lee determines reward criteria corresponding to patient states as shown in Fig. 1 and at page 537, "MDP: glucose regulation problem":

"In this system, the set of states S are the glycemic conditions of a patient, the set of actions A are possible insulin doses, and the set of state transition probabilities T = {P_sa} depend on an individual's metabolic responsiveness. The reward function R : S × A → R represents the degree of BG regulation as a component that reflects the intent of optimization, and γ ∈ [0,1] is the discount rate and implies the temporal attention of optimization. For example, Fig. 1 depicts a simplified toy model of MDP for a glycemic regulation problem. The model has only three glycemic states (normal S_normal, hypoglycemia S_hypo and hyperglycemia S_hyper), and two simple infusing actions (0 unit and 0.1 unit of insulin).
Even with the same glycemic state and insulin dose, the transitions to another glycemic state and the corresponding rewards are stochastically determined by state transition probabilities owing to the high uncertainty of the glucose metabolic system."

Lee discloses additional information about the rewards at p538, "Reward functions inspired by the natural β cell objective", and Fig. 2.

With respect to claim 1 and "based on the diabetic patient's treatment record, determining a plurality of transition probabilities for a plurality of state transitions among the plurality of patient states, each transition probability indicating a probability of a respective state change when a respective insulin dose is applied;": Lee calculates transition probabilities as illustrated in Fig. 1 and p537 as quoted immediately above, as well as p545, "PPO".

With respect to claim 1 and "determining a respective reward score for each of the plurality of state transitions based on the one or more reward criteria;": Lee determines reward scores for the transitions as set forth in Fig. 1; p537 as quoted above; p538, section "Reward functions inspired by the natural β cell objective"; Fig. 2; and the discussion at p543-544.

With respect to claim 1 and "applying a discount rate to the respective reward score associated with each of the plurality of state transitions based on the diabetic patient's PK-PD model; and": Lee determines and applies discount rates as set forth at Fig. 1 and p537 as cited above; p539, section "Discount rates derived from individual pharmacokinetics and pharmacodynamics (PK/PD)"; Fig. 3; p540, sections "In silico Glucose Clamp Test" and "Treatment Policy Training"; and the Discussion section.
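The three-state, two-action toy MDP that the quoted passage attributes to Lee's Fig. 1 can be written down directly. Every probability and reward value below is invented for illustration; the figure as quoted does not publish the actual numbers.

```python
import random

STATES = ["hypo", "normal", "hyper"]      # S_hypo, S_normal, S_hyper in Fig. 1
ACTIONS = [0.0, 0.1]                      # 0 unit and 0.1 unit of insulin

# T[(state, action)] -> [(next_state, probability), ...]; made-up values.
T = {
    ("normal", 0.0): [("normal", 0.6), ("hyper", 0.4)],
    ("normal", 0.1): [("normal", 0.7), ("hypo", 0.3)],
    ("hyper",  0.0): [("hyper", 0.8), ("normal", 0.2)],
    ("hyper",  0.1): [("normal", 0.7), ("hyper", 0.3)],
    ("hypo",   0.0): [("normal", 0.6), ("hypo", 0.4)],
    ("hypo",   0.1): [("hypo", 0.9), ("normal", 0.1)],
}

# Normoglycemia rewarded, hypoglycemia penalized hardest (made-up values).
REWARD = {"normal": 1.0, "hyper": -1.0, "hypo": -2.0}

def step(state, action, rng=random):
    """Sample one MDP transition: a next state and its reward."""
    draw, cumulative = rng.random(), 0.0
    for next_state, p in T[(state, action)]:
        cumulative += p
        if draw <= cumulative:
            return next_state, REWARD[next_state]
    return next_state, REWARD[next_state]  # guard against float round-off
```

Repeated calls to `step` with the same state and dose can land in different states with different rewards, which is exactly the stochasticity the quoted passage attributes to the glucose metabolic system.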
With respect to claim 1 and "determining the reinforcement learning algorithm based on a long-term expected reward;": Lee addresses long-term expected rewards in the introduction (p537); at p538, section "Reward functions inspired by the natural β cell objective", and Fig. 2; at p539, section "Discount rates derived from individual PK/PD models", and Figs. 3 and 4; at p540, section "Treatment Policy Training"; and in the Discussion section.

With respect to claim 1 and "while infusing insulin to the diabetic patient by an insulin infusion pump, in real time, automatically determining a continuous insulin infusion rate by the reinforcement learning algorithm, including: determining a current insulin dose and a current patient state; applying the reinforcement learning algorithm to determine one or more subsequent insulin doses based on the current insulin dose and the current patient state, including: determining a first reward for a first transition to [[the]] a distinct patient state based on a first reward criterion of the one or more reward criteria of the diabetic patient; determining a time-dependent discount rate corresponding to the first transition for the first reward; and adjusting the first reward associated with the first transition based on the time-dependent discount rate; and": Lee calculates rates of infusion for insulin to a particular patient using the trained RL algorithm and individual patient data as set forth at pages 542-544, in the sections "In silico trial with real-life scenario", Table III, and "Interpreting the AI decision", and in Figures 7 and 8. Figure 8 illustrates the control of the infusion pump with the individualized rate.
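The "time-dependent discount rate" recited in the claim can be illustrated with a short sketch. The exponential form and the decay constant `k` below are our stand-ins for a PK/PD-derived washout curve; they are not values taken from the application or from Lee.

```python
import math

def discount(t_minutes, k=0.01):
    """Discount factor that decays with elapsed time, mimicking how an
    insulin dose's influence washes out (k is a hypothetical constant)."""
    return math.exp(-k * t_minutes)

def discounted_return(rewards, dt=5.0, k=0.01):
    """Long-term expected reward proxy: rewards sampled every `dt` minutes,
    each scaled by the discount factor at its observation time."""
    return sum(discount(i * dt, k) * r for i, r in enumerate(rewards))

# Identical rewards count for less the further out they occur:
print(round(discounted_return([1.0, 1.0, 1.0]), 3))   # 2.856
```

Tying `k` to an individual patient's PK/PD parameters is what would make the discount patient-specific in the sense the claims and Lee's p539 section describe.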
With respect to claim 1 and "continuously controlling the insulin infusion pump to infuse the insulin to the diabetic patient based on the trained reinforcement learning algorithm, such that a blood glucose level of the diabetic patient is maintained in a normal range to protect the diabetic patient from hypoglycemia.": Lee provides Fig. 8 for the continuous control of the insulin infusion pump, as well as the section "Implementation" at page 543. As such, claims 1, 7 and 14 are anticipated.

With respect to claims 2, 8 and 15: the RL model of Lee corresponds to PK/PD characteristics associated with the PK/PD model, throughout.

With respect to claims 3, 9 and 16: Lee addresses both short-term and cumulative drug effects in the section "Reward functions inspired by the natural β cell objective" and Fig. 2, as well as in the section "Discount rates derived from individual PK/PD models" and Fig. 3.

With respect to claims 4, 6, 10, 12, 17, 19 and 20: the discount rate is disclosed as set forth at Fig. 1 and p537 as cited above; p539, section "Discount rates derived from individual pharmacokinetics and pharmacodynamics (PK/PD)"; Fig. 3; p540, sections "In silico Glucose Clamp Test" and "Treatment Policy Training"; and the Discussion section. Integrals are also disclosed.

With respect to claims 5, 11 and 18: the selected drug dose is set forth based on patient state and calculated rewards over time, discounted by the discount rate. (Sections II and III.)

Claims 1-2, 7-8 and 14-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mougiakakou et al. (2019). Mougiakakou, S. et al. Estimation of insulin based on reinforcement learning. US 2019/0214124 A1, July 11, 2019.

Mougiakakou is directed to: "The optimal insulin to be delivered by an insulin infusion pump is determined by using a reinforcement learning algorithm aiming to the personalized glucose regulation.
The algorithm optimizes the daily basal insulin rate and insulin:carbohydrate ratio for each patient, on the basis of his/her measured glucose profile. The proposed algorithm is able to learn in real-time patient-specific characteristics captured in the daily glucose profile and provide individualised insulin treatment. An automatic and personalised tuning method contributes in the optimisation of the algorithm's performance." (Abstract.)

With respect to claim 1 and "A method for administrating a continuous drug dose determined using reinforcement learning and a pharmacokinetic-pharmacodynamic model, comprising: determining a diabetic patient's pharmacokinetic-pharmacodynamic (PK-PD) model;": Mougiakakou provides obtaining patient information related to PK/PD modeling at [0026-0027, 0033, 0048-0049, 0084, 00111-00121, et al.].

With respect to claim 1 and "training a reinforcement learning algorithm for the diabetic patient using insulin infusion data and patient state data based on the pharmacokinetic-pharmacodynamic model, including: obtaining the diabetic patient's treatment record including the insulin infusion data and the patient state data associated with a plurality of infusion events, wherein each infusion event corresponds to a respective one of a plurality of predefined insulin doses, a prior patient state, and a post patient state; obtaining one or more reward criteria corresponding to a plurality of patient states including an underinfusion state, an overinfusion state, and a normal state; based on the diabetic patient's treatment record, determining a plurality of transition probabilities for a plurality of state transitions among the plurality of patient states, each transition probability indicating a probability of a respective state change when a respective insulin dose is applied; determining a respective reward score for each of the plurality of state transitions based on the one or more reward criteria; applying a discount rate to the respective
reward score associated with each of the plurality of state transitions based on the diabetic patient's PK-PD model; and determining the reinforcement learning algorithm based on a long-term expected reward;": Mougiakakou provides a reinforcement learning (RL) algorithm, and its training with patient data related to insulin infusion data and patient state data, at [0006, 0029-0049]. Mougiakakou provides obtaining reward criteria related to patient states at [0029-0049, 0066-0079]; the states include normal BGL, hypoglycemia and hyperglycemia, as set forth at [0085 and 00137]. Mougiakakou provides determining transition probabilities for state transitions between the patient states at [0006, 0066-0067]. Mougiakakou provides a reward score at [0067] and Eq. (1). Mougiakakou provides a critic, which appears to meet the BRI of a discount rate, at [0069-0079]; Table 1 provides discount factors and long-term costs. Mougiakakou provides the trained RL based on long-term rewards as set forth at [0006 and Table 1].
With respect to claim 1 and "while infusing insulin to the diabetic patient by an insulin infusion pump, in real time, automatically determining a continuous insulin infusion rate by the reinforcement learning algorithm, including: determining a current insulin dose and a current patient state; applying the reinforcement learning algorithm to determine one or more subsequent insulin doses based on the current insulin dose and the current patient state, including: determining a first reward for a first transition to [[the]] a distinct patient state based on a first reward criterion of the one or more reward criteria of the diabetic patient; determining a time-dependent discount rate corresponding to the first transition for the first reward; and adjusting the first reward associated with the first transition based on the time-dependent discount rate; and": Mougiakakou provides determining continuous insulin dosage rates, for the insulin to be infused, by the trained RL, for example at [0002, 0026-0027, 0029-0031, 0043, 00125, 00131]. Figure 1 is an illustration of the pump, controller, and monitor as described.

[0029]: "use a reinforcement learning algorithm to operate an adaptive controller, the reinforcement learning algorithm comprising: [0030] a. a critic which evaluates an insulin control policy (S) including at least one of an insulin infusion rate and an insulin to carbohydrate ratio, and [0031] b. an actor which improves the insulin control policy (S)".

The trained RL of Mougiakakou determines the current dose and state of the patient, determines subsequent doses, rewards, and discounts, and calculates a rate of delivery to maintain normal glucose levels in the patient.
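Paragraphs [0029]-[0031] describe a standard actor-critic split. The sketch below is a generic tabular TD actor-critic in that shape, not the patent's own equations; the learning rates, the 0.9 discount, the default 1.0 U/h rate, and the deliberately crude scalar-action update are all our assumptions.

```python
class ActorCritic:
    """Generic critic-evaluates / actor-improves loop (illustrative only)."""

    def __init__(self, lr_actor=0.01, lr_critic=0.1, gamma=0.9):
        self.value = {}    # critic: state -> estimated long-term value
        self.policy = {}   # actor: state -> basal insulin rate (U/h)
        self.lr_a, self.lr_c, self.gamma = lr_actor, lr_critic, gamma

    def update(self, state, next_state, reward):
        v = self.value.get(state, 0.0)
        v_next = self.value.get(next_state, 0.0)
        td_error = reward + self.gamma * v_next - v   # critic's evaluation
        self.value[state] = v + self.lr_c * td_error  # critic step
        rate = self.policy.get(state, 1.0)
        # Crude actor step: shift the stored rate by the evaluation signal,
        # clipped at zero (an infusion rate cannot be negative).
        self.policy[state] = max(0.0, rate + self.lr_a * td_error)
        return td_error
```

A positive TD error means the transition went better than the critic predicted, so the critic's estimate for that state rises and the actor's stored rate shifts; a real controller would use a proper policy-gradient update rather than this one-line heuristic.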
With respect to claim 1 and "continuously controlling the insulin infusion pump to infuse the insulin to the diabetic patient": Mougiakakou notes that the RL continuously learns from data acquired from the system, including patient blood glucose levels, insulin-to-carbohydrate levels, and the desired patient state. The trained RL has multiple actions which can be taken based on the various outcomes, dosages, and historical data. The overall goal is to increase the time spent by the patient in the desired state: normal. [0130-0154]. The methods of Mougiakakou are carried out using computer systems and computer program products. As such, claims 1, 7 and 14 are anticipated.

With respect to claims 2, 8 and 15: the RL model of Mougiakakou corresponds to PK/PD characteristics associated with the PK/PD model, throughout.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

US 2022/0189603 A1, Damiano et al. Medicament pumps and control systems for managing glucose control therapy data of a subject. Published 6/16/2022. Uses unsupervised learning algorithms to control blood glucose levels.

US 2012/0246106 A1, Atlas et al. Monitoring device for management of insulin therapy. Published 9/27/2012. Atlas utilizes learning algorithms to control blood glucose levels.

Javad, M. et al. (2015) Reinforcement learning algorithm for blood glucose control in diabetic patients. Proceedings of the ASME 2015 International Mechanical Engineering Congress and Exposition IMECE2015, Nov 13-19, 2015, Houston, Texas. 9 pages. Javad utilizes RL algorithms to control BG levels, but does not determine transitions between states or use discount rates in the RL.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARY K ZEMAN, whose telephone number is 571-272-0723. The examiner can normally be reached 8am-2pm M-F.
Email may be sent to mary.zeman@uspto.gov if the appropriate permissions have been filed.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Larry Riggs, can be reached at 571-270-3062. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARY K ZEMAN/
Primary Examiner, Art Unit 1686

Prosecution Timeline

- Nov 24, 2021: Application Filed
- May 14, 2025: Non-Final Rejection (§102, §112, §Other)
- Aug 14, 2025: Response Filed
- Oct 16, 2025: Final Rejection (§102, §112, §Other)
- Dec 31, 2025: Request for Continued Examination
- Jan 06, 2026: Response after Non-Final Action
- Feb 04, 2026: Non-Final Rejection (§102, §112, §Other) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

- Patent 12586663: COPY NUMBER VARIANT CALLER (2y 5m to grant; granted Mar 24, 2026)
- Patent 12580051: IDENTIFYING METHYLATION PATTERNS THAT DISCRIMINATE OR INDICATE A CANCER CONDITION (2y 5m to grant; granted Mar 17, 2026)
- Patent 12571733: UNBIASED SORTING AND SEQUENCING OF OBJECTS VIA RANDOMIZED GATING SCHEMES (2y 5m to grant; granted Mar 10, 2026)
- Patent 12562239: Systems and Methods for Analyzing Mixed Cell Populations (2y 5m to grant; granted Feb 24, 2026)
- Patent 12460172: INFORMATION PROCESSING APPARATUS, CELL CULTURE SYSTEM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM (2y 5m to grant; granted Nov 04, 2025)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

- Expected OA Rounds: 3-4
- Grant Probability: 59% (93% with interview, +33.9%)
- Median Time to Grant: 4y 1m
- PTA Risk: High
Based on 532 resolved cases by this examiner. Grant probability derived from career allow rate.
