Office Action Analysis: 17244711 — SYSTEMS AND METHODS FOR REINFORCEMENT LEARNING MOLECULAR MODELING

Examiner Intelligence

LEVERETT, MARY CHANG

Grants 60% of resolved cases

Career Allowance Rate

52 granted / 86 resolved

+0.5% vs TC avg

Strong +23% interview lift

Interview Lift: +22.6% (resolved cases with interview, compared with those without)

Typical timeline

4y 1m

Avg Prosecution

14 currently pending

Career history

110

Total Applications

across all art units

Statute-Specific Performance

§101: 27.3% (-12.7% vs TC avg)

§103: 55.4% (+15.4% vs TC avg)

§102: 6.3% (-33.7% vs TC avg)

§112: 1.9% (-38.1% vs TC avg)

Based on career data from 86 resolved cases. Comparisons use a Tech Center average estimate.

Office Action

§103

DETAILED ACTION
Applicant's response, filed 02/16/2026, has been fully considered. The following rejections and/or objections are either reiterated or newly applied.  They constitute the complete set presently being applied to the instant application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.  The claims are examined as filed on 4/29/2021, the effective filing date.

Claim Status
Claims 1-19 and 21 are pending.
Claim 20 is cancelled.
Claims 1-19 and 21 are examined.
Claims 1-19 and 21 are rejected.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claim Rejection
Claims 1-19 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over JEON 2020 “Autonomous molecule generation using reinforcement learning and docking to develop potential novel inhibitors” in view of GIOLA 2017 “Dynamic Docking: A Paradigm Shift in Computational Drug Discovery.”
Claim Interpretation and Scope and Contents of Prior Art
Claims 1 and 11 recite a method and a system, using processors, to identify a candidate molecule for a target by providing the candidate molecule and the target as inputs to a simulation, and operating the simulation by modelling interaction between the candidate molecule and the target over successive time steps of the simulation.  With respect to this limitation, JEON teaches using a general computer to identify a candidate molecule by combining reinforcement learning and docking simulation with the steps of inputting the candidate molecule and target into a docking simulation and modelling their interactions over successive time steps (Abstract, pg 1-2, Fig 1, pg 8 last 2 par).
Claims 1 and 11 further recite the steps of monitoring at least one state of the simulation at each successive time step, and updating a reinforcement learning model based on potential energy values of the at least one state of the simulation at each successive time step.  With respect to these limitations, JEON teaches monitoring a state of the simulation at each step and updating a reinforcement learning model at each step (Fig 1) based on several scoring functions, including a docking score, which is calculated from binding energy (pg 8, last 2 par).  
JEON does not specifically teach that the reinforcement learning model is updated based on potential energy values of the state of the simulation; however, GIOLA provides a review of dynamic molecular docking simulations for drug-target recognition, teaching that potential energy values can be observed or changed (pg 9). One of ordinary skill in the art would understand that a reinforcement learning model can be updated using any score/metric from the simulation, such that potential energy can be substituted for docking score/binding energy.
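The substitution rationale above can be illustrated with a minimal Python sketch, assuming a hypothetical update routine whose reward signal is a pluggable function. The names `update_value`, `docking_score_reward`, and `potential_energy_reward`, the dictionary fields, and the numeric values are illustrative only and appear in neither JEON nor GIOLA. The sketch shows the structural point that one update rule can consume either metric; it takes no position on whether the substitution would have been obvious.

```python
from typing import Callable, Dict

State = Dict[str, float]  # hypothetical snapshot of one simulation state


def docking_score_reward(state: State) -> float:
    # A more negative docking score is treated as better, so negate it.
    return -state["docking_score"]


def potential_energy_reward(state: State) -> float:
    # Same convention: lower potential energy gives a higher reward.
    return -state["potential_energy"]


def update_value(value: float, state: State,
                 reward_fn: Callable[[State], float], lr: float = 0.1) -> float:
    """One incremental update that moves the estimate toward the reward."""
    return value + lr * (reward_fn(state) - value)


# Made-up numbers, for illustration only.
state = {"docking_score": -7.5, "potential_energy": -120.0}
v_dock = update_value(0.0, state, docking_score_reward)
v_pe = update_value(0.0, state, potential_energy_reward)
```

Only the `reward_fn` argument differs between the two calls; the update rule itself is unchanged.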
Claims 1 and 11 further recite the steps of generating an action to the candidate molecule using the updated reinforcement learning model, modifying the candidate molecule based on the action, and updating a parameter of the simulation based on the candidate molecule modified in the previous step.  With respect to these limitations, JEON teaches generating actions to the candidate molecule using the reinforcement learning model, and with each update to the model, modifying the candidate molecule based on the actions, and updating the simulation such that the modified molecule becomes the molecule of the next state (Fig 1, pg 2).
Claims 1 and 11 further recite the steps of repeating all of the previous steps with the modified candidate molecule and the updated simulation until a convergence condition is satisfied, and outputting the modified candidate molecule responsive to the convergence condition being satisfied, such that the modified candidate molecule has a greater binding affinity for the target compared to the candidate molecule.  With respect to these limitations, JEON teaches repeating the process such that the molecule is modified each cycle until it reaches the final state, in which the expected rewards of the actions in the model and the real rewards of the action, and follows an ε-greedy algorithm in which the convergence condition is reached when ε reaches 0 and the molecule is output as the optimized molecule (Fig 1, pg 2); the modified molecule at the end of the process has a higher predicted binding affinity for the target protein (Abstract, pg 6 last par, pg 8 last 2 par).
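The loop recited in claims 1 and 11 (simulate, monitor, update the model, act, modify, repeat until convergence) can be sketched in Python as a toy example. This is neither JEON's MORLD nor the applicant's implementation. The candidate molecule is reduced to an atom count, `run_simulation` is a hypothetical stand-in that returns one potential-energy value per time step, the update is plain tabular Q-learning, and convergence is modelled as a linearly decaying ε schedule. All names and constants are assumptions.

```python
import random
from collections import defaultdict
from typing import List

ACTIONS = (-1, 0, 1)  # hypothetical edits: remove an atom, no change, add an atom


def run_simulation(n_atoms: int, n_steps: int = 5) -> List[float]:
    """Toy stand-in for a simulation: one potential-energy value per time step.
    The energy well at 6 atoms and the 1/(t+1) term are arbitrary choices."""
    return [float((n_atoms - 6) ** 2) + 1.0 / (t + 1) for t in range(n_steps)]


def optimize(start: int, n_iters: int = 300, seed: int = 0) -> int:
    rng = random.Random(seed)
    q = defaultdict(float)      # Q-table keyed by (n_atoms, action)
    alpha, gamma = 0.5, 0.9     # learning rate and discount (assumed values)
    molecule, prev = start, None
    for it in range(n_iters):
        eps = 1.0 - it / n_iters                  # decays linearly; loop ends where it would hit 0
        energies = run_simulation(molecule)       # operate and monitor every time step
        reward = -sum(energies) / len(energies)   # lower potential energy, higher reward
        if prev is not None:                      # update the model from the energies
            s, a = prev
            best_next = max(q[(molecule, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
        if rng.random() < eps:                    # epsilon-greedy action selection
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda b: q[(molecule, b)])
        prev = (molecule, action)
        molecule = max(1, molecule + action)      # modify the candidate; the next run uses it
    return molecule                               # output once the schedule is exhausted


result = optimize(start=2)
```

Passing the modified atom count back into `run_simulation` plays the role of updating a simulation parameter. A real system would replace both the toy energy function and the integer molecule representation.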
Claims 2 and 12 recite the limitations wherein operating the simulation comprises operating a molecular dynamics simulation for a first time step and a second time step subsequent to the first time step, monitoring the at least one state of the simulation at each successive time step comprises monitoring a first value of the at least one state of the simulation associated with the first time step, and modifying the candidate molecule comprises modifying a characteristic of the candidate molecule associated with the second time step based on the first value of the at least one state of the simulation associated with the first time step.  With respect to these limitations, JEON teaches performing docking simulations at each time step and monitoring the binding affinity of the molecule in the simulation at each step (pg 1 par 3) and other score values (Fig 1), and modifying the candidate molecule based on these values (Fig 1, pg 2).
Claims 3 and 13 recite the limitation wherein the target is a target molecule.  With respect to this limitation, JEON teaches that the target is a protein molecule (Abstract).
Claims 4 and 14 recite the limitation wherein modifying the candidate molecule based on the at least one state of the simulation comprises modifying at least one of a functional group of the candidate molecule, an atom of the candidate molecule, or a pose of the candidate molecule.  With respect to this limitation, JEON teaches that modifying the molecule can include addition or removal of an atom or a bond in a chemically valid manner (pg 2 par 1).
Claims 5 and 15 recite the limitation wherein the at least one state of the simulation comprises at least one of a pose of the candidate molecule, a force between the candidate molecule and a target molecule, or an energy of the candidate molecule.  With respect to this limitation, JEON teaches that state of the simulation includes an energy of the candidate molecule measured in kcal/mol (Fig 1).
Claims 6 and 16 recite the limitations wherein generating the action to the candidate molecule using the updated reinforcement learning model comprises applying at least one of a policy or a model to the candidate molecule to generate a plurality of modified candidate molecules, and selecting the modified candidate molecule from the plurality of modified candidate molecules based on a score determined for the plurality of modified candidate molecules.  With respect to this limitation, JEON teaches applying an “optimal policy” using the reinforcement learning model to generate optimal candidates (pg 4 par 1), and further teaches evaluating the modified molecules by scoring functions (pg 2).
Claim 7 recites the limitation of the reinforcement model comprising a neural network, while claim 17 recites the limitation of the reinforcement learning model comprising a deep Q-learning model.  With respect to this limitation, JEON teaches that its reinforcement learning model uses double Q-learning and bootstrapped Deep Q-Networks, which comprise neural networks (pg 2).
Claims 8 and 18 recite the limitation wherein the candidate molecule comprises at least one of a protein, a peptide, a small molecule having a molecular weight less than a threshold molecular weight, or an antibody.  With respect to this limitation, JEON teaches that the candidate molecule is a protein.
Claims 9 and 19 recite the limitation wherein the at least one state of the simulation comprises a binding affinity between the candidate molecule and a protein.  With respect to this limitation, JEON teaches that the state of the simulation/modified molecule includes binding affinity for the target protein (Abstract, pg 6 last par, pg 8 last 2 par).
Claim 10 recites the limitation wherein the at least one state of the simulation comprises a distance between the candidate molecule and a binding site of the protein.  JEON does not teach this limitation; however, GIOLA teaches determining the distance between the centers of mass of the ligand (candidate molecule) and the binding site (pg 12 par 3).
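The center-of-mass distance referred to above is a standard geometric quantity. A minimal Python sketch follows, assuming each atom is given as a mass and a 3D position. The `Atom` type, the function names, and the coordinates are made up for illustration and are not taken from GIOLA.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, Tuple[float, float, float]]  # (mass, (x, y, z)), hypothetical layout


def center_of_mass(atoms: List[Atom]) -> Tuple[float, ...]:
    """Mass-weighted average position of a set of atoms."""
    total = sum(m for m, _ in atoms)
    return tuple(sum(m * pos[i] for m, pos in atoms) / total for i in range(3))


def com_distance(ligand: List[Atom], site: List[Atom]) -> float:
    """Euclidean distance between the two centers of mass."""
    return math.dist(center_of_mass(ligand), center_of_mass(site))


# Illustrative coordinates only.
ligand = [(12.0, (0.0, 0.0, 0.0)), (12.0, (2.0, 0.0, 0.0))]
site = [(14.0, (1.0, 3.0, 0.0)), (14.0, (1.0, 5.0, 0.0))]
d = com_distance(ligand, site)  # centers at (1, 0, 0) and (1, 4, 0), so d is 4.0
```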
Claim 21 recites the limitation wherein the parameter of the simulation comprises a velocity of the modified candidate molecule.  JEON does not teach this limitation; however, GIOLA reviews a molecular dynamics protocol in which velocity reassignment occurs at the restart of a simulation run (pg 12 par 3).
Resolving Ordinary Skill in the Art and Obviousness Rationale
A teaching, suggestion, or motivation in the prior art would have led one of ordinary skill in the art to modify or combine the prior art to arrive at the claimed invention.  Specifically, a person of ordinary skill in molecule simulation and design would have been motivated to combine the teachings of JEON with the teachings of GIOLA, in order to achieve the claimed invention, because the potential energy values during a simulation are needed for determining molecular docking calculations and scoring (pg 4), and similarly, an updated velocity parameter can contribute to the likelihood of binding success (pg 12 par 3).  A person of ordinary skill would reasonably expect success from combining these teachings, as both JEON and GIOLA teach methods for conducting simulations on molecules for drug discovery, and the simulation methods and analysis of GIOLA can be applied in JEON to improve the reinforcement learning model. Therefore, the claims at issue would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention as there is both a reason to modify or combine the prior art, and a reasonable expectation of success (see MPEP 2143.02 (I)).

Response to Arguments – Rejections Under 35 USC § 103
	In the reply filed 2/16/2026, the Applicant asserts that the cited prior art does not teach steps (c)-(f) of claim 1 (remarks pg 8).  However, the arguments are unpersuasive.
	The Applicant asserts that JEON’s method only determines a docking score for the modified molecule once, without modelling interaction between the candidate molecule and the target (remarks pg 9-10).  However, Fig 1 and pg 2 of JEON show that at each molecule state n, the method uses a reinforcement learning framework to modify the molecule and evaluate it in docking against the target protein, potentially several times, until an optimized state/higher docking score is reached.  This is emphasized at the end of page 2, which explains the process and states that, after a sufficient number of episodes, MORLD (the computational method) steadily generates potential novel inhibitors with higher docking scores (along with high SA and QED scores) to the given protein structure.
	The Applicant also asserts that modifying JEON’s method to include modeling interaction between the candidate molecule and the target over successive time steps of the simulation renders it inoperable for its intended purpose (remarks pg 10). However, JEON’s purpose, as stated in the abstract, is to generate and optimize lead compounds by combining reinforcement learning and docking to develop novel inhibitors, not necessarily to do so in a certain amount of time.  JEON does model interaction and determine scores at each iteration of the reinforcement learning cycle until the next state is output (see Fig 1) and the process continues for several episodes.
	The Applicant also asserts that JEON fails to teach or suggest that operating the simulation comprises operating a molecular dynamics simulation and that there would be no motivation to modify JEON’s method to include MD because doing so would increase processing time (remarks pg 11).  However, GIOLA teaches molecular dynamics simulations as an alternative to traditional docking (Abstract), and there is nothing in JEON to suggest that a longer processing time would render the method inoperable; it would simply take longer, while still meeting its goal.  One would be motivated to apply JEON’s method of reinforcement learning and molecule optimization at successive time steps as GIOLA explains that MD allows the full exploration of drug-target recognition and binding from both the mechanistic and energetic points of view (Abstract), so one of ordinary skill in the art would be motivated to combine these for more accurate drug development.
	The Applicant also asserts that JEON and GIOLA do not suggest or teach updating a reinforcement learning model based on potential energy values of a state of a simulation (remarks pg 12).  However, one of ordinary skill would understand that a reinforcement learning model can be updated using any score/metric from the simulation, such that potential energy, an inherent part of the total energy accounted for in molecular dynamics simulations, can be substituted for docking score/binding energy in the model.
	The Applicant asserts that a person of ordinary skill would not modify JEON’s method to include molecular dynamics parameters because it lacks a reasonable expectation of success and that JEON purposefully avoids them (remarks pg 12-13).  However, GIOLA provides the rationale and expectation in its abstract that MD allows the full exploration of drug-target recognition and binding from both the mechanistic and energetic points of view.  While JEON’s method benefits from a speedy design process, one would still expect success in developing optimized compounds by combining its reinforcement learning methods with molecular dynamics simulations.
	As such, the claims are still considered obvious over the prior art.

Conclusion
No claim is allowable.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
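The reply periods described above can be worked through with a short Python sketch, assuming the mailing date of Apr 01, 2026 shown in the Prosecution Timeline on this page. The helper `add_months` is hypothetical. It clamps to the last day of the month when the target month is shorter, and it ignores weekend and holiday roll-over and the advisory-action exception, so the results are rough calendar marks and not docketing dates.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's last day."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


mailed = date(2026, 4, 1)                 # Final Rejection mailing date per the timeline
two_month_mark = add_months(mailed, 2)    # first-reply window relevant to the advisory-action rule
shortened_period = add_months(mailed, 3)  # end of the THREE-MONTH shortened statutory period
statutory_limit = add_months(mailed, 6)   # SIX-MONTH outer limit
```

With these inputs the three dates fall on June 1, July 1, and October 1, 2026. Any real deadline should be confirmed against the mailing date on the office action itself.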
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARY C LEVERETT whose telephone number is (571)272-5494. The examiner can normally be reached 8:00am - 5:00pm M-Th.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz R. Skowronek can be reached at (571) 272-9047. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.C.L./Examiner, Art Unit 1687
/Karlheinz R. Skowronek/Supervisory Patent Examiner, Art Unit 1687

Prosecution Timeline

May 14, 2025: Response after Non-Final Action

Nov 17, 2025: Non-Final Rejection mailed (§103)

Feb 12, 2026: Applicant Interview (Telephonic)

Feb 12, 2026: Examiner Interview Summary

Feb 16, 2026: Response Filed

Apr 01, 2026: Final Rejection mailed (§103)

May 27, 2026: Applicant Interview (Telephonic)

May 27, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

17/251,293 (Patent 12633377): METHODS FOR DETECTING VARIANTS IN NEXT-GENERATION SEQUENCING GENOMIC DATA. 5y 5m to grant; granted May 19, 2026.

19/037,106 (Patent 12620458): METHODS, SYSTEMS, AND COMPUTER READABLE MEDIA FOR AUTOMATED ASSESSMENT OF ASEPTIC TECHNIQUE OF COMPOUNDING IN A COMPOUNDING HOOD. 1y 3m to grant; granted May 05, 2026.

16/950,845 (Patent 12609186): TECHNIQUES FOR DATA-ENABLED DRUG DISCOVERY. 5y 5m to grant; granted Apr 21, 2026.

17/043,210 (Patent 12603156): METHOD OF SYNTHESIZING A RADIOPHARMACEUTICAL. 5y 6m to grant; granted Apr 14, 2026.

17/101,912 (Patent 12597492): Topology-Driven Completion of Chemical Data. 5y 4m to grant; granted Apr 07, 2026.

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 7-8

Grant Probability: 60%

With Interview (+22.6%): 83%

Median Time to Grant: 4y 1m (~0m remaining)

PTA Risk: High

Based on 86 resolved cases by this examiner. Grant probability derived from career allowance rate.
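The projection figures can be reproduced from the counts reported earlier on this page (52 granted out of 86 resolved). The Python sketch below assumes that the with-interview figure is obtained by adding the 22.6-point lift to the career allowance rate. The page does not state its formula, so that step is an assumption.

```python
granted, resolved = 52, 86           # counts from the Examiner Intelligence panel
interview_lift_points = 22.6         # reported interview lift, in percentage points

allowance_rate_pct = 100 * granted / resolved                  # about 60.47
with_interview_pct = allowance_rate_pct + interview_lift_points  # about 83.07 (assumed additive)

grant_probability = round(allowance_rate_pct)     # 60
grant_with_interview = round(with_interview_pct)  # 83
```

Both rounded values match the 60% and 83% shown above, which is consistent with the additive assumption but does not confirm it.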

Interview Optional
