DETAILED ACTION
Applicant's response, filed 09/12/2025, has been fully considered. The following rejections and/or objections are either reiterated or newly applied. Herein, "the previous Office action" refers to the Non-Final Rejection of 06/27/2025.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-6 and 8-9 are examined; claim 7 is cancelled.
Priority
This US Application 17/534,841 (11/24/2021) claims priority from Foreign Application No. CN202011467155.8 (12/14/2020) as reflected in the filing receipt mailed on Mar. 30, 2022. The claims to the benefit of priority are acknowledged and the effective filing date of claims 1-6 and 8-9 is 12/14/2020.
Withdrawal / Revision of Objections and/or Rejections
In view of the amendments and remarks filed 09/12/2025: the objection to claim 7 is withdrawn; the rejection of claims 2-7 under 35 U.S.C. § 112(a) and § 112(b) is withdrawn, rendering that ground of rejection moot; the rejection of claims 1-9 under 35 U.S.C. § 101 is withdrawn, the arguments regarding the improvement at Step 2A, Prong 2 having been persuasive; the rejection of claim 7 under 35 U.S.C. § 103 is withdrawn, rendering that ground of rejection moot; and the 35 U.S.C. § 112(f) interpretation is withdrawn. The following rejections and/or objections are either maintained or newly applied to claims 1-6 and 8-9; they constitute the complete set applied to the instant application.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION —The specification shall conclude with one or more claims
particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph: The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-6 and 8-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 1 and 8-9 recite “wherein for each time step during training of the target agent, the training comprises:” but then recite steps that define an initial portion and a later portion of the training iterations, which are the time steps. It is unclear whether the limitations in the wherein clause are performed at each time step or only during these initial and later portions of the training time steps. Additionally, the terms “initial portion” and “later portion” are relative terms and therefore unclear because no standard is provided to determine which time steps fall in the initial portion versus the later portion. Claims 2-6 are rejected because they depend from claim 1 and do not resolve the indefiniteness issues present in claim 1.
Furthermore, regarding claim 2, it is unclear how the training recited in claim 2 relates to the training recited in claim 1; and regarding claims 5-6, it is unclear how the pretraining relates to the training recited in claim 1. Because these relationships are unclear, the limitations are indefinite.
Claim Interpretation
The recited “the sampling probability is adjusted to increase exploration of input sources, thereby improving a fit of the target agent to an existing small-molecule compound structure sequence set and reducing exposure bias” (claims 1 and 8-9) is being interpreted as a limitation that naturally flows from adjusting the sampling probability.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-5 and 8-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Popova et al., “Deep reinforcement learning for de novo drug design,” Sci. Adv. 4(7):eaap7885 (2018) (referred to in this action as Popova).
Independent claim 1 recites “method, comprising: training a target agent based on a first reward and a second reward, wherein the target agent comprises a recurrent neural network, the first reward is a reward determined by a model likelihood of a target neural network model, the second reward is a target chemical compound-specific reward based on target requirements, and the target agent is configured to determine a molecular compound structure; and generating a target molecular structure of a chemical compound using the target agent”. Popova teaches a Reinforcement Learning for Structural Evolution strategy for de novo design of molecules with desired properties (i.e. determining a molecular compound structure based on target requirements), wherein the strategy includes a deep stack-augmented recurrent neural network (pg. 12 col. 2 para. 3) [generative and predictive models] that are trained separately but are used jointly to generate novel targeted chemical libraries; using generative models trained with a stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models derived to forecast the desired properties of the generated compounds (i.e. self-defined based on target requirements) (pg. 1 para. 1); wherein the generative model is pretrained (pg. 12 col. 1 para. 3) to play the role of an agent (i.e. the agent being trained), whereas the predictive model estimates the agent’s behavior by assigning a numerical reward to every generated molecule (pg. 2 col. 1 para. 2) (i.e. the predictive model creates a reward determined by a model likelihood of a target neural network model, thus the first reward); wherein the reward is a function of the numerical property generated by the predictive model, and the generative model is trained to maximize the expected reward/function (i.e. the generative model creates a reward based on target requirements, thus the second reward) (pg. 2 col. 1 para. 2); anticipating claim 1.
Independent claim 1 recites “wherein for each time step during training of the target agent, the training comprises: generating a sampling probability based on a current training iteration and a total number of training iterations; selecting, for a current time step and in accordance with the sampling probability, an input to the target agent that is either (i) a ground truth vector for a symbol from a small molecule compound structure sequence or (ii) an output vector generated by the target agent at a previous time step; and applying the selected input to the recurrent neural network to update a hidden state of the recurrent neural network at the current time step; wherein the sampling probability is configured such that, during an initial portion of the training iterations, the selected input to the target agent is more likely to be the output vector generated at the previous time step to ensure faster convergence when training begins, and, during a later portion of the training iterations, the sampling probability is adjusted to increase exploration of input sources, thereby improving a fit of the target agent to an existing small-molecule compound structure sequence set and reducing exposure bias”. Popova teaches a Reinforcement Learning for Structural Evolution strategy for de novo design of molecules with desired properties (i.e. determine a molecular compound structure based on target requirements), wherein the strategy includes deep neural networks [generative and predictive models] that are trained separately but are used jointly to generate novel targeted chemical libraries (pg. 1 para. 1); wherein at each time step t, the generative model takes the previous state as an input and estimates the probability distribution of the next action, with the next action at sampled from this estimated probability (i.e. sampling probability) (pg. 2 col. 1 para. 3); wherein the vector of hidden states from the previous time step is used to generate the vector of hidden states at time t (i.e. an output vector generated by the target agent at a previous time step) (pg. 12 eq. 6); wherein the state s0 with length 0 is unique and considered the initial state, and the state sT of length T is called the terminal state as it causes training to end (pg. 2 col. 1 para. 3); wherein all models were trained with the learning-rate decay technique until convergence (pg. 12 col. 1 para. 3); wherein the cross-entropy loss function is calculated and parameters of the model are updated (i.e. adjusted sampling probability) (pg. 12 col. 1 para. 5); anticipating claims 1 and 8-9.
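For illustration only, the claimed scheduled-sampling mechanism (selecting between a ground-truth input and the agent's previous output according to an iteration-dependent probability) can be sketched as follows. The function names and the linear schedule are assumptions of this sketch, not limitations of the claims or teachings of Popova.

```python
import random

def sampling_probability(current_iter, total_iters):
    # Probability of feeding the ground-truth symbol rather than the
    # model's own previous output. A simple linear schedule is assumed
    # here for illustration; the claims do not fix a particular schedule.
    return current_iter / total_iters

def select_input(ground_truth_vec, prev_output_vec, current_iter,
                 total_iters, rng=random.random):
    # Early in training the probability is low, so the model's previous
    # output is more likely to be chosen; later, ground-truth inputs
    # become more likely, increasing exploration of input sources.
    p = sampling_probability(current_iter, total_iters)
    return ground_truth_vec if rng() < p else prev_output_vec
```

Under this illustrative schedule, the initial portion of training favors the agent's own outputs and the later portion favors ground-truth symbols, mirroring the two "portions" recited in the claim.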
Dependent claim 2 recites “wherein the training on the target agent based on the first reward and the second reward comprises: acquiring an initial agent; determining a model likelihood of a small molecule compound structure sequence, generated by the initial agent, in relation to the target neural network model as the first reward and determining a molecular structure limiting conditions set based on the target requirements as the second reward; subjecting the first reward and the second reward to consolidation processing to obtain a processing result; and updating, based on the processing result, the initial agent to the target agent using a policy gradient algorithm”. Popova teaches that the generative network can be treated as a policy approximation model (pg. 2 col. 1 para. 3), which is used for the application of a gradient algorithm (pg. 2 col. 2 para. 1); wherein the use of a flexible reward function enables different library optimization strategies for deep neural networks [predictive and generative models creating the first and second rewards respectively, as applied for claim 1], where one can minimize, maximize, or impose a desired range on a property of interest in the generated compound libraries (i.e. determining a molecular structure limiting conditions set based on the target requirements) (pg. 11 col. 1 para. 4); wherein both predictive and generative models are trained separately with supervised learning algorithms, and during the second stage, the models are trained jointly with a reinforcement learning approach that optimizes target properties (i.e. consolidation processing to obtain a processing result from the target neural network model) (pg. 2 col. 1 para. 2); anticipating claim 2.
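As a rough illustration of the kind of consolidation-plus-policy-gradient scheme discussed for claim 2, the sketch below combines two scalar rewards and forms a REINFORCE-style surrogate loss. The convex-combination consolidation, the weight, and all function names are assumptions made for illustration; neither the claims nor Popova prescribe this exact form.

```python
def consolidate(first_reward, second_reward, weight=0.5):
    # Assumed consolidation processing: a convex combination of the
    # likelihood-based reward and the property-based reward.
    return weight * first_reward + (1.0 - weight) * second_reward

def policy_gradient_loss(log_prob_of_sequence, reward):
    # REINFORCE surrogate loss: minimizing -reward * log pi(sequence)
    # performs gradient ascent on the expected reward, updating the
    # initial agent toward the target agent.
    return -reward * log_prob_of_sequence
```

In an actual training loop, the consolidated reward would scale the gradient of the sequence log-probability with respect to the agent's parameters.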
Dependent claim 3 recites “wherein the method further comprises: determining a small molecule compound structure symbol corresponding to the current time step based on a current cell state of at least one step in the recurrent neural network; and combining small molecule compound structure symbols corresponding to at least one step in the recurrent neural network to form the small molecule compound structure sequence”. Popova teaches the use of Stack-augmented recurrent neural network (pg. 12 col. 2 para. 3) which outputs molecules in SMILES symbol notation (pg. 2 col. 2 para. 2); wherein at each time step, in the training mode, the generative network takes a current prefix of the training object (i.e. current step based on a current cell state) and predicts the probability distribution of the next character (pg. 12 col. 1 para. 4); wherein a sample of molecules produced by the generative model can be seen in Fig. 2 (i.e. small molecule compound structure) (pg. 5); wherein symbols are used to define canonical SMILES strings to encode sequences for chemical structures (i.e. combining small molecule compound structure symbols) (pg. 2 col. 1 para. 3); anticipating claim 3.
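As a purely illustrative aid to the symbol-combining step discussed for claim 3, greedy decoding of per-step output scores into a structure sequence can be sketched as follows. The function name, the score format, and the greedy selection rule are assumptions of this sketch, not taken from the claims or from Popova.

```python
def decode_symbols(step_scores, vocab):
    # For each time step's output scores, pick the index of the most
    # likely symbol (greedy decoding assumed) and join the symbols into
    # a small molecule compound structure sequence.
    return "".join(
        vocab[max(range(len(scores)), key=scores.__getitem__)]
        for scores in step_scores
    )
```

For example, with a two-symbol vocabulary, the highest-scoring symbol at each step is emitted in order to form the sequence.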
Dependent claim 4 recites “acquiring a small molecule compound structure sequence set, wherein the small molecule compound structure sequence set comprises: at least one small molecule compound structure sequence; acquiring a vocabulary corresponding to each small molecule compound structure sequence of the at least one small molecule compound structure sequence; and adding a first token and a second token to each small molecule compound structure sequence and adding the first token and the second token to the vocabulary corresponding to each small molecule compound structure sequence, wherein the first token is used to indicate a start position, and the second token is used to indicate an end position”. Popova teaches that, in the reinforcement learning system, the set of actions A is defined as an alphabet, that is, the entire collection of letters and symbols (i.e. vocabulary) used to define canonical SMILES strings to encode chemical structures (i.e. acquiring a vocabulary corresponding to each small molecule compound structure sequence of the at least one small molecule compound structure sequence) (pg. 2 col. 1 para. 3); wherein during training, the input token is a character in the currently processed SMILES string from the training set (pg. 3 Fig. 1); wherein the set of states S is defined as all possible strings in the alphabet with lengths from zero to some value (pg. 2 col. 1 para. 3); wherein the state s0 with length 0 is unique and considered the initial state (i.e. start position) and the state sT of length T is called the terminal state (i.e. end position), as it causes training to end (pg. 2 col. 1 para. 3); anticipating claim 4.
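The token bookkeeping recited in claim 4 can be illustrated with a short sketch. The `<s>`/`</s>` token strings and the per-character tokenization of SMILES are assumptions made for illustration only; the claims do not specify the token values.

```python
START, END = "<s>", "</s>"  # assumed first (start) and second (end) tokens

def prepare_sequences(smiles_set):
    # For each sequence: build its symbol vocabulary, wrap the sequence
    # with start and end tokens, and add those tokens to the vocabulary
    # (per-character tokenization is assumed here).
    prepared = []
    for smiles in smiles_set:
        vocab = sorted(set(smiles)) + [START, END]
        sequence = [START] + list(smiles) + [END]
        prepared.append((sequence, vocab))
    return prepared
```

This mirrors the claim's mapping onto Popova's initial state s0 (start position) and terminal state sT (end position).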
Dependent claim 5 recites “pretraining, based on the small molecule compound structure sequence set, an initial neural network model to obtain a target neural network model”. Popova teaches the use of deep neural networks [generative and predictive models] that are trained separately but are used jointly to generate novel targeted chemical libraries (pg. 1 para. 1); wherein the first stage involves pretraining the generative model (pg. 12 col. 1 para. 3); wherein both generative and predictive models are combined into a single reinforcement learning system (i.e. target neural network model) (pg. 1 col. 1 para. 3); anticipating claim 5.
Independent claim 8 recites a system, comprising: a processor; and a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions to perform the steps described in claim 1. Independent claim 9 recites a computer program product being embodied in a tangible non-transitory computer readable storage medium and comprising computer instructions to perform the steps described in claim 1. The description of how Popova reads on the limitations of claim 1 has been addressed above. Popova further teaches a novel computational strategy (i.e. system, comprising: a processor; and a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions) for de novo design of molecules with desired properties termed Reinforcement Learning for Structural Evolution; using generative models trained (i.e. computer instructions) with a stack-augmented memory network (i.e. non-transitory computer readable storage medium) to produce chemically feasible SMILES strings, and predictive models derived to forecast the desired properties of the generated compounds (i.e. self-defined based on target requirements) (pg. 1 para. 1); wherein the model is trained on a graphics processing unit (pg. 12 col. 1 para. 4); anticipating claims 8-9.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under pre-AIA 35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).
Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Popova, as applied to claims 1 and 4-5 in the 35 U.S.C. 102 rejection above, further in view of Olivercrona et al., “Molecular de-novo design through deep reinforcement learning,” J. Cheminform. 9:48 (2017) (referred to in this action as Olivercrona).
Determination of the Scope and Content of the Prior Art
(MPEP §2141.01)
Dependent claim 6 recites “wherein the pretraining of the initial neural network model to obtain the target neural network model comprises: selecting a small molecule compound structure training sequence from the small molecule compound structure sequence set; converting, based on the vocabulary corresponding to each small molecule compound structure sequence of the at least one small molecule compound structure sequence, symbols corresponding to each step in the initial neural network model into vector representations; setting the first token as an input argument for the initial neural network model to generate a small molecule compound structure sequence step by step in the initial neural network model; adding up loss values corresponding to each step in the initial neural network model to obtain a statistical result; and updating, based on the statistical result, the initial neural network model to the target neural network model using backpropagation through time”. Popova teaches that the first stage of the RL combined model (i.e. target neural network model) involves pretraining the generative model (pg. 12 col. 1 para. 3) with a stack-augmented memory network (pg. 1 para. 1); wherein during training, the input token is a character in the currently processed SMILES string from the training set (i.e. setting the first token as an input argument for the initial neural network model to generate a small molecule compound structure sequence) (pg. 3 Fig. 1); wherein at each time step, in the training mode, the generative network takes a current prefix of the training object and predicts the probability distribution of the next character (i.e. step by step in the initial neural network model) (pg. 12 col. 1 para. 4); wherein, for example, an aspirin molecule is encoded as [CC(=O)Oc1ccccc1C(=O)O] (i.e. converting, based on the vocabulary corresponding to each small molecule compound structure sequence of the at least one small molecule compound structure sequence) (pg. 12 col. 1 para. 3); wherein, on the basis of this comparison, the cross-entropy loss function is calculated and parameters of the model are updated (i.e. adding up loss values corresponding to each step) (pg. 12 col. 1 para. 4); wherein the QSAR models consisted of an embedding layer transforming the sequence of discrete tokens into a vector of 100 continuous numbers (pg. 12 col. 1 para. 2); wherein each QSAR method is defined as statistical methods applied to the problem of finding empirical relationships (pg. 5 col. 1 para. 1), and the reinforcement learning model (i.e. target model) is used for the estimation of the statistical relationship between the actions and their possible outcomes (i.e. obtain a statistical result) (pg. 1 col. 2 para. 2); reading on the recited limitations in claim 6.
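The step-wise loss accumulation discussed for claim 6 resembles standard next-symbol language-model pretraining; the sketch below sums per-step cross-entropy terms over one sequence. The function name and the representation of per-step probabilities are illustrative assumptions, not a description of the claimed method or of Popova's implementation.

```python
import math

def sequence_cross_entropy(step_probs):
    # step_probs: for each time step, the probability the model assigned
    # to the correct next symbol. The per-step losses are summed to give
    # the aggregate statistic used for the parameter update.
    return sum(-math.log(p) for p in step_probs)
```

A perfectly confident model (probability 1.0 at every step) yields zero loss; any uncertainty contributes positive per-step terms to the sum.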
Ascertainment of the Difference Between Scope the Prior Art and the Claims
(MPEP §2141.02)
Regarding claim 6, Popova does not explicitly teach “updating, based on the statistical result, the initial neural network model to the target neural network model using backpropagation through time”. However, Olivercrona teaches a method to tune a sequence-based generative model for molecular de novo design through deep reinforcement learning (pg. 1 para. 1); wherein back-propagation through time is applied to recurrent neural networks as a method of fitting a neural network using the gradient of the prediction with respect to the network parameters in order to update those parameters (pg. 3 col. 1 para. 2); reading on the recited limitations in claim 6.
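For illustration of the backpropagation-through-time update discussed above, the sketch below computes the exact loss gradient for a minimal scalar linear recurrence h_t = w*h_{t-1} + x_t with squared-error loss on the final state. The linear model and function name are assumptions of this sketch; real recurrent networks apply the same unrolling idea to vector states and nonlinear cells.

```python
def bptt_gradient(w, inputs, target):
    # Forward pass: unroll the recurrence h_t = w*h_{t-1} + x_t from h_0 = 0.
    hs = [0.0]
    for x in inputs:
        hs.append(w * hs[-1] + x)
    # Backward pass: loss L = (h_T - target)^2; accumulate dL/dw by
    # propagating the loss gradient backward through each time step.
    dL_dh = 2.0 * (hs[-1] - target)
    grad = 0.0
    for t in range(len(inputs), 0, -1):
        grad += dL_dh * hs[t - 1]   # direct dh_t/dw contribution
        dL_dh *= w                  # propagate through h_{t-1}
    return grad
```

For w = 0.5 with inputs [1.0, 0.0] and target 0.0, the final state equals w, the loss equals w squared, and the accumulated gradient matches the analytic derivative 2w = 1.0.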
Finding of Prima Facie Obviousness Rationale and Motivation
(MPEP §2142-2143)
Regarding claim 6, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Olivercrona into the Reinforcement Learning for Structural Evolution strategy for de novo design of molecules with desired properties (i.e. determine a molecular compound structure based on target requirements) taught by Popova, wherein the strategy includes deep neural networks [generative and predictive models] that are trained separately but are used jointly to generate novel targeted chemical libraries, in order to update, based on the statistical result, the initial neural network model to the target neural network model using backpropagation through time. One of ordinary skill in the art would have been motivated to apply the teachings of Olivercrona to the method of Popova to explore how parameters such as training set size, model size, regularization, and training time influence the quality and variety of structures generated (pg. 12 col. 2 para. 2, Olivercrona). One of ordinary skill in the art would have been motivated to combine the teachings of these references with a reasonable expectation of success, since both references pertain to methods applying reinforcement learning to de novo drug design.
Response to Applicant's Remarks Regarding the Claim Rejections under 35 U.S.C. §§ 102/103
The Remarks of 09/12/2025 have been fully considered but are not persuasive for the reasons below:
Applicant asserts “The Examiner rejects claims 1-9 under 35 U.S.C. §§ 102 and 103 by citing references including Popova et. al. "Deep reinforcement learning for de novo drug design", Olivercrona et. al. "Molecular de-novo design through deep reinforcement learning", and Arus-Pous et. al. "Exploring the GDB-13 chemical space using deep generative models". Although Applicant does not necessarily agree with the Examiner, to expedite allowance of this Application, Applicant has made clarifying amendments to claims 1, 3, 8, and 9 to further clarify the distinction between the claims and the cited art. Applicant believes that these amendments obviate the Examiner's rejections and the Examiner has indicated in the interview that the amendments would likely overcome the rejections and additional searches and considerations will be conducted. Applicant respectfully requests the Examiner to reconsider and allow all pending claims” – pg. 10 para. 1. This argument is unpersuasive because the recited limitations introduced by the amendments are taught by the prior art as described in detail above in the rejection.
Conclusion
No claims are allowed.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANCINI A FONSECA LOPEZ whose telephone number is (571)270-0899. The examiner can normally be reached Monday - Friday 8AM - 5PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Olivia Wise can be reached at (571) 272-2249. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/F.F.L./Examiner, Art Unit 1685
/OLIVIA M. WISE/Supervisory Patent Examiner, Art Unit 1685