Last updated: May 29, 2026
Application No. 17/800,129
ADVERSARIAL AUTOENCODER ARCHITECTURE FOR METHODS OF GRAPH TO SEQUENCE MODELS

Final Rejection: §102, §103, §112
Filed: Aug 16, 2022
Priority: Feb 19, 2020 (provisional 62/978,721, +1 more)
Examiner: BRACERO, ANDREW ANGEL
Art Unit: 2126
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Insilico Medicine Ip Limited
OA Round: 2 (Final)
Interview: Optional. Interview lift (+0.0%) is below the 15.0% threshold. A written response is recommended. Based on 9 resolved cases, 2023–2026.

Examiner Intelligence

Examiner: BRACERO, ANDREW ANGEL
Career Allowance Rate: 100% (9 granted / 9 resolved), above average, +45.0% vs TC avg
Interview Lift: +0.0% (minimal)
Avg Prosecution: 4y 6m
Currently pending: 14

Statute-Specific Performance

§103: 100.0%, +60.0% vs TC avg (based on career data from 9 resolved cases)
Office Action

§102 §103 §112
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Claims 1-29 are presented for examination in this application, 17/800,129, filed 2022-08-16, having an effective filing date of 2020-02-19 via provisional application 62/978,721.
	The Examiner cites particular sections in the references as applied to the claims
below for the convenience of the applicant(s). Although the specified citations are
representative of the teachings in the art and are applied to the specific limitations within
the individual claim, other passages and figures may apply as well. It is respectfully
requested that, in preparing responses, the applicant(s) fully consider the references in
their entirety as potentially teaching all or part of the claimed invention, as well as the
context of the passage as taught by the prior art or disclosed by the Examiner.

Response to Arguments
Applicant’s arguments and remarks filed 03/09/2026 have been fully considered.
The arguments and remarks regarding the 35 U.S.C 112 rejections were not all found to be persuasive. The arguments and remarks regarding the 35 U.S.C 101 rejections were found to be persuasive. The arguments and remarks regarding the 35 U.S.C 102 rejections were found to be persuasive; however, the amendments have necessitated a change in the references applied, resulting in new grounds of rejection. The 35 U.S.C 103 rejections have been maintained.

35 U.S.C 112(b)
Applicant’s response: 
	Applicant asserts “The Office Action asserts, among other things, that: (i) the "reporting" recitations in claims 1-6 render it unclear whether reporting occurs at each step or at the end of an iteration; (ii) claim 1's "autoencoder step comprising" is unclear as to whether it is reciting an input/output of a model versus an architecture, and further recites "generating discriminator output data" within the autoencoder step; (iii) claim 4 recites "sample samples"; (iv) claim 10 lacks antecedent basis for "the condition"; and (v) claims 11, 13, 14, 21, 24, 25, 28, and 29 recite the relative term "desired." The "autoencoder step", "generator step" and "discriminator step" are distinct steps that together form the method of the claims. The claims are amended to clarify the distinctions further.”
	Applicant further asserts “The Office Action rejects claims 11, 13, 14, 21, 24, 25, 28, and 29 for reciting the term "desired", for allegedly being a relative term thereby rendering the claims indefinite. Applicant respectfully disagrees. The term desired is not indefinite in light of the specification and the knowledge of one of ordinary skill in the art. For example, paragraph [0066] of the specification associates the term "desired properties" with "generative conditions". Further, claim 12 specifically provides examples of such desired properties, such that the term "desired" is not indefinite.”

Examiner’s response:
	Examiner respectfully disagrees. In regard to the arguments and remarks that the reporting step renders the claim unclear, the Examiner maintains the position for claims 1-6. While the steps themselves are distinct steps that together form the method of the claims, the claims as written do not reflect this in terms of the reporting trained model limitation. As written, the claims define that each step of the model separately reports a trained model. 
	In regard to the arguments and remarks made that the desired properties in claims 11, 13, 14, 21, 24, 25, 28, and 29 are not rendered indefinite, the Examiner agrees.
	The 35 U.S.C 112(b) rejections are maintained for claims 1-23. The 35 U.S.C 112(b) rejections for claims 24-29 have been overcome.

35 U.S.C 102/103
Applicant’s response: 
	Applicant asserts “Claim 1 is amended to recite, in part, "inputting graph data for a plurality of real objects into an encoder of the G2S model, wherein the graph data for each real object includes an adjacency matrix representing connections between a plurality of nodes of the graph data and a node feature matrix representing attributes of each node[.]" Claim 24 is similarly amended. 
Hong does not expressly or inherently disclose that the graph data input to the encoder for each real object includes both an adjacency matrix representing node connections, and a node feature matrix representing node attributes. Hong's Figure 1 and cited text describe inputting molecular graphs, but do not specify the data structure or representation; there is no teaching that both an adjacency matrix and a node feature matrix are required or used together as input to the encoder. In contrast, the amended claim 1 requires this dual-matrix representation for each real object, which is a concrete technical constraint on the input data format. Accordingly, Hong does not teach all limitations of claims 1 and 24, at least as amended. Claims 3, 5, 8-10, 17-20, 23, and 25-29 variously depend on claims 1 and 24, and they are allowable at least for their dependencies on claims 1 and 24.”

Examiner’s response:
Examiner agrees that Hong does not teach an adjacency matrix. However, the amendments to the claims have necessitated a change in the references applied. Examiner asserts, using the broadest reasonable interpretation in light of the specification, that Gao does teach an adjacency matrix and a node feature matrix. For at least these reasons, the claims remain rejected under 35 U.S.C 103.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


	Claims 1-23 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	
	Claims 1-6 recite reporting a trained G2S model within each of the different steps: the autoencoder step (claims 1 and 2), the generator step (claims 3 and 4), and the discriminator step (claims 5 and 6). It is unclear how each individual step of the overall G2S model can report the whole trained model.

Claims 7-23 depend, directly or indirectly, from claim 1 and contain the same deficiencies; these claims are therefore rejected for the aforementioned reasons.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1, 3, 5, 8-10, 17-20, 23-25, and 27-29 are rejected under 35 U.S.C 103 as being unpatentable over Hong et al. (“Molecular Generative Model Based on an Adversarially Regularized Autoencoder” hereinafter, Hong) in view of Gao et al. (“DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums” hereinafter, Gao).
Regarding claim 1 (currently amended): 
	Hong teaches a method for training a model to generate an object, the method comprising an autoencoder step comprising (see abstract: “We also demonstrated a successful conditional generation of drug-like molecules with ARAE for the control of both cases of single and multiple properties. As a potential real-world application, we could generate epidermal growth factor receptor inhibitors sharing the scaffolds of known active molecules while satisfying drug-like conditions simultaneously.”):
	providing a variational, adversarial or combination of variational and adversarial autoencoder architecture configured as a graph-to-sequence (G2S) model (see fig. 1 which shows an adversarial autoencoder architecture that takes a graph (molecule) and creates a sequence (SMILES). Also see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables”);
	 inputting graph data for a plurality of real objects into an encoder of the G2S model (see fig. 1 which shows an adversarial autoencoder architecture that takes a graph (molecule) as input into the encoder of the G2S model);
…
	generating sequence data from latent space data with a decoder of the G2S model (see fig. 1 which shows an adversarial autoencoder architecture that generates sequence data, SMILES, from the latent space data with a decoder of the G2S model);
	generating discriminator output data from a discriminator of the G2S model (see fig. 1 which shows a “real” or “fake” output from a discriminator of the G2S model);
	 performing an optimization for the encoder and decoder (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ . Then, the distributions of these two variables become as similar as possible by minimizing the first and the second term of eq 8 with gradient descent optimization”);
	 and reporting a trained G2S model in response to the optimization (see fig. 2: “Convergence of the four evaluation metrics for the ARAE model trained with the QM9 dataset.”). 
	Hong, however, does not explicitly teach wherein the graph data for each real object includes an adjacency matrix representing connections between a plurality of nodes of the graph data and a node feature matrix representing attributes of each node.
	Gao, however, analogously teaches wherein the graph data for each real object includes an adjacency matrix representing connections between a plurality of nodes of the graph data and a node feature matrix representing attributes of each node (see pg. 1044 section III. A.: “Definition 1. (dynamic graph). A dynamic graph G = {G_1, G_2, ···, G_T} is an ordered sequence of t = 1, ···, T separate graphs on the same set of |V| = N nodes, with each snapshot graph G_t(V, E_t) characterized by a weighted adjacency matrix A_t ∈ R^{N×N} and a set of node features F_t ∈ R^{N×D} for a given time window, where D represents the total number of node features.” Also see fig. 3 that shows the dynamic graph encoder as well as the graph object inputs. Also see pg. 1042 section abstract: “Our proposed DynGraph2Seq model consists of a novel dynamic graph encoder and an interpretable sequence decoder that learn the mapping between a sequence of time-evolving user activity graphs and a sequence of target health stages.”).
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong and Gao before him or her, to modify the method of Hong so that the graph data for each real object includes an adjacency matrix representing connections between a plurality of nodes of the graph data and a node feature matrix representing attributes of each node, in order to help capture the complex dynamic characteristics and time-evolving features of graphs (see pg. 1044 section III. A.: “that capture the complex dynamic characteristics and time-evolving features of graphs, as defined in the following. Definition 1. (dynamic graph). A dynamic graph G = {G_1, G_2, ···, G_T} is an ordered sequence of t = 1, ···, T separate graphs on the same set of |V| = N nodes, with each snapshot graph G_t(V, E_t) characterized by a weighted adjacency matrix A_t ∈ R^{N×N} and a set of node features F_t ∈ R^{N×D} for a given time window, where D represents the total number of node features.”).
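
For illustration only, the dual-matrix representation at issue in amended claim 1 (an adjacency matrix plus a node feature matrix) can be sketched for a single small molecule. This is a minimal sketch under stated assumptions and is not taken from Hong, Gao, or the application: the function name `build_graph_matrices`, the one-hot element vocabulary, and the ethanol example are all hypothetical.

```python
# Illustrative sketch only: all names and feature choices are hypothetical.

def build_graph_matrices(atoms, bonds, vocabulary):
    """Return (adjacency, features) for one graph.

    atoms:      list of element symbols, one per node
    bonds:      list of (i, j, order) tuples, one per undirected edge
    vocabulary: ordered list of element symbols used for one-hot features
    """
    n = len(atoms)
    # N x N adjacency matrix; entry [i][j] holds the bond order (0 = no bond).
    adjacency = [[0] * n for _ in range(n)]
    for i, j, order in bonds:
        adjacency[i][j] = order
        adjacency[j][i] = order  # undirected graph, so the matrix is symmetric
    # N x D node feature matrix; here D = len(vocabulary), one-hot by element.
    features = [[1 if symbol == v else 0 for v in vocabulary] for symbol in atoms]
    return adjacency, features


# Heavy atoms of ethanol (SMILES "CCO"): C-C-O joined by two single bonds.
adjacency, features = build_graph_matrices(
    atoms=["C", "C", "O"],
    bonds=[(0, 1, 1), (1, 2, 1)],
    vocabulary=["C", "N", "O"],
)
```

Each row of `adjacency` describes the connections of one node, and the matching row of `features` describes that node's attributes, so the two matrices together describe one graph.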

Regarding claim 3 (currently amended): 
	Hong in view of Gao teaches the method of claim 1. 
	Hong further teaches inputting the sample data of a normal distribution into a generator of the G2S model (see fig. 1 showing the normal distribution ‘s’ being input into the generator of the G2S model);
	generating discriminator sample data with the discriminator (see pg. 30 section ‘Previous Works’: “GANs estimate the distribution of input samples, pr(x), through the adversarial training of the generator network and discriminator network. … As a result of the adversarial training of these two networks, the distributions of data samples pr(x) and of the generated samples pg(x) become equivalent, that is, pr(x) = pg(x)”);
	 performing an optimization for the generator (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ . Then, the distributions of these two variables become as similar as possible by minimizing the first and the second term of eq 8 with gradient descent optimization”); 

Regarding claim 5 (currently amended): 
	Hong in view of Gao teaches the method of claim 3. 
	Hong further teaches computing an effectiveness of the discriminator (see pg. 30 section ‘Previous Works’: “where fw is a discriminator (critic) function parameterized by w satisfying the 1-Lipschitz continuity ∥fw∥ ≤ 1”);
	 performing an optimization for the discriminator using the computed effectiveness (see pg. 30 section ‘Previous Works’: “As a result of the adversarial training of these two networks, the distributions of data samples pr(x) and of the generated samples pg(x) become equivalent, that is, pr(x) = pg(x).”); and 
	reporting a discriminator trained G2S model (see fig. 2: “Convergence of the four evaluation metrics for the ARAE model trained with the QM9 dataset.”). 

Regarding claim 8 (currently amended): 
	Hong in view of Gao teaches the method of claim 1. 
	Hong further teaches obtaining real object data having sequence data and property data of sequences in the sequence data (see fig. 1 which takes real molecule graph which has SMILES sequence data); and 
	transforming the sequence data into the graph data (see fig. 1 where the SMILES sequence denoted as yc is fed through a decoder which outputs a molecule).
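
As a hedged illustration of what "transforming the sequence data into the graph data" can mean for SMILES strings, the sketch below parses a deliberately restricted SMILES subset into an atom list and a bond list. It is an assumption-laden toy, not the method of Hong or of the application: the function name `smiles_to_graph` is hypothetical, and rings, charges, aromatic atoms, and two-letter elements are not handled. A practical pipeline would more likely rely on a cheminformatics toolkit.

```python
# Illustrative toy parser: hypothetical name, restricted SMILES subset only.

def smiles_to_graph(smiles):
    """Parse a restricted SMILES string into (atoms, bonds).

    Supported: single-letter uppercase atoms (e.g. C, N, O), '=' and '#'
    bond symbols, and parentheses for branches.
    """
    bond_orders = {"=": 2, "#": 3}
    atoms, bonds = [], []
    previous = None      # index of the atom the next atom attaches to
    branch_stack = []    # attachment points saved at each '('
    pending_order = 1    # bond order to use for the next bond
    for ch in smiles:
        if ch == "(":
            branch_stack.append(previous)
        elif ch == ")":
            previous = branch_stack.pop()
        elif ch in bond_orders:
            pending_order = bond_orders[ch]
        elif ch.isalpha() and ch.isupper():
            atoms.append(ch)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, pending_order))
            previous = index
            pending_order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds


# Heavy atoms of acetic acid: a carbonyl branch and a hydroxyl oxygen.
atoms, bonds = smiles_to_graph("CC(=O)O")
```

The resulting atom and bond lists are the raw material from which graph data, such as connection and attribute matrices, could be assembled.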

Regarding claim 9: 
	Hong in view of Gao teaches the method of claim 5.
	Hong further teaches performing an optimization protocol to optimize generation of the objects, each object having a predetermined property (see table 5 which shows the different optimizers being used for generation. Also see pg. 31 section ‘Methods’: “In the encoding phase, therefore, we need to construct the latent vector z by removing a property-associated attribute which we want to substitute with yc as a condition in the decoding phase.”).

Regarding claim 10 (currently amended): 
	Hong in view of Gao teaches the method of claim 9. 
	Hong further teaches wherein the optimization protocol conditions generation of the objects based on the predetermined property (see table 5 which shows the different optimizers being used for generation. Also see pg. 31 section ‘Methods’: “In the encoding phase, therefore, we need to construct the latent vector z by removing a property-associated attribute which we want to substitute with yc as a condition in the decoding phase.”),
	wherein the condition are real valued vectors of the predetermined property directly passed into the latent space of the G2S model (see pg. 31 section ‘Methods’: “In the encoding phase, therefore, we need to construct the latent vector z by removing a property-associated attribute which we want to substitute with yc as a condition in the decoding phase.”).

Regarding claim 17:
	Hong in view of Gao teaches the method of claim 1. 
	Hong further teaches wherein the real objects are molecules and the properties of the molecules are biochemical properties and/or structural properties (see fig. 1 where the input to the encoder is a structural representation of a molecule).

Regarding claim 18: 
	Hong in view of Gao teaches the method of claim 1.
	Hong further teaches wherein the sequence data includes SMILES, InChI, SYBYL line notation (SLN), SMILES arbitrary target specification (SMARTS), Wiswesser line notation (WLN), ROSDAL, or combinations thereof (see pg. 34 section ‘Implementation Details’: “The LSTM layer of the encoder reads sequential SMILES strings and transforms them to the latent vectors”).

Regarding claim 19: 
	Hong in view of Gao teaches the method of claim 1.
	Hong further teaches wherein the G2S model includes a machine learning platform, which includes at least two machine learning models that are neural networks selected from the group consisting of fully connected neural networks, convolutional neural networks, graph neural networks, and recurrent neural networks (see table 4 which denotes the encoder and decoder as each being graph neural networks).

Regarding claim 20: 
	Hong in view of Gao teaches the method of claim 19.
	Hong further teaches wherein the machine learning platform includes at least two machine learning algorithms that are a reinforcement learning algorithm and a Bayesian optimization algorithm (see pg. 33-34 section ‘De Novo Design of EGFR Inhibitors’: “For both scenarios, we evaluate the EGFR inhibition activity of the generated molecules by using the Bayesian graph convolutional network (Bayesian GCN) developed in our previous work.44 The Bayesian GCN was first trained and tested with the EGFR subset in the DUD-E set and then applied to the molecules generated by each scenario. The results are shown in Table 3.”. Also see pg. 35 section ‘Implementation Details’: “In the inference phase, we obtained the predictive distribution with the number of Monte Carlo sampling of 50 for Bayesian inference.”).

Regarding claim 23: 
	Hong in view of Gao teaches the method of claim 1.
	Hong further teaches wherein the real objects are images and the properties are descriptions having sequences of natural language words (see fig. 1: “Architecture of CARAE for molecular generations. The encoder embeds the SMILES representation of the molecular structure x to the latent vector z, and the decoder reconstructs the molecular structures from the latent vector”. Also see pg. 34 section ‘Implementation Details’: “The LSTM layer of the encoder reads sequential SMILES strings and transforms them to the latent vectors”).

Regarding claim 24 (currently amended): 
	Hong teaches a method of generating an object, the method comprising (see fig. 1: “Architecture of CARAE for molecular generations.”): 
	providing a graph-to-sequence (G2S) model (see fig. 1 where a graph, a molecule with its structure, transforms to a SMILES representation in an adversarial autoencoder model. Also see pg. 34 section ‘Implementation Details’: “The LSTM layer of the encoder reads sequential SMILES strings and transforms them to the latent vectors”);
	inputting graph data of real objects and properties thereof into the G2S model (see fig. 1 where a graph of a real molecular structure and its properties are inputted into the G2S model);
	training the G2S model with the graph data and property data to obtain a trained G2S model (see fig. 1: “Architecture of CARAE for molecular generations. The encoder embeds the SMILES representation of the molecular structure x to the latent vector z, and the decoder reconstructs the molecular structures from the latent vector. As a result of the adversarial training, two distributions pθ(z|x) and pψ(z) become equivalent.”);
	inputting desired property data of a desired property into the trained G2S model (see fig. 1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties.”);
	generating a new object with the desired property with the trained G2S model (see fig. 1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties.”); and
	reporting the new object that has the desired property (see fig. 1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties.”).
Hong does not explicitly teach wherein the graph data of the real objects includes an adjacency matrix representing connections between a plurality of nodes of the graph data and a node feature matrix representing attributes.
	Gao, however, analogously teaches wherein the graph data of the real objects includes an adjacency matrix representing connections between a plurality of nodes of the graph data and a node feature matrix representing attributes (see pg. 1044 section III. A.: “Definition 1. (dynamic graph). A dynamic graph G = {G_1, G_2, ···, G_T} is an ordered sequence of t = 1, ···, T separate graphs on the same set of |V| = N nodes, with each snapshot graph G_t(V, E_t) characterized by a weighted adjacency matrix A_t ∈ R^{N×N} and a set of node features F_t ∈ R^{N×D} for a given time window, where D represents the total number of node features.” Also see fig. 3 that shows the dynamic graph encoder as well as the graph object inputs. Also see pg. 1042 section abstract: “Our proposed DynGraph2Seq model consists of a novel dynamic graph encoder and an interpretable sequence decoder that learn the mapping between a sequence of time-evolving user activity graphs and a sequence of target health stages.”).
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong and Gao before him or her, to modify the method of Hong so that the graph data for each real object includes an adjacency matrix representing connections between a plurality of nodes of the graph data and a node feature matrix representing attributes of each node, in order to help capture the complex dynamic characteristics and time-evolving features of graphs (see pg. 1044 section III. A.: “that capture the complex dynamic characteristics and time-evolving features of graphs, as defined in the following. Definition 1. (dynamic graph). A dynamic graph G = {G_1, G_2, ···, G_T} is an ordered sequence of t = 1, ···, T separate graphs on the same set of |V| = N nodes, with each snapshot graph G_t(V, E_t) characterized by a weighted adjacency matrix A_t ∈ R^{N×N} and a set of node features F_t ∈ R^{N×D} for a given time window, where D represents the total number of node features.”).

Regarding claim 25 (currently amended): 
	Hong in view of Gao teaches the method of claim 24.
	Hong further teaches synthesizing a real molecule based on the new object having the desired property, wherein the desired property includes biochemical properties and structural properties of a molecule (see pg. 34 ‘Conclusions’: “To demonstrate potential real-world applications, we applied the conditional generation scheme to designing EGFR inhibitors. We could successfully generate new candidate compounds while satisfying drug-like conditions simultaneously.”. Also see pg. 33 section ‘Results and Discussion’: “Generation with the four conditions: activity=1, log P=2.5, SAS=1.5, and TPSA=60. It was intended to generate molecules satisfying Lipinski’s rule of five43 and synthesizability.”); and
	validating the new object to have the desired property (see fig. 1: “In the inference phase, we can sample new molecules by tuning the latent vector which is drawn from pψ(z̃) and by specifying the desired property yc.”).
	
Regarding claim 27 (currently amended): 
	Hong in view of Gao teaches the method of claim 24. 
	Hong further teaches wherein generating a real image based on the new object having the desired property, wherein the desired property includes descriptions having sequences of natural language words (see fig. 1: “Architecture of CARAE for molecular generations. The encoder embeds the SMILES representation of the molecular structure x to the latent vector z, and the decoder reconstructs the molecular structures from the latent vector”. Also see pg. 34 section ‘Implementation Details’: “The LSTM layer of the encoder reads sequential SMILES strings and transforms them to the latent vectors”); and 
	validating the real image to have the desired property (see fig. 1: “In the inference phase, we can sample new molecules by tuning the latent vector which is drawn from pψ(z̃) and by specifying the desired property yc.”).

Regarding claim 28 (currently amended): 
	Hong in view of Gao teaches the method of claim 24.
	Hong further teaches inputting sample data of a normal distribution into the generator of the G2S model (see fig. 1 showing sample data of a normal distribution being input into the generator of the G2S model);
	conditioning latent vector data in the latent space with at least one desired property of the object based on concatenation of a real-valued property vector with the latent vector (see fig. 1 showing the specific desired property yc being incorporated with the latent vector data. Also see fig.1: “As a result of the adversarial training, two distributions pθ(z|x) and pψ(z̃) become equivalent.”);
	 inputting conditioned latent vector data into the decoder (see fig. 1 showing the specific desired property yc being incorporated with the latent vector data to be inputted into the decoder ); and 
	generating sequence data of a generated object having the at least one desired property by executing the decoder on a computing system to produce a machine-readable sequence representation (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables.”. Also see pg. 33 section ‘Results and Discussion’: “We investigated how accurate the target properties of molecules generated by CARAE are. The conditional generator produced 10000 molecules with a given target property. Figure 4 compares the normalized frequency of the conditionally generated molecules for each property and the natural population of the ZINC set molecules; the molecular properties for both cases were computed by RDKit.” [(Examiner’s note: i.e., emphasis added. RDKit is an open-source collection of cheminformatics and machine-learning software written in C++ and Python. Hong would have had to use a computing system to utilize RDKit.)]).
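
The conditioning limitation of claim 28 (concatenation of a real-valued property vector with a latent vector) can be illustrated with a short sketch. It is offered only as an aid to reading the limitation, under stated assumptions: the function names `sample_latent` and `condition_latent` are hypothetical, the latent vector is drawn directly from a standard normal distribution rather than produced by a trained generator network, and no decoder is modeled. The property values reuse the log P and TPSA targets quoted from Hong in the rejection of claim 25.

```python
import random

# Illustrative sketch only: hypothetical names, no trained networks involved.

def sample_latent(dim, rng):
    """Draw one latent vector with components from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def condition_latent(latent, properties):
    """Concatenate a real-valued property vector onto a latent vector."""
    return list(latent) + [float(p) for p in properties]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
z = sample_latent(4, rng)       # 4-dimensional latent vector (arbitrary size)
# Assumed target properties for illustration: log P = 2.5 and TPSA = 60.
conditioned = condition_latent(z, [2.5, 60.0])
```

The conditioned vector is simply the latent vector followed by the property values; in a full model it would be the input passed to the decoder.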

Regarding claim 29:
	Hong in view of Gao teaches the method of claim 28.
	Hong further teaches wherein the normal distribution is a normal distribution of real objects having the at least one desired property (see pg. 32 section ‘Results and Discussion’: “Diversity: (1/N) ∑_{i, j>i} (1.0 − similarity_ij), for all N molecule pairs (i, j > i) in the test set. The similarity between two molecules was computed with Tanimoto similarity40 between their Morgan fingerprints41 with radius of 4 and 2048 bits.”).
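
For context on the passage cited against claim 29, the sketch below shows one way a pairwise diversity score based on Tanimoto similarity could be computed. It assumes the quoted metric is an average of (1.0 − similarity) over all unordered molecule pairs, and it represents each fingerprint as a plain set of on-bit indices; the function names and the toy fingerprints are hypothetical and do not come from Hong.

```python
from itertools import combinations

# Illustrative sketch only: fingerprints are toy sets of on-bit indices.

def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def diversity(fingerprints):
    """Mean of (1 - similarity) over all unordered pairs (i, j > i)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
score = diversity(fps)
```

A score near 1 indicates that the molecules share few fingerprint bits, while a score near 0 indicates near-duplicates. The metric measures dissimilarity among molecules; it is not itself a probability distribution.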

Claim 2 is rejected under 35 U.S.C 103 as being unpatentable over Hong et al. (“Molecular Generative Model Based on an Adversarially Regularized Autoencoder” hereinafter, Hong) in view of Gao et al. (“DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums” hereinafter, Gao) in further view of Hamilton (“Yes, No, Maybe So:  Tips and Tricks for Using 0/1 Binary Variables” hereinafter, Hamilton).
Regarding claim 2:
	Hong in view of Gao teaches the method of claim 1. 
	Hong further teaches obtaining graph data for a plurality of real objects (see pg. 32 section ‘Results and Discussion’: “To train and test our model, we used the QM9 and ZINC datasets. The QM9 set contains 133 885 small organic molecules with up to nine heavy atoms.”);
	inputting the graph data into an encoder (see fig. 1 showing the input of molecule data into the encoder of the G2S model);
	generating latent data having latent vectors in a latent space from the graph data with the encoder (see fig. 1 showing the latent vector space);
	obtaining property data of the real objects (see pg. 31 section ‘Methods’: “The above equation states that molecules having desired properties can be generated by sampling latent vectors from an approximate posterior pψ(z) and marginalizing the product between pψ(z) and the conditional distribution pϕ(x′|z,yc), where yc denotes a condition vector delivering target property information. We add the target property information, yc, to the latent vector z in the decoding process. In the encoding phase, therefore, we need to construct the latent vector z by removing a property-associated attribute which we want to substitute with yc as a condition in the decoding phase”);
	concatenating the latent vectors from the graph data with the property data in the latent space (see fig.1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties.”);
	 inputting latent space data into a decoder (see fig.1 showing latent space data going into a decoder);
	 generating sequence data from the latent space data with the decoder, wherein the sequence data represents real objects and includes symbol logits (see pg. 31 section ‘Previous Works’ subsection ‘Adversarially Regularized Autoencoder’: “The decoder network parameterized by ϕ reconstructs the input from the latent variable drawn from the posterior.”. Also see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables.”. Also see fig. 1: “Figure 1. Architecture of CARAE for molecular generations. The encoder embeds the SMILES representation of the molecular structure x to the latent vector z, and the decoder reconstructs the molecular structures from the latent vector”) [(Examiner’s note: i.e., emphasis added. A person having ordinary skill in the art could understand logits (every instance of this term in the claims will be held under this interpretation) to mean the raw unnormalized direct outputs of the neural network before they have gone through an activation function)];
	computing a log-likelihood between the symbol logits of the sequence data and sequence data of the obtained graph data (see pg. 30: “The minimization objective of VAEs is given by 

[equation image: media_image1.png]
where
[equation image: media_image2.png]
is the reconstruction loss and D_KL(qθ(z|x)||p(z)) is the Kullback−Leibler (KL) divergence between the variational distribution qθ(z|x) and the prior distribution p(z).”);
	inputting latent space data into a discriminator (see fig.1 showing the latent space distribution of pθ(z|x) being inputted into the discriminator);
	 generating discriminator output data from the discriminator, wherein the discriminator output data includes discriminator logits (see fig. 1, where the discriminator outputs raw unnormalized data that eventually goes through a LReLU function (table 4)) [(Examiner’s note: i.e., emphasis added. A person having ordinary skill in the art could understand logits to mean the raw unnormalized direct outputs of the neural network before they have gone through an activation function)];
	 computing a log-likelihood of the discriminator logits and labels (see pg. 30: “The minimization objective of VAEs is given by
[equation image: media_image1.png]
where
[equation image: media_image2.png]
is the reconstruction loss and D_KL(qθ(z|x)||p(z)) is the Kullback−Leibler (KL) divergence between the variational distribution qθ(z|x) and the prior distribution p(z).”. Also see fig. 1 which shows the output of the discriminator being either real or fake data [sic – a binary output]);
	performing a gradient step for the encoder and decoder (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ . Then, the distributions of these two variables become as similar as possible by minimizing the first and the second term of eq 8 with gradient descent optimization”); and 
	reporting a trained G2S model (see fig. 2: “Convergence of the four evaluation metrics for the ARAE model trained with the QM9 dataset.”).
Hong does not explicitly teach wherein labels can be labeled as 1.
Hamilton, however, teaches analogously wherein labels can be labeled as 1 (see table 2, which shows the total number of “yes”s as the total number of “1”s).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong and Hamilton before him or her, to modify the method of claim 2 to include attributes of labels being labeled with “1”, a binary unit, in order to avoid mistakes in value assignment (see Hamilton pg. 1 subsection ‘Why Use 0/1 Instead of Y/N’: “But character variables containing alphabetic values are notoriously susceptible to mistakes in value assignment (remember those ‘y’ and ‘n’ values when the database codebook indicated all ‘Y’s and ‘N’s?). Limiting the valid values to 0 or 1 in a numeric field sidesteps some of the problems associated with ‘Y’/’N’ coding such as that described above.”). 
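For illustration only, the limitation of computing a log-likelihood between symbol logits and a target sequence can be sketched as follows, using the interpretation in the Examiner's note that logits are raw unnormalized network outputs. This is a minimal sketch, not the method of Hong or of the instant application: the vocabulary, the logit values, and the function names `log_softmax` and `sequence_log_likelihood` are hypothetical.

```python
import math


def log_softmax(logits):
    """Convert raw logits for one position into log-probabilities (stable form)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_likelihood(symbol_logits, target_ids):
    """Sum of log-probabilities assigned to the target symbol at each position."""
    return sum(log_softmax(step)[t] for step, t in zip(symbol_logits, target_ids))


# Hypothetical 4-symbol vocabulary and 3-position sequence (assumed values).
vocab = ["C", "O", "N", "<eos>"]
symbol_logits = [
    [2.0, 0.1, -1.0, -3.0],
    [0.0, 1.5, 0.2, -2.0],
    [-1.0, -1.0, -1.0, 3.0],
]
target_ids = [0, 1, 3]  # "C", "O", "<eos>"
ll = sequence_log_likelihood(symbol_logits, target_ids)
```

The negative of this quantity is the usual reconstruction (cross-entropy) loss that a gradient step for an encoder and decoder would minimize.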


Claims 4, 6, and 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (“Molecular Generative Model Based on an Adversarially Regularized Autoencoder” hereinafter, Hong) in view of Gao et al. (“DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums” hereinafter, Gao) in further view of Hamilton (“Yes, No, Maybe So: Tips and Tricks for Using 0/1 Binary Variables” hereinafter, Hamilton) in further view of Odena et al. (“Is Generator Conditioning Causally Related to GAN Performance?” hereinafter, Odena).
Regarding claim 4 (Currently Amended): 
	Hong in view of Gao in further view of Hamilton teaches the method of claim 2. 
	Hong further teaches obtaining sample data (see fig. 1 showing the normal distribution ‘s’ being input into the generator of the G2S model);
	inputting the sample data into a generator (see fig. 1 showing the normal distribution ‘s’ being input into the generator of the G2S model);
	 generating sample latent vectors with the generator, wherein the sample latent vectors are in the latent space (see fig. 1 showing the latent vector space);
	 concatenating the property data with the sample latent vectors (see fig.1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties.”);
	 inputting latent space data into the discriminator to obtain discriminator sample data having sample logits (see fig. 1 where latent space data is inputted into the discriminator having sample logits);
	 computing a log-likelihood of the discriminator output logits (see pg. 30: “The minimization objective of VAEs is given by 

[equation image: media_image1.png]
where
[equation image: media_image2.png]
is the reconstruction loss and D_KL(qθ(z|x)||p(z)) is the Kullback−Leibler (KL) divergence between the variational distribution qθ(z|x) and the prior distribution p(z).”) and labels (see fig. 1 which shows the output of the discriminator being either real or fake data [sic – a binary output]);
performing a gradient descent step for the generator (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ . Then, the distributions of these two variables become as similar as possible by minimizing the first and the second term of eq 8 with gradient descent optimization”); and 
reporting a generator trained G2S model (see fig. 2: “Convergence of the four evaluation metrics for the ARAE model trained with the QM9 dataset.”). 
Hong does not explicitly teach wherein labels can be labeled as 1.
Hamilton, however, teaches analogously wherein labels can be labeled as 1 (see table 2, which shows the total number of “yes”s as the total number of “1”s).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong and Hamilton before him or her, to modify the method of claim 4 to include attributes of labels being labeled with “1”, a binary unit, in order to avoid mistakes in value assignment (see Hamilton pg. 1 subsection ‘Why Use 0/1 Instead of Y/N’: “But character variables containing alphabetic values are notoriously susceptible to mistakes in value assignment (remember those ‘y’ and ‘n’ values when the database codebook indicated all ‘Y’s and ‘N’s?). Limiting the valid values to 0 or 1 in a numeric field sidesteps some of the problems associated with ‘Y’/’N’ coding such as that described above.”). 
Hong does not explicitly teach computing a Jacobian clamping term for the generator.
Odena, however, teaches analogously computing a Jacobian clamping term for the generator (see pg. 6 section 4: “Jacobian Clamping directly controls the condition number of Mz. We show (across 3 standard datasets) that when we implement Jacobian Clamping, the condition number of the generator is decreased, and there is a corresponding improvement in the quality of the scores. This is evidence in favor of the hypothesis that ill-conditioning of Mz “causes” bad scores … Specifically, we train the same models as from the previous section using Jacobian Clamping with a λmax of 20, a λmin of 1, and ε of 1 and hold everything else the same. As in the previous section, we conducted 10 training runs for each dataset. Broadly speaking, the effect of Jacobian Clamping was to prevent the GANs from falling into the ill-conditioned cluster. This improved the average case performance, but didn’t improve the best case performance.”). 
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong, Gao, Hamilton, and Odena before him or her, to modify the method of claim 4 to include attributes of Jacobian clamping term for the generator in order to improve average case performance (see Odena at pg. 6 section 4 ‘Jacobian Clamping’ subsection ‘Jacobian Clamping Improves Mean Score and Reduces Variance of Scores’: “Broadly speaking, the effect of Jacobian Clamping was to prevent the GANs from falling into the ill-conditioned cluster. This improved the average case performance, but didn’t improve the best case performance.”). 
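For illustration only, a Jacobian clamping term of the general kind discussed in the Odena passage can be sketched as below. This is a minimal sketch under stated assumptions: the penalty form (a finite-difference ratio Q penalized when it falls outside [λmin, λmax]) is recalled from the general technique and is not quoted in this action, and `toy_generator`, `jacobian_clamping_penalty`, and `norm` are hypothetical names. The default values of 20, 1, and 1 are the λmax, λmin, and ε quoted above.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def jacobian_clamping_penalty(generator, z, eps=1.0, lam_min=1.0, lam_max=20.0, rng=random):
    """Return (Q, penalty) for one latent point z.

    Q approximates how much the generator stretches a perturbation of size eps;
    the penalty is zero whenever lam_min <= Q <= lam_max.
    """
    # Random perturbation direction, rescaled to length eps.
    delta = [rng.gauss(0.0, 1.0) for _ in z]
    d_norm = norm(delta)
    z_prime = [zi + eps * di / d_norm for zi, di in zip(z, delta)]

    g, g_prime = generator(z), generator(z_prime)
    q = norm([a - b for a, b in zip(g, g_prime)]) / norm([a - b for a, b in zip(z, z_prime)])

    l_max = (max(q, lam_max) - lam_max) ** 2  # active only when Q > lam_max
    l_min = (min(q, lam_min) - lam_min) ** 2  # active only when Q < lam_min
    return q, l_max + l_min


def toy_generator(scale):
    """Hypothetical linear 'generator' that scales every coordinate by `scale`."""
    return lambda z: [scale * zi for zi in z]


q_ok, penalty_ok = jacobian_clamping_penalty(toy_generator(5.0), [0.1, -0.2, 0.3])
q_big, penalty_big = jacobian_clamping_penalty(toy_generator(30.0), [0.1, -0.2, 0.3])
```

For the linear toy generator, Q equals the scale factor, so a scale of 5 incurs no penalty while a scale of 30 is penalized by roughly (30 − 20)².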

Regarding claim 6:
	Hong in view of Gao in further view of Hamilton in further view of Odena teaches the method of claim 4.
	Hong further teaches computing a log-likelihood of the discriminator output logits (see pg. 30: “The minimization objective of VAEs is given by 

[equation image: media_image1.png]
where
[equation image: media_image2.png]
is the reconstruction loss and D_KL(qθ(z|x)||p(z)) is the Kullback−Leibler (KL) divergence between the variational distribution qθ(z|x) and the prior distribution p(z).”) and labels (see fig. 1 which shows the output of the discriminator being either real or fake data [sic – a binary output]);
	performing a gradient descent step for the discriminator using outcome from the log-likelihood of the discriminator logits (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ . Then, the distributions of these two variables become as similar as possible by minimizing the first and the second term of eq 8 with gradient descent optimization.”. Also see eq. 8: “
[equation image: media_image3.png]”.) and labels (see fig. 1 which shows the output of the discriminator being either real or fake data [sic – a binary output]); and 
	reporting a discriminator trained G2S model (see fig. 2: “Convergence of the four evaluation metrics for the ARAE model trained with the QM9 dataset.”).
	Hong does not explicitly teach wherein labels can be labeled as 1.
Hamilton, however, teaches analogously wherein labels can be labeled as 1 (see table 2, which shows the total number of “yes”s as the total number of “1”s) and wherein labels can be labeled 0 (see table 2, which shows the total number of “no”s as the total number of “0”s).
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong, Gao, Hamilton, and Odena before him or her, to modify the method of claim 6 to include attributes of labels being labeled with “1”, a binary unit, and “0”, a binary unit, in order to avoid mistakes in value assignment (see Hamilton at pg. 1 subsection ‘Why Use 0/1 Instead of Y/N’: “But character variables containing alphabetic values are notoriously susceptible to mistakes in value assignment (remember those ‘y’ and ‘n’ values when the database codebook indicated all ‘Y’s and ‘N’s?). Limiting the valid values to 0 or 1 in a numeric field sidesteps some of the problems associated with ‘Y’/’N’ coding such as that described above.”). 
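For illustration only, a log-likelihood of discriminator logits against binary labels “1” and “0” can be sketched as a Bernoulli log-likelihood under a sigmoid. This is a minimal sketch and not the loss of Hong (the quoted ARAE objective carries a 1-Lipschitz constraint instead); the logit values, the assignment of “1” to encoder latents and “0” to generator samples, and the function names are assumptions.

```python
import math


def log_sigmoid(logit):
    """Numerically stable log(sigmoid(logit))."""
    if logit >= 0:
        return -math.log1p(math.exp(-logit))
    return logit - math.log1p(math.exp(logit))


def bernoulli_log_likelihood(logits, labels):
    """Mean log-likelihood of 0/1 labels given raw discriminator logits."""
    total = 0.0
    for logit, label in zip(logits, labels):
        # label 1 -> log sigmoid(logit); label 0 -> log(1 - sigmoid(logit)) = log sigmoid(-logit)
        total += log_sigmoid(logit) if label == 1 else log_sigmoid(-logit)
    return total / len(logits)


# Hypothetical logits: first two for encoder latents (label 1), last two for generator samples (label 0).
disc_logits = [2.0, 0.5, -1.5, -0.2]
disc_labels = [1, 1, 0, 0]
disc_ll = bernoulli_log_likelihood(disc_logits, disc_labels)
```

A gradient descent step for the discriminator would then minimize the negative of `disc_ll`.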

Regarding claim 11:
	Hong in view of Gao in further view of Hamilton in further view of Odena teaches the method of claim 6.
	Hong further teaches comprising an optimization protocol that includes a reinforcement learning protocol, comprising (see pg. 35 section ‘Implementation Details’: “We trained the Bayesian graph convolutional network with the number of graph convolution layers of four, a dropout rate of 0.2, and a weight decay coefficient of 10−6 with the EGFR subset of the DUD-E dataset. We split the entire subset into 80:20 for the training and test. In the training phase, we used a batch size of 128 and a training epoch of 50 and a learning rate of 0.1 at 20 and 40 epochs. In the inference phase, we obtained the predictive distribution with the number of Monte Carlo sampling of 50 for Bayesian inference.”):
a) inputting sample data for a normal distribution into the generator (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ .”);
	b) obtaining  sample latent vectors with the generator (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ .”);
	c) obtaining generated objects using the decoder (see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables. The generator produces new samples by taking random variables from a distribution (0, )I 2 5 σ .”);
	d) calculating properties of the generated objects, the calculated properties having desired properties (see pg. 32 section ‘Methods’: “In the training phase, the decoder reconstructs input molecular structures from the latent vector and property information of input molecules. In the inference phase, we can sample new molecules by tuning the latent vector which is drawn from pψ(z) and by specifying the desired property yc.”);
	e) when the calculated properties of a sub-set of generated objects are sufficiently close to the desired properties, the parameters of the generator and decoder change to provide an improved latent manifold of the latent space, the improved latent manifold having desired objects with the desired properties (see fig. 1: “As a result of the adversarial training, two distributions pθ(z|x) and pψ(z) become equivalent. The predictor is trained to predict an original molecular property y and separate this information from the latent vector by minimizing the mutual information term in eq 8. In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties”);
	f) repeating steps a) through e) until convergence (see fig. 1: “As a result of the adversarial training, two distributions pθ(z|x) and pψ(z) become equivalent.”); and 
	g) providing at least one object having the desired properties (see fig. 1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties”).
	
Regarding claim 12:
	Hong in view of Gao in further view of Hamilton in further view of Odena teaches the method of claim 11.
	Hong further teaches wherein the desired properties are selected from solubility, lipophilicity, quantitative estimation of drug likeness, Tanimoto similarity with a target molecule, or combinations thereof (see pg. 32 section ‘Results and Discussion’: “The similarity between two molecules was computed with Tanimoto similarity40 between their Morgan fingerprints41 with radius of 4 and 2048 bits.”). 
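For illustration only, the Tanimoto similarity named in the quoted passage can be sketched on fingerprints represented as sets of on-bit indices. This is a minimal sketch: the bit indices are made up, generating real Morgan fingerprints would require a cheminformatics toolkit that is not used here, and returning 1.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity |A ∩ B| / |A ∪ B| between two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 1.0  # assumed convention for two empty fingerprints
    return len(bits_a & bits_b) / union


# Hypothetical on-bit indices within a 2048-bit fingerprint.
fp_a = {1, 5, 9, 42}
fp_b = {5, 9, 100}
similarity = tanimoto(fp_a, fp_b)  # 2 shared bits out of 5 distinct bits
```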

Regarding claim 13:
Hong in view of Gao in further view of Hamilton in further view of Odena teaches the method of claim 6.
Hong further teaches comprising an optimization protocol that includes a Bayesian optimization protocol on the latent space, comprising (see pg. 35 section ‘Implementation Details’: “We trained the Bayesian graph convolutional network with the number of graph convolution layers of four, a dropout rate of 0.2, and a weight decay coefficient of 10−6 with the EGFR subset of the DUD-E dataset. We split the entire subset into 80:20 for the training and test. In the training phase, we used a batch size of 128 and a training epoch of 50 and a learning rate of 0.1 at 20 and 40 epochs. In the inference phase, we obtained the predictive distribution with the number of Monte Carlo sampling of 50 for Bayesian inference.”): 
	a) providing the G2S model (see fig. 1 which shows an adversarial autoencoder architecture that takes a graph [sic – molecule] and creates a sequence [sic – SMILE].  Also see pg. 31 section ‘Methods’: “Figure 1 illustrates the architecture of the CARAE model for molecular generations. SMILES sequences are transformed by the encoder into latent variables”. )
	b) obtaining a batch of points from an identified area in the latent space, the identified area having latent vectors of the objects with the desired properties (see fig.3: “Molecules reconstructed from the latent vectors that appeared in the interpolation among two latent vectors. The starting and ending points are the latent vectors of aspirin and Tamiflu, respectively.”);
	c) generating objects with the decoder (see fig. 1 which shows the decoder generating molecules);
	d) calculating properties of the decoder-generated objects (see fig. 1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties”);
	e) updating the G2S model with batch of points from step b) and calculated properties from step d) (see pg. 35 section ‘Implementation Details’: “We split the entire subset into 80:20 for the training and test. In the training phase, we used a batch size of 128 and a training epoch of 50 and a learning rate of 0.1 at 20 and 40 epochs.”. Also see pg. 31 section ‘Previous Works’ subsection ‘Adversarially Regularized Autoencoder’: “As ARAE aims to estimate the posterior distribution by generating a distribution similar to that of the generator, the training objective is given by 

[equation image: media_image4.png]
with the 1-Lipschitz continuity ||fw|| ≤ 1.”);
	f) repeating steps a) through e) until convergence (see pg. 31 section ‘Previous Works’ subsection ‘Adversarially Regularized Autoencoder’: “As a result of training, the two distributions pθ and pψ become identical, and we can generate new samples by using random variables sampled from pψ as an input to the decoder.”); and 
	g) providing at least one object having the desired properties (see fig. 1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties”).

Regarding claim 14:
Hong in view of Gao in further view of Hamilton in further view of Odena teaches the method of claim 6.
Hong further teaches performing a generative topographic mapping protocol, comprising (see fig. 1: “Architecture of CARAE for molecular generations.”. Also see pg. 32 section ‘Results and Discussion’: “Also, we note the abbreviations of three molecular properties used for conditional generations: octanol− water partition coefficient (log P), topological polar surface area (TPSA), and synthetic accessibility score (SAS)”): 
a) obtaining a set of objects having desired properties (pg. 30 section ‘Introduction’: “We also introduced a conditional ARAE to generate molecules with desired properties”);
b) obtaining latent vectors of the set of objects with the encoder (see fig. 1 showing latent vectors deriving from the encoder);
c) translating the latent vectors of the set of objects into a 2D map with the properties identified on the 2D map (see fig. 1 showing latent vectors deriving from the encoder from the 2D [sic – a molecular structure]);
d) selecting at least one region of the 2D map having the desired properties (see fig. 1 where at least one region of a 2D map [sic – a molecular structure] is selected for input into an encoder);
e) translating the at least one region into a G2S latent space (see pg. 34 section ‘Implementation Details’: “The LSTM layer of the encoder reads sequential SMILES strings and transforms them to the latent vectors”);
f) generating objects using the decoder (see pg. 31 section ‘Previous Works’ subsection ‘Adversarially Regularized Autoencoder’: “As a result of training, the two distributions pθ and pψ become identical, and we can generate new samples by using random variables sampled from pψ as an input to the decoder.”);
g) calculating properties of the generated objects (see pg. 30 section ‘Introduction’: “We also introduced a conditional ARAE to generate molecules with desired properties. Specifically, we adopted a variational mutual information minimization framework to manipulate latent variables with target properties. As a result, we could produce unseen molecules having designated molecular properties with a high success rate.”); 
h) updating the 2D map with objects generated by the decoder and with calculated properties from step g) (see fig. 1: “In the decoding phase, the specified property information yc is incorporated together with the latent vector to generate the molecules with specific desired properties.”);
i) repeating steps b) through h) until obtaining at least one object with the desired properties (see fig. 1: “As a result of the adversarial training, two distributions pθ(z|x) and pψ(z) become equivalent”); and 
j) reporting the at least one object with the desired properties (see fig.2 which shows reporting of the desired properties). 

Regarding claim 15:
Hong in view of Gao in further view of Hamilton in further view of Odena teaches the method of claim 14.
Hong further teaches training the G2S model with the set of objects having the desired properties (see pg. 32 section ‘Methods’: “In the training phase, the decoder reconstructs input molecular structures from the latent vector and property information of input molecules. In the inference phase, we can sample new molecules by tuning the latent vector which is drawn from pψ(z) and by specifying the desired property yc.”);
and repeating steps b) through h) until obtaining at least one object with the desired properties (see pg. 31 section ‘Previous Works’ subsection ‘Adversarially Regularized Autoencoder’: “As ARAE aims to estimate the posterior distribution by generating a distribution similar to that of the generator, the training objective is given by 

[equation image: media_image4.png]
with the 1-Lipschitz continuity ||fw|| ≤ 1.”); and 
reporting the at least one object with the desired properties (see fig. 2: “Convergence of the four evaluation metrics for the ARAE model trained with the QM9 dataset.”).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (“Molecular Generative Model Based on an Adversarially Regularized Autoencoder” hereinafter, Hong) in view of Gao et al. (“DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums” hereinafter, Gao) in further view of De Cao et al. (“MolGAN: An implicit generative model for small molecular graphs” hereinafter, De Cao).
Regarding claim 7:
	Hong in view of Gao teaches the method of claim 5.
	Hong further teaches performing at least one iteration of the autoencoder step, generator step, and discriminator step (see fig. 4: “Distributions of three different molecular properties(A) log P, (B) SAS, and (C) TPSA — when molecules are generated by specifying a target property value denoted in the legends. Note that the curves labeled with ZINC denote the distribution of each molecular property in the ZINC dataset.”).
	Hong does not explicitly teach decreasing a learning rate for the autoencoder step. 
	De Cao, however, teaches analogously decreasing a learning rate for the autoencoder step (see pg. 6 section 5 subsection ‘Training’: “In all experiments, we use a batch size of 32 and train using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 10^−3”) [(Examiner’s note: The Adam optimizer controls decreasing learning rates using the learning rate formula of (α/(1 − β1^t)) where β1, β2 ∈ [0, 1) as seen on pg. 2 algorithm 1 of Kingma et al. “Adam: A Method for Stochastic Optimization”)]; 
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong, Gao, and De Cao before him or her, to modify the method of claim 7 to include attributes of decreasing a learning rate for the autoencoder step in order to improve convergence with the best model found (see De Cao pg. 6 section 5.0 subsection ‘Training’: “In all experiments, we use a batch size of 32 and train using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 10^−3. For each setting, we employ a grid search over dropout rates ∈ {0.0, 0.1, 0.25} (Srivastava et al., 2014) as well as over discretization variations (as described in Section 3.1). We always report the results of the best model depending on what we are optimizing for (e.g., when optimizing solubility we report the model with the highest solubility score – when no metric is optimized we report the model with the highest sum of individual scores)”).
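For illustration only, the expression α/(1 − β1^t) cited in the Examiner's note can be evaluated over increasing step counts t to show how it behaves. This is a minimal sketch: β1 = 0.9 is an assumed value (α = 10^−3 is the learning rate quoted from De Cao), the function name is hypothetical, and the full Adam update also involves β2 and a second-moment estimate, so this factor alone is not the complete effective step size.

```python
def bias_correction_factor(alpha, beta1, t):
    """Evaluate alpha / (1 - beta1**t) for step count t >= 1."""
    return alpha / (1.0 - beta1 ** t)


alpha = 1e-3   # learning rate quoted from De Cao
beta1 = 0.9    # assumed value, not stated in this action
steps = (1, 10, 100, 1000)
factors = [bias_correction_factor(alpha, beta1, t) for t in steps]
# The factor starts near alpha / (1 - beta1) at t = 1 and shrinks toward alpha as t grows.
```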

Claims 16 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (“Molecular Generative Model Based on an Adversarially Regularized Autoencoder” hereinafter, Hong) in view of Gao et al. (“DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums” hereinafter, Gao) in further view of Armitage et al. (“Fragment Graphical variational AutoEncoding for Screening Molecules with Small Data” hereinafter, Armitage).
Regarding claim 16:
	Hong in view of Gao teaches the method of claim 1. 
	Hong does not explicitly teach obtaining scaffold data, the scaffold includes structural data for  at least a portion of a molecule, inputting the scaffold data into a scaffold encoder, generating scaffold latent vectors in the latent space, wherein objects generated by the decoder are conditioned on the structural data, and have at a structure of the at least a portion of the molecule.
	Armitage, however, teaches analogously obtaining scaffold data, the scaffold data includes structural data for at least a portion of a molecule (see fig. 2 which shows the starting molecule, Ibuprofen, being decomposed into scaffold data) [(Examiner’s note: para [085] of the instant specification gives scaffold data as an example of fragment data)];
	inputting the scaffold data into a scaffold encoder (see fig. 2 which shows an encoder that takes only scaffold data as input); and 
	generating scaffold latent vectors in the latent space, wherein objects generated by the decoder are conditioned on the structural data, and have at a structure of the at least a portion of the molecule (see fig. 2: “FraGVAE autoencoder overview: The graph is decomposed into a bag of fragments, encoded to a latent space (𝑍𝐹) and then decoded to reproduce the fragment bag. Secondly the connectivity of fragments is encoded to a latent space (𝑍𝐶) and, using the bag of fragments and 𝑍𝐶, t”). 
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong, Gao, and Armitage before him or her, to modify the method of claim 16 to include attributes of scaffold molecular data as input to an encoder in order to reduce error in predictions (see Armitage abstract: “We demonstrate that fragment-based graphical autoencoding reduces the error in predicting physical characteristics such as the solubility and partition coefficient in the small data regime compared to other extended circular fingerprints and string based approaches.”).

Regarding claim 21:
	Hong in view of Gao teaches the method of claim 5. 
	Hong does not explicitly teach a separate machine learning model configured to parameterize a desired distribution of latent vectors of objects having a same value of a desired property, wherein the separate machine learning model is a neural network, Gaussian process, or graph neural network, when the graph neural network’s desired properties are a molecular scaffold or fragment thereof.
	Armitage, however, teaches analogously a separate machine learning model configured to parameterize a desired distribution of latent vectors of objects having a same value of a desired property (see fig. 2 where the separate machine learning model is in the form of an encoder specifically for decomposed fragments by parameterizing a distribution, Zf, of latent vectors),
	 wherein the separate machine learning model is a neural network, Gaussian process, or graph neural network, when the graph neural network the desired properties are a molecular scaffold or fragment thereof (see fig. 2 where the separate machine learning model is in the form of an encoder specifically for decomposed fragments).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong, Gao, and Armitage before him or her, to modify the method of claim 21 to include attributes of a separate machine learning model dedicated to scaffold or fragment data in order to help the autoencoder be robust to overfitting even with a small number of training examples (see Armitage pg. 3-4 subsection ‘Fragment Graphical Autoencoding’: “Training a network to autoencode a molecular graph with N unique fragments which can connect to every other fragment results in N! training examples to autoencode a single molecular structure. This property helps the autoencoder to be robust to overfitting even with a small number of training examples. In contrast a similar string based approach would have 1 training example.”).
	
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (“Molecular Generative Model Based on an Adversarially Regularized Autoencoder” hereinafter, Hong) in view of Gao et al. (“DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums” hereinafter, Gao) in further view of Ghiandoni (“Enhancing Reaction-based de novo Design using Machine Learning” hereinafter, Ghiandoni).
Regarding claim 22:
	Hong in view of Gao teaches the method of claim 5. 
	Hong does not explicitly teach wherein the graph data includes condensed graphs of chemical reactions and the sequence data generated by the decoder is SMIRKS data, and wherein the object properties are a type of reaction or a catalyst for the type of reaction.
	Ghiandoni, however, teaches analogously wherein the graph data includes condensed graphs of chemical reactions and the sequence data generated by the decoder is SMIRKS data (see pg. 10 section 1.3: “SMILES strings do not allow the specification of substructure queries; however, reaction queries can be generated using an extension of the SMILES named SMIRKS (Simple Molecular Input Reaction Kinetic String), which is typically used in database searching (see Section 1.5)”. Also see pg. 38 section 2.7: “Autoencoders (AEs) are neural network-based (NN-based) architectures for unsupervised feature representation learning, which consists of three components: an encoder, a decoder, and a distance function. The encoder converts the input data (e.g. a SMILES string) into a representation with lower dimensionality (a continuous vector), then the decoder attempts the reconstruction of the original input from the low dimensional representation.”) and
	wherein the object properties are a type of reaction or a catalyst for the type of reaction (see pg. 10 section 1.3: “SMILES strings do not allow the specification of substructure queries; however, reaction queries can be generated using an extension of the SMILES named SMIRKS (Simple Molecular Input Reaction Kinetic String), which is typically used in database searching (see Section 1.5)”).
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Hong, Gao, and Ghiandoni before him or her, to modify the method of claim 22 to include attributes of SMIRKS data and object properties being a type of reaction or a catalyst for the type of reaction in order to represent the specification of substructure queries (see Ghiandoni at pg. 10 section 1.3: “SMILES strings do not allow the specification of substructure queries; however, reaction queries can be generated using an extension of the SMILES named SMIRKS (Simple Molecular Input Reaction Kinetic String), which is typically used in database searching (see Section 1.5)”).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
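The reply periods described above can be worked through as date arithmetic. The following is a minimal sketch only, not a docketing tool: it assumes the mailing date is the Mar 31, 2026 date shown in the prosecution timeline for this Final Rejection, it clamps to the last day of the month when the target day does not exist, and it does not account for weekends, federal holidays, or the advisory-action exception. The helper `add_months` and the variable names are hypothetical.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months to a date, clamping to the last day of the target month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Assumed mailing date, taken from the prosecution timeline entry for this action.
mailing_date = date(2026, 3, 31)

two_month_mark = add_months(mailing_date, 2)        # first-reply window for the advisory-action rule
shortened_period_end = add_months(mailing_date, 3)  # THREE-MONTH shortened statutory period
statutory_limit = add_months(mailing_date, 6)       # SIX-MONTH outer limit

print(two_month_mark, shortened_period_end, statutory_limit)
```

Any date produced this way should be confirmed against the actual mailing date on the action and the applicable rules before it is relied on.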
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew A. Bracero, whose telephone number is (571)270-0592. The examiner can normally be reached Monday - Friday, 9:00 a.m. - 5:00 p.m. ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached Monday - Friday, 9:00 a.m. - 5:00 p.m. ET, at 571-273-8300.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANDREW BRACERO/Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126
Prosecution Timeline

Aug 16, 2022: Application Filed
Nov 10, 2025: Non-Final Rejection mailed (§102, §103, §112)
Mar 09, 2026: Response Filed
Mar 31, 2026: Final Rejection mailed (§102, §103, §112) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/655,489
Patent 12619870
PROGRAMMABLE NON-LINEAR ACTIVATION ENGINE FOR NEURAL NETWORK ACCELERATION
4y 1m to grant Granted May 05, 2026
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 100%
With Interview (+0.0%): 99%
Median Time to Grant: 4y 6m (~9m remaining)
PTA Risk: Moderate
Based on 9 resolved cases by this examiner. Grant probability derived from career allowance rate.