DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Applicant’s election without traverse of Group I, claims 1, 2, 4, 6, 11, 13, 18, 20-21, 24-26, 29, and 31-32, in the reply filed is acknowledged. However, it is noted that, after further consideration, there is no search and/or examination burden between the different groups of inventions. Therefore, the restriction requirement mailed 9/11/2025 is withdrawn. In view of the withdrawal of the restriction requirement as to the rejoined inventions, applicant(s) are advised that if any claim presented in a divisional application is anticipated by, or includes all the limitations of, a claim that is allowable in the present application, such claim may be subject to provisional statutory and/or nonstatutory double patenting rejections over the claims of the instant application.
Once the restriction requirement is withdrawn, the provisions of 35 U.S.C. 121 are no longer applicable. See In re Ziegler, 443 F.2d 1211, 1215, 170 USPQ 129, 131-32 (CCPA 1971). See also MPEP 804.01.
Claim Status
Claims 1-2, 4, 6, 11, 13, 18, 20-21, 24-26, 29, 31-32, 41-42, 44, 46, and 94 are pending.
Claims 1-2, 4, 6, 11, 13, 18, 20-21, 24-26, 41-42, 44, 46, and 94 are rejected.
Claims 13, 18, and 24 are objected to.
Priority
The instant application claims benefit of priority to U.S. Provisional Application No. 62/900,420 filed on 9/13/2019. The claim to the benefit of priority is acknowledged. As such, the effective filing date of claims 1-2, 4, 6, 11, 13, 18, 20-21, 24-26, 29, 31-32, 41-42, 46, and 94 is 9/13/2019.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 7/6/2023, 12/23/2024, 8/28/2025, and 10/31/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Nucleotide and/or Amino Acid Sequence Disclosures
REQUIREMENTS FOR PATENT APPLICATIONS CONTAINING NUCLEOTIDE AND/OR AMINO ACID SEQUENCE DISCLOSURES
Items 1) and 2) provide general guidance related to requirements for sequence disclosures.
1) 37 CFR 1.821(c) requires that patent applications which contain disclosures of nucleotide and/or amino acid sequences that fall within the definitions of 37 CFR 1.821(a) must contain a "Sequence Listing," as a separate part of the disclosure, which presents the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of 37 CFR 1.821 - 1.825. This "Sequence Listing" part of the disclosure may be submitted:
a) In accordance with 37 CFR 1.821(c)(1) via the USPTO patent electronic filing system (see Section I.1 of the Legal Framework for Patent Electronic System (https://www.uspto.gov/PatentLegalFramework), hereinafter "Legal Framework") as an ASCII text file, together with an incorporation-by-reference of the material in the ASCII text file in a separate paragraph of the specification as required by 37 CFR 1.823(b)(1) identifying:
i) the name of the ASCII text file;
ii) the date of creation; and
iii) the size of the ASCII text file in bytes;
b) In accordance with 37 CFR 1.821(c)(1) on read-only optical disc(s) as permitted by 37 CFR 1.52(e)(1)(ii), labeled according to 37 CFR 1.52(e)(5), with an incorporation-by-reference of the material in the ASCII text file according to 37 CFR 1.52(e)(8) and 37 CFR 1.823(b)(1) in a separate paragraph of the specification identifying:
i) the name of the ASCII text file;
ii) the date of creation; and
iii) the size of the ASCII text file in bytes;
c) In accordance with 37 CFR 1.821(c)(2) via the USPTO patent electronic filing system as a PDF file (not recommended); or
d) In accordance with 37 CFR 1.821(c)(3) on physical sheets of paper (not recommended).
2) When a “Sequence Listing” has been submitted as a PDF file as in 1(c) above (37 CFR 1.821(c)(2)) or on physical sheets of paper as in 1(d) above (37 CFR 1.821(c)(3)), 37 CFR 1.821(e)(1) requires a computer readable form (CRF) of the “Sequence Listing” in accordance with the requirements of 37 CFR 1.824.
If the "Sequence Listing" required by 37 CFR 1.821(c) is filed via the USPTO patent electronic filing system as a PDF, then 37 CFR 1.821(e)(1)(ii) or 1.821(e)(2)(ii) requires submission of a statement that the "Sequence Listing" content of the PDF copy and the CRF copy (the ASCII text file copy) are identical.
If the "Sequence Listing" required by 37 CFR 1.821(c) is filed on paper or read-only optical disc, then 37 CFR 1.821(e)(1)(ii) or 1.821(e)(2)(ii) requires submission of a statement that the "Sequence Listing" content of the paper or read-only optical disc copy and the CRF are identical.
Specific deficiencies and the required response to this Office Action are as follows:
Specific deficiency – Nucleotide and/or amino acid sequences appearing in the drawings are not identified by sequence identifiers in accordance with 37 CFR 1.821(d). Sequence identifiers for nucleotide and/or amino acid sequences must appear either in the drawings or in the Brief Description of the Drawings. More specifically, Figures 1C-D include sequences, but no SEQ ID NOs appear in either the drawings or the Brief Description of the Drawings.
Required response – Applicant must provide:
Replacement and annotated drawings in accordance with 37 CFR 1.121(d) inserting the required sequence identifiers;
AND/OR
A substitute specification in compliance with 37 CFR 1.52, 1.121(b)(3) and 1.125 inserting the required sequence identifiers into the Brief Description of the Drawings, consisting of:
A copy of the previously-submitted specification, with deletions shown with strikethrough or brackets and insertions shown with underlining (marked-up version);
A copy of the amended specification without markings (clean version); and
A statement that the substitute specification contains no new matter.
Claim Objections
Claims 13 and 18 are objected to because of the following informalities: each acronym should be accompanied, at its first use, by the phrase it abbreviates. Appropriate correction is required.
Claim 24 is objected to because of the following informalities: “that or more likely” in lines 3-4 should be “that are more likely”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4, 20, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Saraf et al. (Biophysical Journal (2006) 4167-4180) in view of Wang et al. (Scientific Reports (2018) 1-9).
Claim 1 is directed to a method of designing proteins that have specific desired functionality using machine learning, gene synthesis and assays to measure the specific functionality.
Saraf et al. teaches in the abstract “In this article, we introduce the computational procedure IPRO (iterative protein redesign and optimization procedure) for the redesign of an entire combinatorial protein library in one step using energy-based scoring functions. IPRO relies on identifying mutations in the parental sequences, which when propagated downstream in the combinatorial library, improve the average quality of the library (e.g., stability, binding affinity, specific activity, etc.)”. On pages 4168-4169, columns 2 and 1 respectively, paragraphs 3 and 1 respectively, Saraf et al. teaches “we describe in detail the IPRO procedure and introduce the globally convergent mixed-integer linear program that drives residue redesign. We also discuss the methods used for generating and identifying hybrid Escherichia coli/Bacillus subtilis dihydrofolate reductase (DHFR) and B. subtilis/Lactobacillus casei DHFR enzymes containing single crossover positions and assays for DHFR activity. Next, we provide an example application of IPRO to highlight the features and type of output obtained with IPRO. The study involves the computational identification of parental redesigns that are likely to improve a single crossover E. coli/B. subtilis DHFR combinatorial library composed of 16 hybrids”. On page 4169, column 1, Saraf et al. teaches “a set of residue changes is identified in the parental sequences, which upon propagation among the combinatorial library members, lead to the optimization of the average library score (e.g., binding energy or stability). This optimization step is carried out globally using a MILP model within a local perturbation window, whereas simulated annealing is used to accept or reject the residue redesigns”. Saraf et al. further teaches on page 4167, column 2, paragraph “Because activity level or other performance objectives are very difficult to compute directly, alternative surrogates of hybrid fitness, such as stability or binding affinity, are employed in most studies”, and in Figure 4 “IPRO is an iterative protein redesign software…”. While this last teaching is directed to the in silico portion of the method, the integration of protein synthesis within the loop would have been obvious, as stated on page 4167, column 1, paragraph 1: “most protein engineering paradigms involve the synthesis and screening of multiple protein candidates (protein library) as a way to enhance the odds of identifying proteins with the desired functionality level”. These teachings read on a method of designing proteins having a desired functionality, performing an iterative loop, wherein each iteration of the loop comprises: synthesizing candidate genes and producing candidate proteins corresponding to the respective candidate amino acid sequences, each of the candidate genes coding for the corresponding candidate amino acid sequence; evaluating a degree to which the candidate proteins respectively exhibit a desired functionality by measuring values indicative of properties of the candidate proteins using one or more assays.
Wang et al. teaches in the abstract “In this study, we applied the deep learning neural network approach to computational protein design for predicting the probability of 20 natural amino acids on each residue in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features”, and on page 7, paragraph 4 “Rosetta design was carried out with the fixbb program and talaris2014 score in Rosetta 3.7. The crystal structures of the design targets were used as inputs without any prior minimization. 500 designs were performed for each protein with and without residue-type restraints, which were incorporated using the “-resfile” option”, which in view of the teachings of Saraf et al. reads on determining candidate amino acid sequences of synthetic proteins using a machine-learning model that has been trained to learn implicit patterns in a training dataset of amino acid sequences of proteins, the machine-learning model expressing the learned implicit patterns in a trained model, and when one or more stopping criteria of the iterative loop have not been satisfied, calculating, from the measured values, a fitness function assigned to each sequence, and selecting, using a combination of the fitness function together with the machine-learning model, new candidate amino acid sequences for a subsequent iteration.
It would have been obvious at the time of first filing to have modified the teachings of Saraf et al. for the use of an iterative protein design method which synthesizes and assays genes for protein design, with the teachings of Wang et al. for the use of machine learning methods such as their deep neural network (DNN) or Rosetta, because as the latter describes on page 1, paragraph 2 “while the number of known protein structures is increasing rapidly, the number of unique protein folds is saturating”, suggesting the need for machine learning algorithms that learn implicit patterns within a dataset, as the claim recites. One would have had a reasonable expectation of success given that this would merely be a substitution: while Saraf et al. uses an MILP model to alter sequences and then proceeds to gene synthesis and assaying based on said changes, one could readily substitute a DNN or Rosetta for the MILP model to generate the gene sequences and then perform the latter steps, particularly as both methods (the DNN and Rosetta) are freely available as software. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful.
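For illustration only, the iterative loop recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of either reference: `propose` is a hypothetical stand-in for the machine-learning model, `measure` is a hypothetical stand-in for gene synthesis followed by an assay, and the stopping criterion (a target assay value or an iteration cap) is assumed.

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]


def design_loop(
    propose: Callable[[List[Scored]], List[str]],
    measure: Callable[[str], float],
    target: float,
    max_iters: int = 10,
) -> List[Scored]:
    """Repeat propose -> measure -> score until a stopping criterion is met.

    `propose` sees every scored sequence so far (the fitness information) and
    returns new candidates; `measure` returns the assay value of a candidate.
    Both are hypothetical placeholders.
    """
    history: List[Scored] = []
    for _ in range(max_iters):
        candidates = propose(history)
        if not candidates:
            break
        scored = [(seq, measure(seq)) for seq in candidates]
        history.extend(scored)
        # Assumed stopping criterion: a candidate reaches the target value.
        if max(score for _, score in scored) >= target:
            break
    # Rank all tested sequences from best to worst.
    return sorted(history, key=lambda pair: pair[1], reverse=True)


def toy_propose(history: List[Scored]) -> List[str]:
    """Toy model: mutate the first 'G' of the best sequence so far to 'A'."""
    if not history:
        return ["GGGG"]
    best = max(history, key=lambda pair: pair[1])[0]
    i = best.find("G")
    return [best[:i] + "A" + best[i + 1:]] if i >= 0 else [best]


def toy_measure(seq: str) -> float:
    """Toy assay: the number of 'A' residues."""
    return float(seq.count("A"))


ranked = design_loop(toy_propose, toy_measure, target=4.0)
```

In this toy run the loop tests one sequence per iteration and stops once the target value is reached; in practice the proposal step and the assay would be far more elaborate.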
Claim 2 is directed to the method of claim 1 but further specifies that the implicit patterns are learned in a latent space using dimension reduction relative to the dimension of the amino acid sequences.
Wang et al. teaches on page 2, paragraph 3 “The input of the computational protein design problem is the backbone structure of a protein (or part of a protein). Instead of predicting the residue types of all positions in the input protein simultaneously, we consider each target residue and its neighbor residues (for simplicity, non-protein residues are not considered). In the simplest case, we consider a target position and its closest neighboring residue determined by Cα-Cα distance, and feed their input features to a neural network that consists of an input layer, several hidden layers and a softmax layer as output. The output dimension of the softmax layer is set to 20 so that the 20 output numbers that sum to one can be interpreted as the probabilities of 20 residue types of the target residue”, reading on wherein the implicit patterns are learned in a latent space, and wherein determining the candidate amino acid sequences further comprises determining the latent space has a reduced dimension relative to a characteristic dimension of the amino acid sequences of the training dataset.
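For illustration, the network shape described in the quoted passage (input features, hidden layer, softmax output of dimension 20) can be sketched as below. This is a minimal sketch, not the network of Wang et al.: the feature count, the single tanh hidden layer, its width, and the random untrained weights are all assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural residue types


def softmax(logits):
    """Convert raw scores to probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights):
    """Apply a linear layer; weights[j] is the weight vector of output unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def residue_probabilities(features, w_hidden, w_out):
    """Input features -> one hidden layer -> softmax over 20 residue types."""
    hidden = [math.tanh(h) for h in dense(features, w_hidden)]
    return softmax(dense(hidden, w_out))


rng = random.Random(0)
features = [rng.gauss(0, 1) for _ in range(8)]                        # hypothetical inputs
w_hidden = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]   # untrained weights
w_out = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]
probs = residue_probabilities(features, w_hidden, w_out)
most_likely = AMINO_ACIDS[probs.index(max(probs))]
```

The 20 outputs sum to one and can be read as per-residue-type probabilities for the target position, as the quoted passage describes.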
Claim 4 is directed to the method of claim 1 but further specifies the specific dimensions of the dataset matrix.
Saraf et al. teaches in Figure 1(a) “Promising hybrid sequences from the library are selected for downstream redesign that involves either random or site-directed mutagenesis”, reading on wherein the training dataset comprises a multi-sequence alignment of evolutionarily-related proteins, and a characteristic dimension of the amino acid sequences of the training dataset is a product LxK, where L is a length of one of the amino acid sequences of the training dataset and K is a number of possible types of amino acids.
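For illustration, the characteristic dimension LxK recited in claim 4 corresponds to a one-hot encoding of a sequence. The sketch below assumes K = 20 (natural amino acids only); an alignment with a gap symbol or additional residue types would use a larger alphabet. The example sequence is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # K = 20 in this sketch


def one_hot(sequence, alphabet=AMINO_ACIDS):
    """Encode a sequence of length L as an L x K matrix of 0/1 entries."""
    index = {aa: k for k, aa in enumerate(alphabet)}
    matrix = []
    for aa in sequence:
        row = [0] * len(alphabet)
        row[index[aa]] = 1  # exactly one entry per position is set
        matrix.append(row)
    return matrix


encoded = one_hot("MKTAY")                  # L = 5
flat = [v for row in encoded for v in row]  # vector of length L * K
```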
Claim 20 is directed to the method of claim 1 but further specifies that the selecting of candidates is performed using dimensional reduction to rank the leading components of the candidate and biasing the selecting of candidates.
Wang et al. teaches on page 2, paragraph 3 “The input of the computational protein design problem is the backbone structure of a protein (or part of a protein). Instead of predicting the residue types of all positions in the input protein simultaneously, we consider each target residue and its neighbor residues (for simplicity, non-protein residues are not considered). In the simplest case, we consider a target position and its closest neighboring residue determined by Cα-Cα distance, and feed their input features to a neural network that consists of an input layer, several hidden layers and a softmax layer as output. The output dimension of the softmax layer is set to 20 so that the 20 output numbers that sum to one can be interpreted as the probabilities of 20 residue types of the target residue”. The use of dimensional reduction in the model would inherently rank components and bias the output towards those candidates with optimized components, which inherently represent the functionality of the candidate.
Claim 26 is directed to the method of claim 1 but further specifies that the calculation of the fitness function is done via supervised learning of the functionality landscape according to latent space.
Wang et al. teaches on page 2, paragraph 3 “The input of the computational protein design problem is the backbone structure of a protein (or part of a protein). Instead of predicting the residue types of all positions in the input protein simultaneously, we consider each target residue and its neighbor residues (for simplicity, non-protein residues are not considered). In the simplest case, we consider a target position and its closest neighboring residue determined by Cα-Cα distance, and feed their input features to a neural network that consists of an input layer, several hidden layers and a softmax layer as output. The output dimension of the softmax layer is set to 20 so that the 20 output numbers that sum to one can be interpreted as the probabilities of 20 residue types of the target residue”, and Saraf et al. teaches on page 4167, column 2, paragraph “Because activity level or other performance objectives are very difficult to compute directly, alternative surrogates of hybrid fitness, such as stability or binding affinity, are employed in most studies”. Therefore, the learning of a functionality landscape would inherently be based on a latent space if the network were performing dimension reduction along with the use of a fitness function to select sequences, thereby reading on wherein the step of calculating the fitness function further comprises performing supervised learning of a functionality landscape approximating the measured values of the candidate proteins as a function of corresponding positions within the latent space, wherein the fitness function is based, at least in part, on the functionality landscape.
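For illustration, supervised learning of a functionality landscape over latent-space positions, as recited in claim 26, can be sketched with a simple kernel regression. This is a minimal sketch under stated assumptions: the Gaussian kernel (Nadaraya-Watson) regressor is a stand-in chosen for brevity and is not taken from the application or the cited references, and the latent coordinates, measured values, and bandwidth are hypothetical.

```python
import math


def predict_landscape(z, latent_points, measured, bandwidth=0.5):
    """Estimate the assay value at latent position z.

    The estimate is a weighted average of measured values, with Gaussian
    weights that decay with squared distance in the latent space. Query
    points extremely far from all data would underflow to zero total weight.
    """
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(z, zi)) / (2 * bandwidth ** 2))
        for zi in latent_points
    ]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, measured)) / total


# Hypothetical training data: latent coordinates and measured assay values.
latent_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
measured = [1.0, 3.0, 2.0]

midpoint = predict_landscape((0.5, 0.0), latent_points, measured)
near_high = predict_landscape((1.0, 0.0), latent_points, measured)
near_low = predict_landscape((0.0, 0.0), latent_points, measured)
```

A fitness function based on such a landscape could then rank unexplored latent positions by their predicted value.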
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Saraf et al. (Biophysical Journal (2006) 4167-4180) and Wang et al. (Scientific Reports (2018) 1-9) as applied to claims 1-2, 4, 20, and 26 above, and further in view of Fan et al. (Biochimica et biophysica acta. General subjects (2017) 3024-3029).
Claim 6 is directed to the method of claim 4 and thus claim 1, but further specifies that at least one of the amino acids is a non-natural amino acid.
Saraf et al. and Wang et al. teach the method of claims 1-2, 4, 20, 26, 41-42, and 44 as described above.
Fan et al. teaches in the abstract “This review summarizes the major quality control mechanisms during protein synthesis, including aminoacyl-tRNA synthetases, elongation factors, and the ribosome. We will discuss evolution and engineering of such components that allow incorporation of natural and synthetic amino acids at positions that deviate from the standard genetic code”, reading on wherein at least one of the possible types of amino acids is a non-natural amino acid.
It would have been obvious at the time of first filing to have modified the teachings of Saraf et al. and Wang et al. for the method of claims 1-2, 4, 20, 26, 41-42, and 44, with the teachings of Fan et al. for the incorporation of synthetic amino acids and their design, as the latter teaches in the abstract “Expanding the genetic code with synthetic amino acids through rewiring protein synthesis has broad applications in synthetic biology and chemical biology”. One would have had a reasonable expectation of success given that the latter is a review article linking to multiple synthetic amino acid papers within the art, and the modification would merely be an incorporation of a new type of amino acid within the data. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful.
Claims 11 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Saraf et al. (Biophysical Journal (2006) 4167-4180) and Wang et al. (Scientific Reports (2018) 1-9) as applied to claims 1-2, 4, 20, 26, 41-42, and 44 above, and further in view of Sinai et al. (arXiv preprint (2017) 1-6).
Claim 11 is directed to the method of claim 2 and thus claim 1, but further specifies the use of an encoding algorithm for the sequences and the use of an optimization function.
Saraf et al. and Wang et al. teach the method of claims 1-2, 4, 20, and 26 as described above.
Wang et al. teaches on page 1, paragraph 1 “the input of computational protein design is the backbone structure of a target protein (or part of a target protein). Through computational sampling and optimization, sequences that are likely to fold into the desired structure are generated for experimental verification. The scoring function usually contains physics-based terms such as van der Waals and electrostatic energy as well as knowledge-based terms such as sidechain rotamer and backbone dihedral preference obtained from statistics of protein structures”.
Sinai et al. teaches in the abstract “we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and learn interactions between sets of positions within a protein”, which in view of the teachings of Saraf et al. and Wang et al., reads on wherein the machine-learning model is a network model that performs encoding and decoding/generation, the encoding being performed by mapping an input amino acid sequence to a point in the latent space, and the decoding/generation being performed by mapping the point in the latent space to an output amino acid sequence, and the machine-learning model is trained to optimize an objective function, a component of which represents a degree to which the input amino acid sequence and the output amino acid sequence match, such that, when trained using the training dataset, the machine-learning model generates output amino acid sequences that approximately match the amino acid sequences of the training dataset that are applied as inputs to the machine-learning model, and wherein the machine-learning model comprises a variational auto-encoder (VAE) based artificial neural network (ANN) comprising an encoder configured to learn implicit patterns and correlations among the residues of the respective amino acid sequences in the training dataset to predict amino acid sequences within a latent space, a decoder to generatively design new amino acid sequences from the latent space, and a supervised regression model to optimize the latent space by configuring gradients for identifying protein properties.
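For illustration, the objective function recited in claim 11, a component of which measures how well the output sequence matches the input sequence, corresponds in a generic VAE to a reconstruction term plus a latent regularization term. The sketch below shows that standard form only. It is not the objective of Sinai et al. or of the application; the three-letter alphabet, decoder probabilities, and latent parameters are hypothetical toy values.

```python
import math


def reconstruction_loss(one_hot_rows, prob_rows):
    """Cross-entropy between input residues and decoder output probabilities."""
    loss = 0.0
    for target, probs in zip(one_hot_rows, prob_rows):
        for is_target, p in zip(target, probs):
            if is_target:
                loss -= math.log(p)  # penalize low probability on the true residue
    return loss


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to N(0, 1), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def vae_objective(one_hot_rows, prob_rows, mu, log_var):
    """Quantity minimized in training: reconstruction term plus KL term."""
    return reconstruction_loss(one_hot_rows, prob_rows) + kl_to_standard_normal(mu, log_var)


# Toy case: two positions over a three-letter alphabet (hypothetical values).
x = [[1, 0, 0], [0, 1, 0]]
decoded = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
loss = vae_objective(x, decoded, mu=[1.0, 0.0], log_var=[0.0, 0.0])
perfect = vae_objective(x, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], mu=[0.0, 0.0], log_var=[0.0, 0.0])
```

The reconstruction term falls to zero when the decoder reproduces the input exactly, which is the "match" component the claim language describes.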
It would have been obvious at the time of first filing to have modified the teachings of Saraf et al., and Wang et al. for the use of an optimization function and the method of claims 1-2, 4, 20, and 26 with the teaching of Sinai et al. for the use of a VAE, as the latter teaches in the abstract “This approach generally performs better than baseline methods that consider no interactions within sequences, and in some cases better than the state-of-the-art approaches that use the inverse-Potts model”. One would have had a reasonable expectation of success given that the use of a VAE does not preclude the use of other machine-learning or NN models and is specifically implemented within the latter as a computational guide, the exact reason specified within the claim. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful.
Claim 24 is directed to the method of claim 11 and thus claim 1, but further specifies identifying regions within the latent space that have higher likelihoods of generating desired functionality and from that selecting points and using the decoding/generation in the model to map points to amino acids.
Saraf et al. and Wang et al. teach the method of claims 1-2, 4, 20, and 26 as described above.
Wang et al. teaches on page 1, paragraph 1 “the input of computational protein design is the backbone structure of a target protein (or part of a target protein). Through computational sampling and optimization, sequences that are likely to fold into the desired structure are generated for experimental verification. The scoring function usually contains physics-based terms such as van der Waals and electrostatic energy as well as knowledge-based terms such as sidechain rotamer and backbone dihedral preference obtained from statistics of protein structures”.
Sinai et al. teaches in the abstract “we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and learn interactions between sets of positions within a protein”. It would be inherent to the use of dimension reduction that the lower dimension would be a latent space, and therefore the optimization of any selection as discussed previously in Saraf et al. would inherently select, from the latent space, those regions where the points map to amino acids with the desired functionality. Therefore, this would read on wherein the step of selecting the new candidate amino acid sequences for the subsequent iteration further comprises: identifying, based on the fitness function, regions within the latent space that or more likely than other regions to exhibit the desired functionality or are too sparsely sampled to make statistically significant estimates regarding the desired functionality, selecting points within the identified regions within the latent space, and using the decoding/generation performed by the machine-learning model to map from the selecting points to respective candidate amino acid sequences, which are then used as the new candidate amino acid sequences for the subsequent iteration.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Saraf et al. (Biophysical Journal (2006) 4167-4180) and Wang et al. (Scientific Reports (2018) 1-9) as applied to claims 1-2, 4, 20, and 26 above, and further in view of Jacquin et al. (PLoS Computational Biology (2016) 1-18).
Claim 13 is directed to the method of claim 1 but further specifies training the machine learning model using residue-residue couplings of Potts models to generate a DCA model.
Saraf et al. and Wang et al. teach the method of claims 1-2, 4, 20, and 26 as described above.
Jacquin et al. teaches on page 4, paragraph 3 “We show in Fig 2A the positive predictive value (PPV) for contact prediction (Methods) for the four structures SA, SB, SC, SD, based on the ranking of the mutual information (MI) scores and of the inferred Potts couplings, with the mean field (DCA) [6], the pseudo likelihood (PLM), and the adaptive cluster expansion (ACE) procedures. A fifth method, called Projection, shown with magenta lines in Fig 2A will be introduced later on. Mean-field DCA is a very fast, approximate method to infer the couplings… As in the case of real protein data, Potts-based contact predictions, either with DCA, PLM or ACE, generally outperform MI-based predictions”, reading on further comprising training the machine-learning model using the training dataset to learn external fields and residue-residue couplings of a Potts model to generate a DCA model of the training dataset, the DCA model being used as the machine-learning model.
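For illustration, a Potts model with external fields and residue-residue couplings assigns each sequence an energy of the general form E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j). The sketch below evaluates that energy for given parameters. The sign convention, the sparse dictionary layout, and the toy field and coupling values are assumptions, and the inference of the parameters from an alignment (by DCA, PLM, or ACE) is not shown.

```python
def potts_energy(sequence, fields, couplings):
    """Evaluate E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields[i] maps a residue at position i to its field value h_i.
    couplings[(i, j)] maps a residue pair at positions i < j to J_ij.
    Missing entries are treated as zero.
    """
    energy = 0.0
    n = len(sequence)
    for i in range(n):
        energy -= fields[i].get(sequence[i], 0.0)
        for j in range(i + 1, n):
            pair = (sequence[i], sequence[j])
            energy -= couplings.get((i, j), {}).get(pair, 0.0)
    return energy


# Hypothetical toy parameters for a length-3 sequence.
fields = [{"A": 1.0}, {"C": 0.5}, {}]
couplings = {(0, 1): {("A", "C"): 2.0}}

favored = potts_energy("ACD", fields, couplings)
disfavored = potts_energy("GCD", fields, couplings)
```

Under this convention a lower energy corresponds to a higher model probability, so the sequence carrying the coupled pair is favored.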
It would have been obvious at the time of first filing to have modified the teachings of Saraf et al. and Wang et al. for the method of claims 1-2, 4, 20, 26, 41-42, and 44, with the teachings of Jacquin et al. for the use of DCA modeling based on Potts couplings, as the latter suggests “As in the case of real protein data, Potts-based contact predictions, either with DCA, PLM or ACE, generally outperform MI-based predictions”. One would have had a reasonable expectation of success given that this is merely further specifying the type of machine learning model being used, which does not conflict with Wang et al., which provides multiple network models within which a DCA model could be implemented. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Saraf et al. (Biophysical Journal (2006) 4167-4180) and Wang et al. (Scientific Reports (2018) 1-9) as applied to claims 1-2, 4, 20, and 26 above, and further in view of Socolich et al. (Nature (2005) 512-518).
Claim 18 is directed to the method of claim 1 but further specifies the use of a coevolutionary matrix to generate an SCA model through simulated annealing.
Saraf et al. and Wang et al. teach the method of claims 1-2, 4, 20, and 26 as described above.
Socolich et al. teaches on page 512, column 1, paragraph 2 “An approach to defining the architecture of amino acid interactions in proteins is suggested by an evolution-based method known as the statistical coupling analysis (SCA)”, in column 2, paragraph 3 of the same page “We began by carrying out the SCA for an alignment of 120 members of the WW domain family”, and on page 513, column 1, paragraph 2 “…the SCA matrix for an alignment of 120 such sequences (termed the site-independent conservation (IC) model) shows no statistical coupling between positions (Fig. 1d). The second algorithm tests the sufficiency of the SCA matrix by building artificial sequences that preserve both the conservation pattern and the pattern of statistical couplings. These sequences are built using a Monte-Carlo-based simulated annealing protocol in which amino acids are completely shuffled within every column of the MSA while minimizing the difference between all coupling values in the SCA matrix for the artificial alignment and for the natural alignment”, reading on further comprising: training the machine-learning model using the training dataset to learn a positional coevolution matrix to generate an SCA model of the training dataset, the SCA model being used as the machine-learning model, generating a sample set of amino acid sequences by performing simulated annealing or simulated heating using the SCA model, the sample set of amino acid sequences expressing the learned implicit patterns of the training dataset, and selecting the candidate amino acid sequences from the generated sample set of amino acid sequences.
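For illustration, the quoted protocol (shuffling amino acids within each alignment column while minimizing a coupling mismatch by simulated annealing) can be sketched as below. This is a minimal sketch under stated assumptions and not the SCA implementation of Socolich et al.: the `mismatch` cost is a hypothetical stand-in for the difference between SCA coupling matrices, and the geometric cooling schedule, step count, and toy alignment are assumed.

```python
import math
import random


def anneal_columns(alignment, cost, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Swap residues within single columns, accepting moves by the Metropolis rule.

    Swapping within a column keeps each column's residue composition fixed,
    so the per-position conservation pattern is preserved by construction.
    """
    rng = random.Random(seed)
    rows = [list(seq) for seq in alignment]
    n_rows, n_cols = len(rows), len(rows[0])
    current = cost(rows)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        col = rng.randrange(n_cols)
        a, b = rng.sample(range(n_rows), 2)
        rows[a][col], rows[b][col] = rows[b][col], rows[a][col]
        proposed = cost(rows)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed
        else:
            rows[a][col], rows[b][col] = rows[b][col], rows[a][col]  # undo the swap
    return ["".join(r) for r in rows], current


def mismatch(rows):
    """Toy cost: rows where 'A' in column 0 is not paired with 'C' in column 1."""
    return sum((r[0] == "A") != (r[1] == "C") for r in rows)


start = ["AC", "AD", "GC", "GD"]
shuffled, final_cost = anneal_columns(start, mismatch)
```

Each column of the result contains the same residues as the corresponding column of the starting alignment; only their pairing across columns changes.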
It would have been obvious at the time of first filing to have modified the teachings of Saraf et al. and Wang et al. for the method of claims 1-2, 4, 20, 26, 41-42, and 44, with the teachings of Socolich et al. for the use of an SCA algorithm to generate a sample of amino acid sequences and perform simulated annealing, as the latter teaches in the abstract “The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem”. One would have had a reasonable expectation of success given that, as previously described, this is merely further specifying the type of machine learning model being used, which does not conflict with Wang et al., which provides multiple network models within which an SCA model could be implemented. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful.
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Saraf et al. (Biophysics Journal (2006) 4167-4180), and Wang et al. (Scientific Reports (2018) 1-9) as applied to claims 1-2, 4, 20, and 26 above, and further in view of Cocco et al. (PLoS Computational Biology (2013) 1-17).
Claim 21 is directed to the method of claim 20 and thus claim 1, but further specifies the use of PCA to generate eigenvectors of a correlation matrix and the use of ICA for nonlinear dimensionality reduction.
Saraf et al. and Wang et al. teach the method of claims 1-2, 4, 20, and 26 as described above.
Cocco et al. teaches on page 2, column 1, paragraph 3 “Another mathematically simpler way to extract information from the correlation matrix C is Principal Component Analysis (PCA). PCA looks for the eigenmodes of C associated to the largest eigenvalues. These modes are the ones contributing most to the covariation in the protein family. Combined with clustering approaches, PCA was applied to identify functional residues in [8]. More recently PCA was applied to the SCA correlation matrix, a variant of the matrix C expressing correlations between sites only (and not explicitly the amino-acids they carry) and allowed for identifying groups of correlated (coevolving) residues – termed sectors – each controlling a specific function”. Claim 20 presents a contingent limitation in that either a linear or non-linear dimension reduction can be performed, and MPEP 2111.04(II) states “The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met”, which accordingly reads on wherein the nonlinear dimensionality reduction is a principal component analysis and leading components of the low-dimensional model are the principal components of a principal component analysis represented by a set of eigenvectors that correspond to a set of largest eigenvalues of a correlation matrix, and wherein the nonlinear dimensionality reduction is an independent component analysis in which the eigenvectors are subject to a rotation and scaling operation to identify functionally-independent modes of sequence variation.
It would have been obvious at the time of first filing to have modified the teachings of Saraf et al. and Wang et al. for the method of claims 1-2, 4, 20, 26, 41-42, and 44 with the teachings of Cocco et al. for the use of eigenvalues of a correlation matrix in PCA, as the latter teaches within the abstract “We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size”. One would have had a reasonable expectation of success given that the use of dimension reduction techniques does not preclude the use of other machine learning techniques and, as the former two references use dimension reduction, the modification would merely be a substitution of known methods. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful.
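For illustration only, the PCA step quoted from Cocco et al., in which the eigenmode of a correlation matrix associated with the largest eigenvalue is extracted, can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Cocco et al.: the function names (`correlation_matrix`, `leading_eigenpair`), the use of power iteration, and the numeric input table are hypothetical. It further assumes that no column is constant and that the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def correlation_matrix(data):
    """Pearson correlation between the columns of a row-major numeric table.
    Assumes every column has non-zero variance."""
    n_rows, n_cols = len(data), len(data[0])
    means = [sum(row[c] for row in data) / n_rows for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in data]
    norms = [math.sqrt(sum(row[c] ** 2 for row in centered)) for c in range(n_cols)]
    return [
        [
            sum(row[a] * row[b] for row in centered) / (norms[a] * norms[b])
            for b in range(n_cols)
        ]
        for a in range(n_cols)
    ]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration on a symmetric positive semi-definite matrix.
    Returns (largest eigenvalue, corresponding unit eigenvector)."""
    n = len(matrix)
    # Non-uniform start vector (an assumed choice) to reduce the chance of
    # starting orthogonal to the leading eigenvector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return value, vec
```

The eigenvector returned is the first principal component; further components could be obtained by removing this mode from the matrix and repeating, which is omitted here for brevity.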
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Saraf et al. (Biophysics Journal (2006) 4167-4180), Wang et al. (Scientific Reports (2018) 1-9), and Sinai et al. (arXiv preprint (2017) 1-6) as applied to claims 1-2, 4, 11, 20, 24, and 26 above, and further in view of Proust-Lima et al. (Journal of Statistical Software (2017) 1-56).
Claim 25 is directed to the method of claim 24 and thus claim 1, but further specifies the generation of a density function for the latent space and the selection of points to be statistically representative of the density function.
Saraf et al., Wang et al., and Sinai et al. teach the method of claims 1-2, 4, 11, 20, 24, and 26 as described above.
Proust-Lima et al. teaches on pages 8-9, paragraphs 5 and 1 respectively, “For continuous link functions, the individual contribution to the likelihood of a latent process mixed model as defined in Section 2.2 is…where omegai is the same density function of a multivariate normal variable as defined in Equation 11, and J is the Jacobian determinant of the inverse of the link function, that is, the derivative of the linear transformation, the rescaled Beta CDF or the quadratic I-splines”. Here, mixture models are used in conjunction with a latent space (dimension reduction), such that the density function used is inherently one defined over the latent space. The combination of this teaching with the teachings of Saraf et al. and Sinai et al. on the use of fitness functions and optimization for the selection of optimal regions of latent space would read on wherein: the step of identifying the regions within the latent space further includes generating a density function within the latent space based on the fitness function, and the step of selecting the points within the identified regions within the latent space further includes selecting the points to be statistically representative of the density function.
It would have been obvious at the time of first filing to have modified the teachings of Saraf et al., Wang et al., and Sinai et al. for the method of claims 1-2, 4, 11, 20, 24, 26, 41-42, 44, and 94 with the teachings from Proust-Lima et al. for the use of density functions in latent space functions as the latter provides clarification on method implementation, specifically within a computational framework as a companion paper to an R package. One would have had a reasonable expectation of success given that both the functions for implementing and the pseudo code for adapting are provided within the paper along with the necessary mathematical derivations for the understanding of how to link the density function. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful.
Subject Matter Potentially Free from the Prior Art
Claims 29 and 31-32 are potentially free from the prior art, as nothing could be found relating to the specific calculation of functionality landscapes based solely on stability or sequence similarity as proposed in claim 29. As claims 31-32 depend from claim 29 and further specify additional limitations such as multi-objective optimization and specified supervised learning techniques, said claims are also potentially free from the prior art.
As claims 29 and 31-32 depend from claim 26 and thus claim 1, both of which stand rejected, claims 29 and 31-32 are currently objected to as depending from rejected claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEENAN NEIL ANDERSON-FEARS whose telephone number is (571)272-0108. The examiner can normally be reached M-Th, alternate F, 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz Skowronek, can be reached at 571-272-9047. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.N.A./Examiner, Art Unit 1687
/OLIVIA M. WISE/Supervisory Patent Examiner, Art Unit 1685