Prosecution Insights
Last updated: April 19, 2026
Application No. 17/148,132

SUPERVISED VAE FOR OPTIMIZATION OF VALUE FUNCTION AND GENERATION OF DESIRED DATA

Non-Final OA — §103, §112
Filed
Jan 13, 2021
Examiner
NGUYEN, HENRY K
Art Unit
2121
Tech Center
2100 — Computer Architecture & Software
Assignee
International Business Machines Corporation
OA Round
4 (Non-Final)
Grant Probability: 57% (Moderate)
Expected OA Rounds: 4-5
Time to Grant: 4y 7m
With Interview: 88%

Examiner Intelligence

Career Allow Rate: 57% (90 granted / 158 resolved; +2.0% vs TC avg)
Interview Lift: +31.4% for resolved cases with interview
Avg Prosecution: 4y 7m (26 currently pending)
Total Applications: 184 across all art units
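The headline figures above follow from simple arithmetic on the career data. A minimal sketch below reproduces them; it assumes the dashboard combines the interview lift additively in percentage points (57% + 31.4% ≈ 88%), an inference from the displayed numbers rather than a documented formula.

```python
# Reproduce the dashboard's headline examiner statistics.
# Assumption (inferred, not documented): interview lift is added in
# percentage points to the career allow rate.
granted, resolved = 90, 158

career_allow_rate = granted / resolved                         # ≈ 0.570 → "57%"
interview_lift_pts = 31.4                                      # percentage points
with_interview = career_allow_rate * 100 + interview_lift_pts  # ≈ 88.4 → "88%"

print(f"allow rate: {career_allow_rate:.1%}, with interview: {with_interview:.0f}%")
```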

Statute-Specific Performance

§101: 21.6% (-18.4% vs TC avg)
§103: 51.4% (+11.4% vs TC avg)
§102: 7.7% (-32.3% vs TC avg)
§112: 14.0% (-26.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 158 resolved cases

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments with respect to claims 1-2, 5-10, 13-15, 22, and 24-31 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 112

Claims 5-8, 13-15, and 24-26 recite the limitation "said optimization problem". There is insufficient antecedent basis for this limitation in the claims. For examination purposes, the Examiner interprets "said optimization problem" as "said unconstrained optimization problem".

Allowable Subject Matter

Claim 29 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6, 8-10, 13-15, 22, 24-26, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Oono et al. (US-20170161635-A1) in view of Malone et al. (US-20190325995-A1), Chidlovskii et al. (US-20210097387-A1), and Schafer et al. (US-20210166124-A1).

Regarding Claim 1, Oono teaches a computer-implemented method of generating optimal model input data for achieving a target outcome, said method comprising: receiving, at an encoder model (Fig. 2B, encoder) of a supervised variational autoencoder (VAE) running on a programmed processor device, input data (Fig. 2B, XD) relating to a prediction problem to be solved (para [0057]); generating, using the encoder model of the VAE, a latent feature representation of the input data (Fig. 2B, z) in a latent feature space (Fig. 2B, Z) (¶[0068] describes that a seed compound and its associated label may be input to the encoder, which outputs a latent variable from which a latent representation may be sampled) of reduced dimensionality (para [0132] "These latent representations may be of lower dimension."), said encoder model trained to capture a distribution of the input data (¶[0046] "The training may comprise having the probabilistic or variational autoencoder learn to approximate an encoding distribution."); receiving, at a VAE decoder model (Fig. 2B, decoder) running on the programmed processor device, the latent feature representation of the input data in the latent feature space (¶[0056] "During training, the decoder may learn a decoding model that maps latent variable Z to a distribution on x, i.e., the decoder may be used to convert a latent representation and a label into a random variable, X˜, from which the sampling module may draw a sample to generate a compound fingerprint, x˜.") and reconstructing, using the VAE decoder model, said input data using the latent feature representation (Fig. 2B, z) of the input data (Fig. 2B, XD); receiving, at said trained VAE decoder model (Fig. 2B, decoder), said optimized latent feature space representation (Fig. 2B, z) of the input data (¶[0059] describes that in an exemplary embodiment of semi-supervised learning, the training data set used for training the generative model contains both compounds that have experimentally identified label information and compounds that have labels predicted by the predictor module (FIG. 2B); therefore, the predictor module is used during optimization of the latent feature space); running said trained VAE decoder model to generate optimal samples of said input data (Fig. 2B, x~) for achieving said target outcome (y~, desired label) using said optimized latent feature space representation of the input data (Fig. 8 illustrates running the VAE decoder model to produce the output samples x~, which are input into the predictor to achieve the target outcome y; ¶[0106] describes running the VAE using input data xD to output a compound having a desired label (target outcome)); and using said generated optimal samples of said input data for achieving said target outcome of the prediction problem (¶[0035] "Upon training, the systems and methods described herein may output chemical information identifying one or more compounds.").
Oono does not explicitly disclose: generating an unsupervised reconstruction error loss; receiving the latent feature representation at a value predictor model running on the programmed processor device, the value predictor model trained to learn a relationship between the input data and a target outcome using the latent feature representation of the input data and generating a label prediction loss; concurrently training said VAE decoder and value predictor models by minimizing a loss function comprising the reconstruction error loss component used in training said VAE decoder and the label prediction error loss component used in the concurrent training of said value predictor model; forming an unconstrained optimization problem in the latent space to generate an optimized latent space representation for use in obtaining one of: a maximum target outcome, a minimum target outcome or a specific target outcome specified in the prediction problem being solved; and solving said unconstrained optimization problem, using said trained value predictor model, to generate said optimized latent feature space representation of the input data that achieves the maximum target outcome, the minimum target outcome or the specific target outcome.

However, Malone (US 20190325995 A1) teaches generating an unsupervised reconstruction error loss (para [0115] "Second, due to its unsupervised reconstruction loss, it allows to learn a vector representation for every data modality and every patient, even if that particular data modality is not observed at all for some of the patients.").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono with the unsupervised reconstruction loss of Malone. Doing so would allow for learning a vector representation for every data modality, even if that particular data modality is not observed (Malone para [0115]).
Chidlovskii (US 20210097387 A1) teaches receiving the latent feature representation at a value predictor model running on the programmed processor device, the value predictor model trained to learn a relationship between the input data and a target outcome using the latent feature representation of the input data (para [0093] discloses the predictor receiving the latent feature representation "z") and generating a label prediction loss (para [0048] "In further features, the prediction labels are multivariate data values indicating a location coordinate and the prediction loss is a regression loss."); concurrently training said VAE decoder and value predictor models by minimizing a loss function comprising the reconstruction error loss component used in training said VAE decoder and the label prediction error loss component used in the concurrent training of said value predictor model (para [0042] "The predictor includes: a classification neural network configured to generate a predicted location from signal strength values, where the classification neural network is trained together with a variational autoencoder based on minimizing a reconstruction loss and a prediction loss, and where, during the training, the classification neural network outputs the predicted location to decoder neural networks of the variational autoencoder.").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono with the variational autoencoder training of Chidlovskii. Doing so would allow for improving the localization error (Chidlovskii para [0112]).
Schafer (US 20210166124 A1) teaches forming an unconstrained optimization problem in the latent space to generate an optimized latent space representation for use in obtaining one of: a maximum target outcome, a minimum target outcome or a specific target outcome specified in the prediction problem being solved (para [0062] "Therefore, using a single model that starts from M input variables and outputs an N-dimensional backscattering vector, with M<<N, may lead to instabilities during an optimization phase and to a network converging to bad local minima. In contrast to directly optimizing the backscattering vector x, applying the proposed pipeline, the optimization is in the space of latent representation z, lying in a lower dimensional space by definition."), and solving said unconstrained optimization problem, using said trained value predictor model, to generate said optimized latent feature space representation of the input data that achieves the maximum target outcome, the minimum target outcome or the specific target outcome (para [0066] "Embodiments of the present disclosure propose a neural network architecture for the predictive model P.sub.θ and subsequently optimize its weights θ in order to learn the relationship between inputs v and outputs x, minimizing a suitable loss function. In contrast to directly optimizing the backscattering vector x, the optimization can be carried on in the space of latent representation z, keeping fixed the parameters of the generative model G.sub.ξ."; para [0069] "Therefore, overall loss function, computed for a single sample (v, x), is given by: [equation image]").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono with the loss function of Schafer. Doing so would allow for implementing a loss function that can accomplish two or more goals simultaneously (Schafer para [0067]).
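The concurrent-training limitation at issue in the claim 1 rejection can be sketched in a few lines: a shared latent code feeds both a decoder (reconstruction loss) and a value predictor (label prediction loss), and one combined loss tunes all three parameter sets each iteration. This is a toy NumPy illustration with assumed linear models, learning rate, and dimensions; it is not the application's or any reference's actual architecture.

```python
# Toy sketch of concurrent VAE-decoder / value-predictor training on a
# shared latent code. All models are linear and all hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 8))                          # toy input batch
y = (x.sum(axis=1, keepdims=True) > 0).astype(float)  # toy target outcomes

W_enc = rng.normal(scale=0.1, size=(8, 3))   # encoder: 8-d input -> 3-d latent
W_dec = rng.normal(scale=0.1, size=(3, 8))   # decoder: latent -> reconstruction
W_pred = rng.normal(scale=0.1, size=(3, 1))  # value predictor: latent -> outcome
lr = 0.01

def total_loss():
    z = x @ W_enc
    recon = np.mean((x - z @ W_dec) ** 2)    # unsupervised reconstruction error loss
    pred = np.mean((y - z @ W_pred) ** 2)    # label prediction loss
    return recon + pred                      # single combined objective

first = total_loss()
for _ in range(300):                          # iterative concurrent training
    z = x @ W_enc                             # shared latent feature representation
    r_err = z @ W_dec - x                     # reconstruction residual
    p_err = z @ W_pred - y                    # prediction residual
    g_z = r_err @ W_dec.T + p_err @ W_pred.T  # combined-loss gradient back into z
    W_dec -= lr * z.T @ r_err / len(x)        # tune decoder...
    W_pred -= lr * z.T @ p_err / len(x)       # ...and predictor concurrently
    W_enc -= lr * x.T @ g_z / len(x)          # ...and the encoder feeding both
last = total_loss()
print(f"combined loss: {first:.3f} -> {last:.3f}")
```

A full VAE would add the KL/prior regularization term to `total_loss` and use a stochastic encoder; both are omitted here to keep the joint-loss structure visible.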
Regarding Claim 2, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono further teaches wherein the VAE encoder, said VAE decoder and value predictor models comprise a machine-learned deep neural network model selected from: a convolutional neural network (CNN), a recurrent neural network (RNN) or a multi-layer perceptron (MLP) (¶[0032] states that the components of the generative model, such as a variational autoencoder, may comprise multi-layer perceptrons implementing a probabilistic encoder and a probabilistic decoder).

Regarding Claim 6, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono further teaches wherein said optimization problem is a local optimization to find the optimized latent feature space representation (para [0042]) of said input data sample consistent with the target outcome value (para [0041] "The system may be trained to optimize, for example to minimize, the loss function. In some embodiments, the system is trained by further inputting training labels associated with the chemical compounds. In some embodiments, the system is configured to generate chemical compound fingerprints that have a high likelihood of satisfying a selected set of desired label element values." The loss between the inputted labels and the desired labels (i.e., target outcome value) is minimized.).

Regarding Claim 8, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono further teaches wherein said optimization problem comprises a probability regularization component to optimize a probability of the latent feature space representation ([0041] describes that the probabilistic autoencoder may be trained to learn to approximate an encoding distribution; the regularization error may comprise a penalty associated with the complexity of the encoding distribution).

Regarding Claim 9, Claim 9 is the system corresponding to the method of claim 1. Claim 9 is substantially similar to claim 1 and is rejected on the same grounds.

Regarding Claim 10, Claim 10 is the system corresponding to the method of claim 2. Claim 10 is substantially similar to claim 2 and is rejected on the same grounds.

Regarding Claim 13, Claim 13 is the system corresponding to the method of claim 5. Claim 13 is substantially similar to claim 5 and is rejected on the same grounds.

Regarding Claim 14, Oono, Malone, Chidlovskii, and Schafer teach the computer system of claim 9. Oono further teaches wherein said optimization problem is one selected from: a local optimization to find the optimized latent feature space representation of said input data sample consistent with the target outcome value, or a local optimization given a specific input data to find optimal samples like the given input data but with a larger target outcome (¶[0111]: members of the unranked set of compound representations may be input to the latent representation generator (LRG) and the latent representations may be input into the classifier. The classifier may be configured to provide a druglikeness score for each latent representation. The compound representations and/or the associated compounds may be ordered, for example from highest druglikeness score to lowest druglikeness score. The ranking module may be used to provide as an output a ranked set of compound representations, e.g. fingerprints, and/or compounds (e.g., optimized latent feature space used to identify samples consistent with target outcome). ¶[0095] describes identifying drugs similar to a chemical compound but with a much higher likelihood of possessing desired effects and/or lack of undesired effects than existing compounds in the data set (e.g., identifying samples like the input but with a higher target outcome)).

Regarding Claim 15, Oono, Malone, Chidlovskii, and Schafer teach the computer system of claim 9.
Oono further teaches wherein said optimization problem comprises: a probability regularization component to optimize a probability of the latent feature space representation ([0041] describes that the probabilistic autoencoder may be trained to learn to approximate an encoding distribution; the regularization error may comprise a penalty associated with the complexity of the encoding distribution).

Regarding Claim 22, Claim 22 is the computer program product corresponding to the method of claim 1. Claim 22 is substantially similar to claim 1 and is rejected on the same grounds.

Regarding Claim 24, Claim 24 is the computer program product corresponding to the method of claim 5. Claim 24 is substantially similar to claim 5 and is rejected on the same grounds.

Regarding Claim 25, Claim 25 is the computer program product corresponding to the system of claim 14. Claim 25 is substantially similar to claim 14 and is rejected on the same grounds.

Regarding Claim 26, Claim 26 is the computer program product corresponding to the system of claim 15. Claim 26 is substantially similar to claim 15 and is rejected on the same grounds.

Regarding Claim 30, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1.
Chidlovskii further teaches wherein said concurrently training said VAE decoder and value predictor models by minimizing a loss function is an iterative process comprising, at each iteration: concurrently running said decoder model and predictor model to obtain a respective reconstructed input sample and predictor values and generating respective loss values using the loss function of the VAE decoder model and predictor model, evaluating the loss function, and tuning initial parameters of said VAE encoder model, decoder model and predictor model until said loss function is minimized (para [0042] "The predictor includes: a classification neural network configured to generate a predicted location from signal strength values, where the classification neural network is trained together with a variational autoencoder based on minimizing a reconstruction loss and a prediction loss, and where, during the training, the classification neural network outputs the predicted location to decoder neural networks of the variational autoencoder." Para [0044] "In further features, the classification neural network is trained together with a variational autoencoder based on minimizing the reconstruction loss and the prediction loss including determining gradients of a sum of a first term and a second term with respect to first hidden states of encoder neural networks of the variational autoencoder, with respect to second hidden states of the decoder neural networks, and with respect to third hidden states of the classification neural network." Chidlovskii updates gradients and weights (i.e., parameters) for the classification neural networks (i.e., predictor) and decoder neural networks.).

Regarding Claim 31, Oono, Malone, Chidlovskii, and Schafer teach the computer system of claim 9.
Chidlovskii further teaches wherein to concurrently train said VAE decoder and value predictor models by minimizing a loss function comprises an iterative process comprising, at each iteration: concurrently running said decoder model and predictor model to obtain a respective reconstructed input sample and predictor values and generating respective loss values using the loss function of the VAE decoder model and predictor model, evaluating the loss function, and tuning initial parameters of said VAE encoder model, decoder model and predictor model until said loss function is minimized (para [0042] "The predictor includes: a classification neural network configured to generate a predicted location from signal strength values, where the classification neural network is trained together with a variational autoencoder based on minimizing a reconstruction loss and a prediction loss, and where, during the training, the classification neural network outputs the predicted location to decoder neural networks of the variational autoencoder." Para [0044] "In further features, the classification neural network is trained together with a variational autoencoder based on minimizing the reconstruction loss and the prediction loss including determining gradients of a sum of a first term and a second term with respect to first hidden states of encoder neural networks of the variational autoencoder, with respect to second hidden states of the decoder neural networks, and with respect to third hidden states of the classification neural network." Chidlovskii updates gradients (i.e., parameters) for the classification neural networks (i.e., predictor) and decoder neural networks.).

Claims 5 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Oono, Malone, Chidlovskii, and Schafer, as applied above, and further in view of Rozo et al. (US-20210178585-A1).

Regarding Claim 5, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1.
Oono, Malone, Chidlovskii, and Schafer do not explicitly disclose wherein said optimization problem is a global optimization to find the optimized latent feature space representation of said input data sample which generates the largest target outcome value.

However, Rozo (US 20210178585 A1) teaches wherein said optimization problem is a global optimization to find the optimized latent feature space representation of said input data sample (para [0009] "a Bayesian optimization method is described, wherein the objective function is modelled in a low dimensional space and wherein the acquisition function is optimized in a high dimensional space.") which generates the largest target outcome value (para [0070] "Bayesian optimization (BO) is a sequential search algorithm aiming at finding a global maximizer (or minimizer) of an unknown objective function ƒ, i.e., finding [equation image], where [symbol] is some design space of interest (i.e., the parameter space from which parameter values may be chosen), with [symbol] being the dimensionality of the parameter space.").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the Bayesian optimization of Rozo. Doing so would allow for decreasing the complexity of the problem while improving the convergence and accuracy of the model (Rozo para [0069]).

Regarding Claim 27, Oono, Malone, Chidlovskii, Schafer, and Rozo teach the computer-implemented method of claim 5.
Rozo further teaches wherein said global optimization to find the optimized latent feature space representation of said input data sample which generates the largest target outcome value comprises: applying an argmax function to said value predictor model operating on the latent feature representation of the input data in the latent feature space (para [0070] "Bayesian optimization (BO) is a sequential search algorithm aiming at finding a global maximizer (or minimizer) of an unknown objective function ƒ, i.e., finding [equation image]").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the Bayesian optimization of Rozo. Doing so would allow for decreasing the complexity of the problem while improving the convergence and accuracy of the model (Rozo para [0069]).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Oono, Malone, Chidlovskii, and Schafer, as applied above, and further in view of Kwon et al. (US-20210110892-A1).

Regarding Claim 7, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono, Malone, Chidlovskii, and Schafer do not explicitly disclose wherein said optimization problem is a local optimization given a specific input data to find optimal samples like the given input data but with a larger target outcome.

However, Kwon (US-20210110892-A1) teaches wherein said optimization problem is a local optimization given a specific input data to find optimal samples like the given input data but with a larger target outcome (para [0149]: in detail, the chemical structure generation model may update the weight thereof based on the predicted property value received from the property prediction model being equal to or greater than a target property value).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the training target value of Kwon. Doing so would increase the training and inference efficiencies of the model (Kwon para [0149]).

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Oono, Malone, Chidlovskii, and Schafer, as applied above, and further in view of Middlebrooks et al. (US-20230004096-A1).

Regarding Claim 28, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 6. Oono, Malone, Chidlovskii, and Schafer do not explicitly disclose wherein said local optimization to find the optimized latent feature space representation of said input data sample consistent with the target outcome value comprises: applying an argmin function to said value predictor model operating on the latent feature representation of the input data in the latent feature space that is modified by a given target outcome value.

However, Middlebrooks (US 20230004096 A1) teaches wherein said local optimization to find the optimized latent feature space representation of said input data sample consistent with the target outcome value comprises: applying an argmin function to said value predictor model operating on the latent feature representation of the input data in the latent feature space that is modified by a given target outcome value (para [0141] "It should be noted that x′ (described above) represents any predicted image, and x* is the particular image that minimizes the norm in equation (2), i.e. the image containing the amplitude and phase that one is trying to retrieve. The minimization problem above can be equivalently formulated in the lower dimensional latent space representation as follows: [equation image]").
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the objective function of Middlebrooks. Doing so would allow for finding a latent space representation that facilitates gradient-based optimization to efficiently guide the search for optimal samples (Middlebrooks para [0141]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571) 272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li B Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY NGUYEN/
Examiner, Art Unit 2121
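The argmin/argmax latent-space optimization that runs through the claim 5, 27, and 28 rejections reduces to: treat the trained value predictor P as a function of the latent code z and solve an unconstrained problem over z. The sketch below illustrates the argmin form, minimizing (P(z) - y*)² by gradient descent; the linear predictor, target value, step size, and dimensions are all illustrative assumptions, not models from the application or the cited references.

```python
# Sketch: unconstrained optimization in the latent space using a stand-in
# trained value predictor. All numbers are illustrative assumptions.
import numpy as np

w = np.array([1.0, -2.0, 0.5])          # stand-in predictor: P(z) = w · z
y_star = 3.0                            # specific target outcome to achieve

z = np.zeros(3)                         # start from the latent prior mean
for _ in range(500):                    # gradient descent on (P(z) - y*)^2
    grad = 2.0 * (w @ z - y_star) * w   # d/dz (P(z) - y*)^2
    z -= 0.05 * grad
print(f"P(z*) = {w @ z:.3f}")           # converges to the target value 3.0
```

For the maximum-target variant (the claim 27 argmax), one ascends the gradient of P(z) instead, typically adding a probability-regularization penalty on z (the claim 8 component) so the optimized code stays in a high-density region the decoder can reconstruct plausibly.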

Prosecution Timeline

Jan 13, 2021 — Application Filed
Feb 23, 2024 — Non-Final Rejection (§103, §112)
May 31, 2024 — Response Filed
Aug 30, 2024 — Final Rejection (§103, §112)
Nov 12, 2024 — Response after Non-Final Action
Nov 19, 2024 — Response after Non-Final Action
Dec 10, 2024 — Request for Continued Examination
Dec 17, 2024 — Response after Non-Final Action
Jul 18, 2025 — Non-Final Rejection (§103, §112)
Oct 22, 2025 — Response Filed
Jan 29, 2026 — Non-Final Rejection (§103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585933 — TRANSFER LEARNING WITH AUGMENTED NEURAL NETWORKS
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12572776 — Method, System, and Computer Program Product for Universal Depth Graph Neural Networks
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12547484 — Methods and Systems for Modifying Diagnostic Flowcharts Based on Flowchart Performances
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541676 — NEUROMETRIC AUTHENTICATION SYSTEM
Granted Feb 03, 2026 (2y 5m to grant)
Patent 12505470 — SYSTEMS, METHODS, AND STORAGE MEDIA FOR TRAINING A MACHINE LEARNING MODEL
Granted Dec 23, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 57%
With Interview: 88% (+31.4%)
Median Time to Grant: 4y 7m
PTA Risk: High
Based on 158 resolved cases by this examiner. Grant probability derived from career allow rate.
