Prosecution Insights
Last updated: April 19, 2026
Application No. 17/148,132

SUPERVISED VAE FOR OPTIMIZATION OF VALUE FUNCTION AND GENERATION OF DESIRED DATA

Non-Final OA — §103, §112
Filed
Jan 13, 2021
Examiner
NGUYEN, HENRY K
Art Unit
2121
Tech Center
2100 — Computer Architecture & Software
Assignee
International Business Machines Corporation
OA Round
4 (Non-Final)
Grant Probability: 57% (Moderate)
Expected OA Rounds: 4-5
Time to Grant: 4y 7m
With Interview: 88%

Examiner Intelligence

Career Allow Rate: 57% (90 granted / 158 resolved; +2.0% vs TC avg)
Interview Lift: +31.4% for resolved cases with interview
Avg Prosecution: 4y 7m (26 currently pending)
Total Applications: 184 across all art units
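The headline figures above follow from simple arithmetic on the career data. A minimal sketch below reproduces them; it assumes the dashboard combines the interview lift additively in percentage points (57% + 31.4% ≈ 88%), an inference from the displayed numbers rather than a documented formula.

```python
# Reproduce the dashboard's headline examiner statistics.
# Assumption (inferred, not documented): interview lift is added in
# percentage points to the career allow rate.
granted, resolved = 90, 158

career_allow_rate = granted / resolved                         # ≈ 0.570 → "57%"
interview_lift_pts = 31.4                                      # percentage points
with_interview = career_allow_rate * 100 + interview_lift_pts  # ≈ 88.4 → "88%"

print(f"allow rate: {career_allow_rate:.1%}, with interview: {with_interview:.0f}%")
```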

Statute-Specific Performance

§101: 21.6% (-18.4% vs TC avg)
§103: 51.4% (+11.4% vs TC avg)
§102: 7.7% (-32.3% vs TC avg)
§112: 14.0% (-26.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 158 resolved cases

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments with respect to claims 1-2, 5-10, 13-15, 22, and 24-31 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 112

Claims 5-8, 13-15, and 24-26 recite the limitation "said optimization problem". There is insufficient antecedent basis for this limitation in the claims. For examination purposes, the Examiner interprets "said optimization problem" as "said unconstrained optimization problem".

Allowable Subject Matter

Claim 29 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6, 8-10, 13-15, 22, 24-26, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Oono et al. (US-20170161635-A1) in view of Malone et al. (US-20190325995-A1), Chidlovskii et al. (US-20210097387-A1), and Schafer et al. (US-20210166124-A1).

Regarding Claim 1, Oono teaches a computer-implemented method of generating optimal model input data for achieving a target outcome, said method comprising: receiving, at an encoder model (Fig. 2B, encoder) of a supervised variational autoencoder (VAE) running on a programmed processor device, input data (Fig. 2B, XD) relating to a prediction problem to be solved (para [0057]); generating, using the encoder model of the VAE, a latent feature representation of the input data (Fig. 2B, z) in a latent feature space (Fig. 2B, Z) (¶[0068] describes that a seed compound and its associated label may be input to the encoder, which outputs a latent variable from which a latent representation may be sampled) of reduced dimensionality (para [0132] "These latent representations may be of lower dimension."), said encoder model trained to capture a distribution of the input data (¶[0046] "The training may comprise having the probabilistic or variational autoencoder learn to approximate an encoding distribution."); receiving, at a VAE decoder model (Fig. 2B, decoder) running on the programmed processor device, the latent feature representation of the input data in the latent feature space (¶[0056] "During training, the decoder may learn a decoding model that maps latent variable Z to a distribution on x, i.e., the decoder may be used to convert a latent representation and a label into a random variable, X˜, from which the sampling module may draw a sample to generate a compound fingerprint, x˜.") and reconstructing, using the VAE decoder model, said input data using the latent feature representation (Fig. 2B, z) of the input data (Fig. 2B, XD); receiving, at said trained VAE decoder model (Fig. 2B, decoder), said optimized latent feature space representation (Fig. 2B, z) of the input data (¶[0059] describes that in an exemplary embodiment of semi-supervised learning, the training data set used for training the generative model contains both compounds that have experimentally identified label information and compounds that have labels predicted by the predictor module (FIG. 2B); therefore, the predictor module is used during optimization of the latent feature space); running said trained VAE decoder model to generate optimal samples of said input data (Fig. 2B, x~) for achieving said target outcome (y~, desired label) using said optimized latent feature space representation of the input data (Fig. 8 illustrates running the VAE decoder model to produce the output samples x~, which are input into the predictor to achieve the target outcome y; ¶[0106] describes running the VAE using input data xD to output a compound having a desired label (target outcome)); and using said generated optimal samples of said input data for achieving said target outcome of the prediction problem (¶[0035] "Upon training, the systems and methods described herein may output chemical information identifying one or more compounds.").
Oono does not explicitly disclose: generating an unsupervised reconstruction error loss; receiving the latent feature representation at a value predictor model running on the programmed processor device, the value predictor model trained to learn a relationship between the input data and a target outcome using the latent feature representation of the input data and generating a label prediction loss; concurrently training said VAE decoder and value predictor models by minimizing a loss function comprising the reconstruction error loss component used in training said VAE decoder and the label prediction error loss component used in the concurrent training of said value predictor model; forming an unconstrained optimization problem in the latent space to generate an optimized latent space representation for use in obtaining one of: a maximum target outcome, a minimum target outcome or a specific target outcome specified in the prediction problem being solved; and solving said unconstrained optimization problem, using said trained value predictor model, to generate said optimized latent feature space representation of the input data that achieves the maximum target outcome, the minimum target outcome or the specific target outcome.

However, Malone (US 20190325995 A1) teaches generating an unsupervised reconstruction error loss (para [0115] "Second, due to its unsupervised reconstruction loss, it allows to learn a vector representation for every data modality and every patient, even if that particular data modality is not observed at all for some of the patients.").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono with the unsupervised reconstruction loss of Malone. Doing so would allow for learning a vector representation for every data modality, even if that particular data modality is not observed (Malone para [0115]).
Chidlovskii (US 20210097387 A1) teaches receiving the latent feature representation at a value predictor model running on the programmed processor device, the value predictor model trained to learn a relationship between the input data and a target outcome using the latent feature representation of the input data (para [0093] discloses the predictor receiving the latent feature representation "z") and generating a label prediction loss (para [0048] "In further features, the prediction labels are multivariate data values indicating a location coordinate and the prediction loss is a regression loss."); concurrently training said VAE decoder and value predictor models by minimizing a loss function comprising the reconstruction error loss component used in training said VAE decoder and the label prediction error loss component used in the concurrent training of said value predictor model (para [0042] "The predictor includes: a classification neural network configured to generate a predicted location from signal strength values, where the classification neural network is trained together with a variational autoencoder based on minimizing a reconstruction loss and a prediction loss, and where, during the training, the classification neural network outputs the predicted location to decoder neural networks of the variational autoencoder.").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono with the variational autoencoder training of Chidlovskii. Doing so would allow for improving the localization error (Chidlovskii para [0112]).
Schafer (US 20210166124 A1) teaches forming an unconstrained optimization problem in the latent space to generate an optimized latent space representation for use in obtaining one of: a maximum target outcome, a minimum target outcome or a specific target outcome specified in the prediction problem being solved (para [0062] "Therefore, using a single model that starts from M input variables and outputs an N-dimensional backscattering vector, with M<<N, may lead to instabilities during an optimization phase and to a network converging to bad local minima. In contrast to directly optimizing the backscattering vector x, applying the proposed pipeline, the optimization is in the space of latent representation z, lying in a lower dimensional space by definition."), and solving said unconstrained optimization problem, using said trained value predictor model, to generate said optimized latent feature space representation of the input data that achieves the maximum target outcome, the minimum target outcome or the specific target outcome (para [0066] "Embodiments of the present disclosure propose a neural network architecture for the predictive model P.sub.θ and subsequently optimize its weights θ in order to learn the relationship between inputs v and outputs x, minimizing a suitable loss function. In contrast to directly optimizing the backscattering vector x, the optimization can be carried on in the space of latent representation z, keeping fixed the parameters of the generative model G.sub.ξ."; para [0069] "Therefore, overall loss function, computed for a single sample (v, x), is given by: [equation image]").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono with the loss function of Schafer. Doing so would allow for implementing a loss function that can accomplish two or more goals simultaneously (Schafer para [0067]).
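The concurrent-training limitation at issue in the claim 1 rejection can be sketched in a few lines: a shared latent code feeds both a decoder (reconstruction loss) and a value predictor (label prediction loss), and one combined loss tunes all three parameter sets each iteration. This is a toy NumPy illustration with assumed linear models, learning rate, and dimensions; it is not the application's or any reference's actual architecture.

```python
# Toy sketch of concurrent VAE-decoder / value-predictor training on a
# shared latent code. All models are linear and all hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 8))                          # toy input batch
y = (x.sum(axis=1, keepdims=True) > 0).astype(float)  # toy target outcomes

W_enc = rng.normal(scale=0.1, size=(8, 3))   # encoder: 8-d input -> 3-d latent
W_dec = rng.normal(scale=0.1, size=(3, 8))   # decoder: latent -> reconstruction
W_pred = rng.normal(scale=0.1, size=(3, 1))  # value predictor: latent -> outcome
lr = 0.01

def total_loss():
    z = x @ W_enc
    recon = np.mean((x - z @ W_dec) ** 2)    # unsupervised reconstruction error loss
    pred = np.mean((y - z @ W_pred) ** 2)    # label prediction loss
    return recon + pred                      # single combined objective

first = total_loss()
for _ in range(300):                          # iterative concurrent training
    z = x @ W_enc                             # shared latent feature representation
    r_err = z @ W_dec - x                     # reconstruction residual
    p_err = z @ W_pred - y                    # prediction residual
    g_z = r_err @ W_dec.T + p_err @ W_pred.T  # combined-loss gradient back into z
    W_dec -= lr * z.T @ r_err / len(x)        # tune decoder...
    W_pred -= lr * z.T @ p_err / len(x)       # ...and predictor concurrently
    W_enc -= lr * x.T @ g_z / len(x)          # ...and the encoder feeding both
last = total_loss()
print(f"combined loss: {first:.3f} -> {last:.3f}")
```

A full VAE would add the KL/prior regularization term to `total_loss` and use a stochastic encoder; both are omitted here to keep the joint-loss structure visible.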
Regarding Claim 2, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono further teaches wherein the VAE encoder, said VAE decoder and value predictor models comprise a machine-learned deep neural network model selected from: a convolutional neural network (CNN), a recurrent neural network (RNN) or a multi-layer perceptron (MLP) (¶[0032] states that the components of the generative model, such as a variational autoencoder, may comprise multi-layer perceptrons implementing a probabilistic encoder and a probabilistic decoder).

Regarding Claim 6, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono further teaches wherein said optimization problem is a local optimization to find the optimized latent feature space representation (para [0042]) of said input data sample consistent with the target outcome value (para [0041] "The system may be trained to optimize, for example to minimize, the loss function. In some embodiments, the system is trained by further inputting training labels associated with the chemical compounds. In some embodiments, the system is configured to generate chemical compound fingerprints that have a high likelihood of satisfying a selected set of desired label element values." The loss between the inputted labels and the desired labels (i.e., target outcome value) is minimized.).

Regarding Claim 8, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono further teaches wherein said optimization problem comprises a probability regularization component to optimize a probability of the latent feature space representation ([0041] describes that the probabilistic autoencoder may be trained to learn to approximate an encoding distribution; the regularization error may comprise a penalty associated with the complexity of the encoding distribution).

Regarding Claim 9, Claim 9 is the system corresponding to the method of claim 1. Claim 9 is substantially similar to claim 1 and is rejected on the same grounds.

Regarding Claim 10, Claim 10 is the system corresponding to the method of claim 2. Claim 10 is substantially similar to claim 2 and is rejected on the same grounds.

Regarding Claim 13, Claim 13 is the system corresponding to the method of claim 5. Claim 13 is substantially similar to claim 5 and is rejected on the same grounds.

Regarding Claim 14, Oono, Malone, Chidlovskii, and Schafer teach the computer system of claim 9. Oono further teaches wherein said optimization problem is one selected from: a local optimization to find the optimized latent feature space representation of said input data sample consistent with the target outcome value, or a local optimization given a specific input data to find optimal samples like the given input data but with a larger target outcome (¶[0111]: members of the unranked set of compound representations may be input to the latent representation generator (LRG) and the latent representations may be input into the classifier. The classifier may be configured to provide a druglikeness score for each latent representation. The compound representations and/or the associated compounds may be ordered, for example from highest druglikeness score to lowest druglikeness score. The ranking module may be used to provide as an output a ranked set of compound representations, e.g. fingerprints, and/or compounds (e.g., optimized latent feature space used to identify samples consistent with target outcome). ¶[0095] describes identifying drugs similar to a chemical compound but with a much higher likelihood of possessing desired effects and/or lack of undesired effects than existing compounds in the data set (e.g., identifying samples like the input but with a higher target outcome)).

Regarding Claim 15, Oono, Malone, Chidlovskii, and Schafer teach the computer system of claim 9.
Oono further teaches wherein said optimization problem comprises: a probability regularization component to optimize a probability of the latent feature space representation ([0041] describes that the probabilistic autoencoder may be trained to learn to approximate an encoding distribution; the regularization error may comprise a penalty associated with the complexity of the encoding distribution).

Regarding Claim 22, Claim 22 is the computer program product corresponding to the method of claim 1. Claim 22 is substantially similar to claim 1 and is rejected on the same grounds.

Regarding Claim 24, Claim 24 is the computer program product corresponding to the method of claim 5. Claim 24 is substantially similar to claim 5 and is rejected on the same grounds.

Regarding Claim 25, Claim 25 is the computer program product corresponding to the system of claim 14. Claim 25 is substantially similar to claim 14 and is rejected on the same grounds.

Regarding Claim 26, Claim 26 is the computer program product corresponding to the system of claim 15. Claim 26 is substantially similar to claim 15 and is rejected on the same grounds.

Regarding Claim 30, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1.
Chidlovskii further teaches wherein said concurrently training said VAE decoder and value predictor models by minimizing a loss function is an iterative process comprising, at each iteration: concurrently running said decoder model and predictor model to obtain a respective reconstructed input sample and predictor values and generating respective loss values using the loss function of the VAE decoder model and predictor model, evaluating the loss function, and tuning initial parameters of said VAE encoder model, decoder model and predictor model until said loss function is minimized (para [0042] "The predictor includes: a classification neural network configured to generate a predicted location from signal strength values, where the classification neural network is trained together with a variational autoencoder based on minimizing a reconstruction loss and a prediction loss, and where, during the training, the classification neural network outputs the predicted location to decoder neural networks of the variational autoencoder." Para [0044] "In further features, the classification neural network is trained together with a variational autoencoder based on minimizing the reconstruction loss and the prediction loss including determining gradients of a sum of a first term and a second term with respect to first hidden states of encoder neural networks of the variational autoencoder, with respect to second hidden states of the decoder neural networks, and with respect to third hidden states of the classification neural network." Chidlovskii updates gradients and weights (i.e., parameters) for the classification neural networks (i.e., predictor) and decoder neural networks.).

Regarding Claim 31, Oono, Malone, Chidlovskii, and Schafer teach the computer system of claim 9.
Chidlovskii further teaches wherein to concurrently train said VAE decoder and value predictor models by minimizing a loss function comprises an iterative process comprising, at each iteration: concurrently running said decoder model and predictor model to obtain a respective reconstructed input sample and predictor values and generating respective loss values using the loss function of the VAE decoder model and predictor model, evaluating the loss function, and tuning initial parameters of said VAE encoder model, decoder model and predictor model until said loss function is minimized (para [0042] "The predictor includes: a classification neural network configured to generate a predicted location from signal strength values, where the classification neural network is trained together with a variational autoencoder based on minimizing a reconstruction loss and a prediction loss, and where, during the training, the classification neural network outputs the predicted location to decoder neural networks of the variational autoencoder." Para [0044] "In further features, the classification neural network is trained together with a variational autoencoder based on minimizing the reconstruction loss and the prediction loss including determining gradients of a sum of a first term and a second term with respect to first hidden states of encoder neural networks of the variational autoencoder, with respect to second hidden states of the decoder neural networks, and with respect to third hidden states of the classification neural network." Chidlovskii updates gradients (i.e., parameters) for the classification neural networks (i.e., predictor) and decoder neural networks.).

Claims 5 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Oono, Malone, Chidlovskii, and Schafer, as applied above, and further in view of Rozo et al. (US-20210178585-A1).

Regarding Claim 5, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1.
Oono, Malone, Chidlovskii, and Schafer do not explicitly disclose wherein said optimization problem is a global optimization to find the optimized latent feature space representation of said input data sample which generates the largest target outcome value.

However, Rozo (US 20210178585 A1) teaches wherein said optimization problem is a global optimization to find the optimized latent feature space representation of said input data sample (para [0009] "a Bayesian optimization method is described, wherein the objective function is modelled in a low dimensional space and wherein the acquisition function is optimized in a high dimensional space.") which generates the largest target outcome value (para [0070] "Bayesian optimization (BO) is a sequential search algorithm aiming at finding a global maximizer (or minimizer) of an unknown objective function ƒ, i.e., finding [equation image], where [symbol] is some design space of interest (i.e., the parameter space from which parameter values may be chosen), with [symbol] being the dimensionality of the parameter space.").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the Bayesian optimization of Rozo. Doing so would allow for decreasing the complexity of the problem while improving the convergence and accuracy of the model (Rozo para [0069]).

Regarding Claim 27, Oono, Malone, Chidlovskii, Schafer, and Rozo teach the computer-implemented method of claim 5.
Rozo further teaches wherein said global optimization to find the optimized latent feature space representation of said input data sample which generates the largest target outcome value comprises: applying an argmax function to said value predictor model operating on the latent feature representation of the input data in the latent feature space (para [0070] "Bayesian optimization (BO) is a sequential search algorithm aiming at finding a global maximizer (or minimizer) of an unknown objective function ƒ, i.e., finding [equation image]").

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the Bayesian optimization of Rozo. Doing so would allow for decreasing the complexity of the problem while improving the convergence and accuracy of the model (Rozo para [0069]).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Oono, Malone, Chidlovskii, and Schafer, as applied above, and further in view of Kwon et al. (US-20210110892-A1).

Regarding Claim 7, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 1. Oono, Malone, Chidlovskii, and Schafer do not explicitly disclose wherein said optimization problem is a local optimization given a specific input data to find optimal samples like the given input data but with a larger target outcome.

However, Kwon (US-20210110892-A1) teaches wherein said optimization problem is a local optimization given a specific input data to find optimal samples like the given input data but with a larger target outcome (para [0149]: in detail, the chemical structure generation model may update the weight thereof based on the predicted property value received from the property prediction model being equal to or greater than a target property value).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the training target value of Kwon. Doing so would increase the training and inference efficiencies of the model (Kwon para [0149]).

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Oono, Malone, Chidlovskii, and Schafer, as applied above, and further in view of Middlebrooks et al. (US-20230004096-A1).

Regarding Claim 28, Oono, Malone, Chidlovskii, and Schafer teach the computer-implemented method of claim 6. Oono, Malone, Chidlovskii, and Schafer do not explicitly disclose wherein said local optimization to find the optimized latent feature space representation of said input data sample consistent with the target outcome value comprises: applying an argmin function to said value predictor model operating on the latent feature representation of the input data in the latent feature space that is modified by a given target outcome value.

However, Middlebrooks (US 20230004096 A1) teaches wherein said local optimization to find the optimized latent feature space representation of said input data sample consistent with the target outcome value comprises: applying an argmin function to said value predictor model operating on the latent feature representation of the input data in the latent feature space that is modified by a given target outcome value (para [0141] "It should be noted that x′ (described above) represents any predicted image, and x* is the particular image that minimizes the norm in equation (2), i.e. the image containing the amplitude and phase that one is trying to retrieve. The minimization problem above can be equivalently formulated in the lower dimensional latent space representation as follows: [equation image]").
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the VAE of Oono, Malone, Chidlovskii, and Schafer with the objective function of Middlebrooks. Doing so would allow for finding a latent space representation that facilitates gradient-based optimization to efficiently guide the search for optimal samples (Middlebrooks para [0141]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571) 272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li B Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY NGUYEN/
Examiner, Art Unit 2121
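The argmin/argmax latent-space optimization that runs through the claim 5, 27, and 28 rejections reduces to: treat the trained value predictor P as a function of the latent code z and solve an unconstrained problem over z. The sketch below illustrates the argmin form, minimizing (P(z) - y*)² by gradient descent; the linear predictor, target value, step size, and dimensions are all illustrative assumptions, not models from the application or the cited references.

```python
# Sketch: unconstrained optimization in the latent space using a stand-in
# trained value predictor. All numbers are illustrative assumptions.
import numpy as np

w = np.array([1.0, -2.0, 0.5])          # stand-in predictor: P(z) = w · z
y_star = 3.0                            # specific target outcome to achieve

z = np.zeros(3)                         # start from the latent prior mean
for _ in range(500):                    # gradient descent on (P(z) - y*)^2
    grad = 2.0 * (w @ z - y_star) * w   # d/dz (P(z) - y*)^2
    z -= 0.05 * grad
print(f"P(z*) = {w @ z:.3f}")           # converges to the target value 3.0
```

For the maximum-target variant (the claim 27 argmax), one ascends the gradient of P(z) instead, typically adding a probability-regularization penalty on z (the claim 8 component) so the optimized code stays in a high-density region the decoder can reconstruct plausibly.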

Prosecution Timeline

Jan 13, 2021 — Application Filed
Feb 23, 2024 — Non-Final Rejection (§103, §112)
May 31, 2024 — Response Filed
Aug 30, 2024 — Final Rejection (§103, §112)
Nov 12, 2024 — Response after Non-Final Action
Nov 19, 2024 — Response after Non-Final Action
Dec 10, 2024 — Request for Continued Examination
Dec 17, 2024 — Response after Non-Final Action
Jul 18, 2025 — Non-Final Rejection (§103, §112)
Oct 22, 2025 — Response Filed
Jan 29, 2026 — Non-Final Rejection (§103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585933 — TRANSFER LEARNING WITH AUGMENTED NEURAL NETWORKS
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12572776 — Method, System, and Computer Program Product for Universal Depth Graph Neural Networks
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12547484 — Methods and Systems for Modifying Diagnostic Flowcharts Based on Flowchart Performances
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541676 — NEUROMETRIC AUTHENTICATION SYSTEM
Granted Feb 03, 2026 (2y 5m to grant)
Patent 12505470 — SYSTEMS, METHODS, AND STORAGE MEDIA FOR TRAINING A MACHINE LEARNING MODEL
Granted Dec 23, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 57%
With Interview: 88% (+31.4%)
Median Time to Grant: 4y 7m
PTA Risk: High
Based on 158 resolved cases by this examiner. Grant probability derived from career allow rate.
