Prosecution Insights
Last updated: April 19, 2026
Application No. 17/649,578

System and Method for Automated Transfer Learning with Domain Disentanglement

Final Rejection — §102, §103, §112
Filed
Feb 01, 2022
Examiner
TRAN, DAVID HOANG
Art Unit
2147
Tech Center
2100 — Computer Architecture & Software
Assignee
Mitsubishi Electric Research Laboratories Inc.
OA Round
2 (Final)
14%
Grant Probability
At Risk
3-4
OA Rounds
4y 2m
To Grant
38%
With Interview

Examiner Intelligence

Grants only 14% of cases
14%
Career Allow Rate
2 granted / 14 resolved
-40.7% vs TC avg
Strong +23% interview lift
+23.2%
Interview Lift
resolved cases with interview
Typical timeline
4y 2m
Avg Prosecution
35 currently pending
Career history
49
Total Applications
across all art units

Statute-Specific Performance

§101: 30.4% (-9.6% vs TC avg)
§103: 45.5% (+5.5% vs TC avg)
§102: 9.3% (-30.7% vs TC avg)
§112: 13.3% (-26.7% vs TC avg)
Tech Center averages are estimates • Based on career data from 14 resolved cases

Office Action

§102 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

The previous 35 U.S.C. 112(b) rejections are withdrawn due to Applicant's amendments.

Response to Arguments

Applicant's arguments filed 02/24/2026 on pages 9-13 of the Remarks regarding the rejection under 35 U.S.C. 103 with respect to claims 1-20 have been fully considered, but they are not persuasive.

Beginning on page 10, Applicant asserts that, with respect to claim 1, Demir mentions only a single regularization mechanism and does not disclose exploration over a set of pre-shot regularization methods. However, the Examiner interprets the claim language such that one method satisfies "a set," as taught by Demir on page 7: "Adversarial Regularization: We can utilize adversarial censoring when Z and S should be marginally independent, e.g., such as in Fig. 1(b) and Fig. 7, in order to reinforce the learning of a representation Z that is disentangled from the nuisance variations S. This is accomplished by introducing an adversarial network that aims to maximize a parameterized approximation q(s|z) of the likelihood p(s|z)."

Applicant asserts that a single CNN-based implementation does not constitute exploration over a set of pre-processing methods, and that the single ensemble method mentioned by Demir does not disclose the claimed exploration over a set of post-processing methods. However, the Examiner interprets the claim language such that at least one method satisfies "a set," as taught by Demir on page 2: "Given Bayesian graphs, some meaningful inference graphs are generated through the Bayes-Ball algorithm (Shachter, 2013) for pruning redundant links to achieve high-accuracy estimation. In order to promote robustness against nuisance parameters such as subject IDs, the explored Bayesian graphs can provide reasoning to use adversarial training with/without variational modeling and latent disentanglement. We demonstrate that AutoBayes can achieve excellent performance across various public datasets, and in particular with an ensemble stacking of multiple explored graphical models."

Applicant asserts that Demir does not disclose exploration over multiple post-shot adaptation methods. However, the Examiner interprets the claim language such that at least one method satisfies "a set," as shown on page 4, Algorithm 1 of Demir: "AutoBayes, that explores different graphical models linking classifier, encoder, decoder, estimator and adversary network blocks to optimize nuisance-invariant machine learning pipelines."
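As a concrete reading of the adversarial censoring mechanism quoted above, here is a minimal PyTorch-style sketch: an adversary maximizes log q(s|z) while the main network carries that likelihood with a negative weight. All dimensions, module shapes, and the coefficient lam are illustrative assumptions, not details from Demir or the application.

```python
import torch
import torch.nn as nn

Z_DIM, S_CLASSES, Y_CLASSES = 16, 5, 4   # illustrative sizes

encoder = nn.Sequential(nn.Linear(32, Z_DIM), nn.ReLU())  # X -> Z
classifier = nn.Linear(Z_DIM, Y_CLASSES)                  # Z -> Y (task)
adversary = nn.Linear(Z_DIM, S_CLASSES)                   # q(s|z)

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()))
opt_adv = torch.optim.Adam(adversary.parameters())
ce = nn.CrossEntropyLoss()
lam = 0.1  # weight of the adversarial likelihood term in the main loss

def train_step(x, y, s):
    z = encoder(x)
    # Adversary step: maximize log q(s|z), i.e. minimize cross-entropy to s
    adv_loss = ce(adversary(z.detach()), s)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()
    # Main step: task loss minus the adversary's objective, which pushes the
    # representation Z toward independence from the nuisance S
    main_loss = ce(classifier(z), y) - lam * ce(adversary(z), s)
    opt_main.zero_grad(); main_loss.backward(); opt_main.step()

x = torch.randn(64, 32)
y = torch.randint(0, Y_CLASSES, (64,))
s = torch.randint(0, S_CLASSES, (64,))
train_step(x, y, s)
```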
Beginning on page 2, Applicant asserts that, with respect to claim 2, Demir does not disclose hyperparameter-driven specification across multiple method categories. However, the Examiner interprets the claim language such that one method satisfies "a set," as shown above.

Applicant asserts that, with respect to claim 3, Demir discloses only a single censoring mechanism. However, the Examiner interprets the claim language such that at least one method satisfies "a set," as shown on page 5 of Demir: "AutoBayes begins with exploring any potential Bayesian graphs by cutting links of the full-chain graph in Fig. 4(a), imposing possible (conditional) independence. We then adopt the Bayes-Ball algorithm on each hypothetical Bayesian graph to examine conditional independence over different inference strategies, e.g., full-chain Z-/S-first inference graphs in Figs. 4(b)/(c). Applying Bayes-Ball justifies the reasonable pruning of the links in the full-chain inference graphs, and also the potential adversarial censoring when Z is independent of S. This process automatically constructs a connectivity of inference, generative, and adversary blocks with sound reasoning."

Applicant asserts that, with respect to claim 4, Demir does not disclose each and every limitation regarding additional censoring methods. However, the claim language recites "wherein the censoring methods include at least one of an adversarial censoring method"; the limitation is therefore disclosed on page 5 of Demir: "Note that the Bayes-Ball also reveals that there is no marginal dependency between Z and S, which provides the reason to use adversarial censoring to suppress nuisance information S in the latent space Z."

Applicant asserts that, with respect to claim 7, Demir does not disclose multiple distinct post-shot adaptation methods or exploration across such methods. However, the Examiner interprets the claim language such that at least one method satisfies "a set," as shown on page 18 of Demir: "We also use batch normalization (BN) and ReLU activation as listed in Table 3."
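The graph exploration invoked against claim 3 amounts to enumerating subgraphs of a full-chain graph before Bayes-Ball screening. A toy sketch under that reading follows; the node and edge sets are illustrative stand-ins, not Demir's exact Fig. 4(a).

```python
from itertools import combinations

# Full-chain Bayesian graph over task Y, nuisance S, latent Z, data X
# (an illustrative reading of the "full-chain graph" in the quoted passage)
FULL_CHAIN = [("Y", "Z"), ("S", "Z"), ("Y", "X"), ("S", "X"), ("Z", "X")]

def explore_graphs(full_chain):
    """Enumerate hypothetical Bayesian graphs by cutting links, i.e. every
    subset of kept edges imposes a different (conditional) independence."""
    for r in range(len(full_chain) + 1):
        for kept in combinations(full_chain, r):
            yield list(kept)  # one candidate graph per kept-edge subset

candidates = list(explore_graphs(FULL_CHAIN))
print(len(candidates))  # 2^5 = 32 hypothetical graphs to screen with Bayes-Ball
```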
Claim Rejections - 35 USC § 112(b)

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 13 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The claim language is unclear because the applicant's intent is unclear: is the intent to require all of the data, as the claim currently reads? Is the intent to require one piece of data from the list, or a combination of data from each data type? For purposes of examination, the Examiner interprets the claim language to require a combination with at least one type of data from the media data, physical data, and physiological data. The closest supporting section of the instant specification is paragraph [0035], which the Examiner reads as consistent with this interpretation.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-11 and 13-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Demir et al. (AutoBayes: Automated Bayesian Graph Exploration for Nuisance-Robust Inference), hereinafter Demir.

Regarding claim 1, Demir teaches a system for automated construction of an artificial neural network architecture ("In this paper, we propose a systematic automation framework called AutoBayes, which searches for the best inference graph model associated to a Bayesian graph model well-suited to reproduce the training datasets."; page 2, paragraph 2), comprising:

a set of interfaces and data links configured to receive and send signals ("With the Bayes-Ball algorithm, our method can automatically construct reasonable link connections among classifier, encoder, decoder, nuisance estimator and adversary DNN blocks."; page 9, 5 Conclusion and Future Work; Note: the classifier, encoder, decoder, nuisance estimator and adversary DNN blocks are the set of interfaces that transmit signals), wherein the signals include datasets of training data and validation data ("The AutoBayes automatically constructs non-redundant inference factor graphs given a hypothetical Bayesian graph assumption, through the use of the Bayes-Ball algorithm. Depending on the derived conditional independency and pruned factor graphs, DNN blocks for encoder E, decoder D, classifier C, nuisance estimator N and adversary A are reasonably connected. The whole DNN blocks are trained with adversary learning in a variational Bayesian inference."; page 3, 3 AutoBayes Algorithm; Note: see page 4, Algorithm 1 of Demir, showing that training and validation datasets are used in the framework) and testing data ("We experimentally demonstrate the performance of AutoBayes for publicly available datasets as listed in Table 1. Note that they cover a wide variety of data size, dimensionality, subject scale, and class levels as well as sensor modalities including image, EEG, EMG, and electrocorticography (ECoG). See more detailed information of each dataset in Appendix A.6."; page 7, 4 Experimental Evaluation), wherein the signals include a set of random variable factors in multi-dimensional signals X, wherein part of the random variable factors are associated with task labels Y to identify, and nuisance variations S ("At the core of our methodology is the consideration of graphical Bayesian models that capture the probabilistic relationship between random variables representing the data features X, task labels Y, nuisance variation labels S"; page 2, 2 Key Contributions; and "When we need to feed S along with 2D data of X into the CNN encoder such as in the model Ds, dimension mismatch poses a problem. We address this issue by using one linear layer to project S into the temporal dimensional space of X and another linear layer to project it into the spatial dimensional space of X."; page 18, A.7 DNN Model Parameters);

a set of memory banks to store a set of reconfigurable deep neural network (DNN) blocks ("AutoBayes offers a solid reason of how to connect multiple DNN blocks"; page 3, Figure 2), wherein each of the reconfigurable DNN blocks is configured with main task pipeline modules to identify the task labels Y from the multi-dimensional signals X ("The ultimate goal is to infer the task label Y from the measured data feature X, which is hindered by the presence of nuisance variations (e.g., inter-subject/session variations) that are (partially) labelled by S."; page 2, 2 Key Contributions) and with a set of auxiliary regularization modules to adjust disentanglement between a plurality of latent variables Z and the nuisance variations S ("We can utilize adversarial censoring when Z and S should be marginally independent, e.g., such as in Fig. 1(b) and Fig. 7, in order to reinforce the learning of a representation Z that is disentangled from the nuisance variations S. This is accomplished by introducing an adversarial network that aims to maximize a parameterized approximation q(s|z) of the likelihood p(s|z), while this likelihood is also incorporated into the loss for the other modules with a negative weight. The adversarial network, by maximizing the log likelihood log q(s|z), essentially maximizes a lower-bound of the mutual information I(S;Z), and hence the main network is regularized with the additional term that corresponds to minimizing this estimate of mutual information. This follows since the log-likelihood maximized by the adversarial network is given by"; page 7, Adversarial Regularization), wherein the memory banks further include hyperparameters ("AutoBayes can be readily integrated with AutoML to optimize any hyperparameters of individual DNN blocks."; page 9), trainable variables ("DNN blocks for encoder E, decoder D, classifier C, nuisance estimator N and adversary A are reasonably connected."; page 3, AutoBayes), intermediate neuron signals ("When we need to feed S along with 2D data of X into the CNN encoder such as in the model Ds, dimension mismatch poses a problem. We address this issue by using one linear layer to project S into the temporal dimensional space of X and another linear layer to project it into the spatial dimensional space of X."; page 18, A.7 DNN Model Parameters), and temporary computation values including forward-pass signals and backward-pass gradients;

at least one processor, in connection with the interface and the memory banks, configured to submit the signals and the datasets into the reconfigurable DNN blocks ("AutoBayes Algorithm: The overall procedure of the AutoBayes algorithm is described in the pseudocode of Algorithm 1. The AutoBayes automatically constructs non-redundant inference factor graphs given a hypothetical Bayesian graph assumption, through the use of the Bayes-Ball algorithm. Depending on the derived conditional independency and pruned factor graphs, DNN blocks for encoder E, decoder D, classifier C, nuisance estimator N and adversary A are reasonably connected. The whole DNN blocks are trained with adversary learning in a variational Bayesian inference. Note that hyperparameters of each DNN block can be further optimized by AutoML on top of AutoBayes framework."; page 3; Note: see page 4, Algorithm 1, and Figures 4 and 5, showing that S and Z are neuron signals associated with the forward-pass signals; and "Adversary train the whole DNN structure to minimize a loss function"; page 4, Algorithm 1; Note: minimizing a loss function is done during backward propagation), wherein the at least one processor is configured to execute an exploration over a set of graphical models ("AutoBayes, that explores different graphical models linking classifier, encoder, decoder, estimator and adversary network blocks to optimize nuisance-invariant machine learning pipelines."; page 1, Abstract; and "Attach an adversary network A to latent nodes Z for Zk ⊥ S ∈ I"; page 4, Algorithm 1), a set of pre-shot regularization methods ("Adversarial Regularization: We can utilize adversarial censoring when Z and S should be marginally independent, e.g., such as in Fig. 1(b) and Fig. 7, in order to reinforce the learning of a representation Z that is disentangled from the nuisance variations S. This is accomplished by introducing an adversarial network that aims to maximize a parameterized approximation q(s|z) of the likelihood p(s|z)"; page 7), a set of pre-processing methods ("All models were trained with a minibatch size of 32 and using the Adam optimizer with an initial learning rate of 0.001. The learning rate is halved whenever the validation loss plateaus. A compact convolutional neural network (CNN) with 4 layers is employed as an encoder network E to extract features from C x T data. Each convolution is followed by batch normalization (BN) and rectified linear unit (ReLU) activation. The AutoBayes chooses either a deterministic latent encoder or variational latent encoder under Gaussian prior. The original data is reconstructed by a decoder network D that applies transposed convolutions."; pages 7-8, Model Implementation; Note: the layers in the CNN are functionally equivalent to the filter banks that are part of the pre-processing mechanisms introduced in paragraph [0008] of the present invention's specification), a set of post-processing methods ("Ensemble Learning: We further introduce ensemble methods to make best use of all Bayesian graph models explored by the AutoBayes framework without wasting lower-performance models. Ensemble stacked generalization works by stacking the predictions of the base learners in a higher level learning space, where a meta learner corrects the predictions of base learners (Wolpert, 1992). Subsequent to training base learners, we assemble the posterior probability vectors of all base learners together to improve the prediction. We compare the predictive performance of a logistic regression (LR) and a shallow multi-layer perceptron (MLP) as an ensemble meta learner to aggregate all inference models. See Appendix A.5 for more detailed description of the stacked generalization."; page 7), and a set of post-shot adaptation methods, to reconfigure the reconfigurable DNN blocks such that task prediction is insensitive to the nuisance variations S by modifying the hyperparameters in the memory banks ("We benchmark the framework on several public datasets, where we have access to subject and class labels during training, and provide analysis of its capability for subject-transfer learning with/without variational modeling and adversarial training."; page 1, Abstract; and "In addition, the best model for one dataset does not always perform best for different data, which encourages us to use AutoBayes for adaptive model generation given target datasets."; page 9, Conclusion and Future Work; and "AutoBayes, that explores different graphical models linking classifier, encoder, decoder, estimator and adversary network blocks to optimize nuisance-invariant machine learning pipelines."; page 1, Abstract; and "Attach an adversary network A to latent nodes Z for Zk ⊥ S ∈ I"; page 4, Algorithm 1).
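The disputed "exploration over a set of ... methods" language of claim 1 can be pictured as a search over method categories in which, under the Examiner's interpretation, a category may hold a single member. A schematic sketch with placeholder names (all entries are illustrative, not taken from Demir or the claims):

```python
from itertools import product

# Each category is "a set"; under the Examiner's reading a set may be a singleton.
search_space = {
    "graphical_model":      ["autobayes_graph_A", "autobayes_graph_B"],
    "pre_shot_regularizer": ["adversarial_censoring"],   # one member, still a set
    "pre_processing":       ["cnn_encoder"],
    "post_processing":      ["ensemble_stacking"],
    "post_shot_adaptation": ["batch_norm_adaptation"],
}

def explore(space):
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))  # one candidate pipeline configuration

for cfg in explore(search_space):
    print(cfg)  # 2 candidate pipelines in this toy space
```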
Regarding claim 2, Demir teaches modifying the hyperparameters to specify the set of graphical models representing a Bayesian graph model and an inference factor graph using a Bayes-ball algorithm ("The AutoBayes automatically constructs non-redundant inference factor graphs given a hypothetical Bayesian graph assumption, through the use of the Bayes-Ball algorithm. Depending on the derived conditional independency and pruned factor graphs, DNN blocks for encoder E, decoder D, classifier C, nuisance estimator N and adversary A are reasonably connected. The whole DNN blocks are trained with adversary learning in a variational Bayesian inference. Note that hyperparameters of each DNN block can be further optimized by AutoML on top of AutoBayes framework."; page 3, 3 AutoBayes); modifying the reconfigurable DNN blocks by linking graph nodes with graph edges to associate the random variable factors with the multi-dimensional signals X, the task labels Y, the nuisance variations S and the latent variables Z according to the Bayesian graph model and the inference factor graph ("AutoBayes offers a solid reason of how to connect multiple DNN blocks to impose conditioning and adversary censoring for the task classifier, feature encoder, decoder, nuisance indicator and adversary networks, based on an explored Bayesian graph"; page 3; Note: see Algorithm 1 on page 4 for the multi-dimensional signals X, the task labels Y, the nuisance variations S and the latent variables Z); training the reconfigurable DNN blocks with a variational sampling and a gradient method for the training data, and selecting the hyperparameters using an output of the reconfigurable DNN blocks for the validation data ("The whole DNN blocks are trained with adversary learning in a variational Bayesian inference. Note that hyperparameters of each DNN block can be further optimized by AutoML on top of AutoBayes framework."; page 3, 3 AutoBayes; and "Adversary train the whole DNN structure to minimize a loss function in (5)"; page 4, Algorithm 1; Note: minimizing a loss function is done during backward propagation); and testing the trained reconfigurable DNN blocks for the testing data and new incoming data on the fly to be transferred with nuisance robustness ("Bayesian Graph Model B (Markov Latent): Assuming a latent Z can work in a Markov chain of Y − Z − X shown in Fig. 5(b), we obtain a simple inference model: p(y, s, z|x). Note that this model assumes independence between Z and S, and thus adversarial censoring (Makhzani et al., 2015; Creswell et al., 2017; Lample et al., 2017) can make it more robust against nuisance."; page 13).
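The selection step quoted from Algorithm 1 ("return the best model having highest task accuracy in validation sets") reduces to an argmax over explored models. A self-contained sketch with stand-in callables (the helpers are hypothetical placeholders, not Demir's code):

```python
def select_best(candidate_graphs, train_model, val_accuracy):
    """Return the explored model with the highest validation task accuracy,
    mirroring the final step of the quoted Algorithm 1."""
    best_model, best_acc = None, float("-inf")
    for graph in candidate_graphs:
        model = train_model(graph)   # adversarial / variational training
        acc = val_accuracy(model)    # task accuracy on held-out validation data
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model

# Toy stand-ins just to make the sketch executable
print(select_best(["graph_A", "graph_AB"],
                  train_model=str.upper,   # pretend "training"
                  val_accuracy=len))       # pretend metric -> "GRAPH_AB"
```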
Regarding claim 3, Demir teaches modifying the hyperparameters to specify the set of pre-shot regularization methods using different censoring modes and censoring methods, wherein the censoring modes include at least one of a marginal censoring mode, a conditional censoring mode, and a complementary censoring mode, and wherein the censoring methods include at least one of divergence censoring methods and mutual information censoring methods ("AutoBayes begins with exploring any potential Bayesian graphs by cutting links of the full-chain graph in Fig. 4(a), imposing possible (conditional) independence. We then adopt the Bayes-Ball algorithm on each hypothetical Bayesian graph to examine conditional independence over different inference strategies, e.g., full-chain Z-/S-first inference graphs in Figs. 4(b)/(c). Applying Bayes-Ball justifies the reasonable pruning of the links in the full-chain inference graphs, and also the potential adversarial censoring when Z is independent of S. This process automatically constructs a connectivity of inference, generative, and adversary blocks with sound reasoning."; page 5); associating the set of auxiliary regularization modules with the reconfigurable DNN blocks such that at least one of the latent nodes Z is disentangled from at least one of the nuisance variations S according to the set of pre-shot regularization methods (see lines 10-15 of Algorithm 1 of Demir for the training method); training the reconfigurable DNN blocks with the set of auxiliary regularization modules based on the training data (see lines 16-19 of Algorithm 1 of Demir); and selecting the hyperparameters for the set of censoring modes and the set of censoring methods based on the output of the reconfigurable DNN blocks for the validation data ("return the best model having highest task accuracy in validation sets"; page 4, Algorithm 1).

Regarding claim 4, Demir teaches wherein the censoring methods include at least one of an adversarial censoring method, a mutual information neural estimation (MINE) censoring method, a mutual information gradient estimation (MIGE) censoring method, a maximum mean discrepancy (MMD) censoring method, a pairwise maximum mean discrepancy (MMD) censoring method, a boundary equilibrium generative adversarial network (BEGAN) discriminator censoring method, a Hilbert-Schmidt independence criterion (HSIC) censoring method, and an optimal transport censoring method ("Note that the Bayes-Ball also reveals that there is no marginal dependency between Z and S, which provides the reason to use adversarial censoring to suppress nuisance information S in the latent space Z."; page 5).
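Of the censoring methods listed in claim 4, maximum mean discrepancy is easy to state compactly. Below is a generic biased RBF-kernel MMD estimator of the kind such penalties use; this is a textbook construction, not code from Demir or the application.

```python
import torch

def rbf_mmd2(a, b, sigma=1.0):
    """Biased MMD^2 estimate between samples a, b of shape (n, d) under an
    RBF kernel; small values suggest the two latent populations match."""
    def k(x, y):
        d2 = torch.cdist(x, y) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

# Usage idea: penalize the encoder with the MMD between latent codes drawn
# under different nuisance labels, so Z does not depend on S.
z = torch.randn(100, 8)
s = torch.randint(0, 2, (100,))
print(float(rbf_mmd2(z[s == 0], z[s == 1])))
```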
Regarding claim 5, Demir teaches modifying the hyperparameters to specify the set of pre-processing methods, wherein modifying the hyperparameters includes using at least one of a spatial filtering, spatio-temporal filtering, wavelet transforms, vector auto-regressive filter, self-attention mapping, robust z-scoring, normalization, data augmentation, and universal adversarial example ("In variational approach, we reparameterize Z from a prior distribution such as the normal distribution to marginalize. Depending on the Bayesian graph models, we can also consider reparametering semi-supervision on S (i.e., incorporating a reconstruction loss for S) as a conditioning variable. Conditioning on Y and/or S should depend on consistency with the graphical model assumptions. Since VAE is a special case of CVAE, we will go into further detail about the more general CVAE below"; page 15; Note: this relates to robust z-scoring and normalization); and modifying the training data, validation data, and testing data to feed in the reconfigurable DNN blocks according to the set of pre-processing methods (see lines 1-20 of Algorithm 1 of Demir).

Regarding claim 6, Demir teaches wherein the set of post-processing methods includes at least one of cross validation voting, ensemble stacking, and score averaging ("Ensemble Learning: We further introduce ensemble methods to make best use of all Bayesian graph models explored by the AutoBayes framework without wasting lower-performance models. Ensemble stacked generalization works by stacking the predictions of the base learners in a higher level learning space, where a meta learner corrects the predictions of base learners (Wolpert, 1992)"; page 7).

Regarding claim 7, Demir teaches wherein the set of post-shot adaptation methods includes at least one of pseudo-labeling, soft labeling, confusion minimization, entropy minimization, feature normalization, weighted z-scoring, elastic weight consolidation, label propagation, adaptive layer freezing, hyper network adaptation, latent space clustering, quantization, and sparsification ("We also use batch normalization (BN) and ReLU activation as listed in Table 3."; page 18, A.7 DNN Model Parameters; Note: batch normalization is a form of feature normalization), wherein the reconfigurable DNN blocks are refined by unfreezing a combination of the trainable variables such that the reconfigurable DNN blocks adapt to a new-domain dataset ("Given a pair of generative graph and inference graph, the corresponding DNN structures will be trained. For example of the generative graph model K in Fig. 5(k), one relevant inference graph Kz in Fig. 6(k) will result in the overall network structure as shown in Fig. 7, where adversary network is attached as Z2 is (conditionally) independent of S. This 5-node graph model justifies a recent work on partially disentangled A-CVAE by Han et al. (2020). Each factor block is realized by a DNN, e.g., parameterized by θ for pθ(z1, z2|x), and all of the networks except for adversarial network are optimized to minimize corresponding loss functions including L(ŷ, y) as follows:"; page 6; Note: see Algorithm 1, where the training loop unfreezes the variables when retraining).
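The stacked generalization quoted for claim 6 concatenates the base learners' posterior probability vectors and fits a logistic-regression meta learner. A runnable sketch on synthetic arrays (all data here is random and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, n_classes = 200, 4
# Posterior probability vectors from 3 pretend base learners
posteriors = [rng.dirichlet(np.ones(n_classes), size=n) for _ in range(3)]
y = rng.integers(0, n_classes, size=n)

stacked = np.hstack(posteriors)                 # (n, 3 * n_classes) meta-features
meta = LogisticRegression(max_iter=1000).fit(stacked, y)  # LR meta learner
print(meta.predict(stacked[:5]))                # corrected ensemble predictions
```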
Regarding claim 8, Demir teaches wherein the variational sampling is employed for the latent variables with an independent distribution specified by an exponential family or non-exponential family, as its prior distribution for reparameterization tricks, and for categorical variables of unknown nuisance variations and task labels using the Gumbel softmax trick to produce near-one-hot vectors based on a random number generator and a softmax temperature ("In order to deal with the issue of categorical sampling, we can use the Gumbel-Softmax reparameterization trick (Jang et al., 2016), which enables differentiable approximation of one-hot encoding. Let [π1, π2, ..., π|S|] denote a target probability mass function for the categorical variable S. Let g1, g2, ..., g|S| be independent and identically distributed samples drawn from the Gumbel distribution Gumbel(0, 1). Then, generate an |S|-dimensional vector ŝ = [ŝ1, ŝ2, ..., ŝ|S|] according to (20), where τ > 0 is a softmax temperature. As the softmax temperature τ approaches 0, samples from the Gumbel-Softmax distribution become one-hot and the distribution becomes identical to the target categorical distribution. The temperature τ is usually decreased across training epochs as an annealing technique, e.g., with exponential decaying."; pages 15-16, Variational Categorical Reparameterization; Note: g is a random variable).
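The passage quoted for claim 8 references Demir's equation (20). In its standard form (presumably what (20) states), the draw is softmax((log π + g)/τ) with g ~ Gumbel(0, 1):

```python
import torch

def gumbel_softmax_sample(log_pi, tau):
    """Standard Gumbel-Softmax draw: s_i = softmax((log pi_i + g_i) / tau),
    g_i ~ Gumbel(0, 1). As tau -> 0 the sample approaches one-hot."""
    g = -torch.log(-torch.log(torch.rand_like(log_pi)))  # Gumbel(0, 1) samples
    return torch.softmax((log_pi + g) / tau, dim=-1)

pi = torch.tensor([0.7, 0.2, 0.1])
print(gumbel_softmax_sample(torch.log(pi), tau=0.5))  # near one-hot
print(gumbel_softmax_sample(torch.log(pi), tau=5.0))  # smooth, far from one-hot
```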
Regarding claim 9, Demir teaches wherein link concatenation comprises a step of multi-dimensional tensor projection with a plurality of trainable linear filters or bilinear filters to convert lower-dimensional signals for dimension-mismatched links ("When we need to feed S along with 2D data of X into the CNN encoder such as in the model Ds, dimension mismatch poses a problem. We address this issue by using one linear layer to project S into the temporal dimensional space of X and another linear layer to project it into the spatial dimensional space of X. The dot product of those two projected vectors is concatenated as additional channel input."; page 18, A.7 DNN Model Parameters).
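The dimension-mismatch fix quoted for claim 9 projects S with two linear layers and concatenates the product as an extra channel. The sketch below reads the "dot product of those two projected vectors" as an outer product over the temporal and spatial axes; that reading and all sizes are interpretive assumptions.

```python
import torch
import torch.nn as nn

B, CH, T, SP, S_DIM = 8, 3, 100, 16, 5   # batch, channels, temporal, spatial, |S|

to_temporal = nn.Linear(S_DIM, T)        # project S into the temporal space of X
to_spatial = nn.Linear(S_DIM, SP)        # project S into the spatial space of X

x = torch.randn(B, CH, T, SP)
s = torch.eye(S_DIM)[torch.randint(0, S_DIM, (B,))]       # one-hot nuisance labels

plane = to_temporal(s).unsqueeze(2) * to_spatial(s).unsqueeze(1)  # (B, T, SP)
x_aug = torch.cat([x, plane.unsqueeze(1)], dim=1)         # extra channel input
print(x_aug.shape)                                        # (B, CH + 1, T, SP)
```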
Regarding claim 10, Demir teaches wherein the reconfigurable DNN blocks are configured with a combination of at least two of a fully-connected layer, convolutional layer, graph convolutional layer, recurrent layer, loopy connection, skip connection, and inception layer with a set of nonlinear activations including at least one of rectified linear variants, hyperbolic tangent, sigmoid, gated linear, softmax, and thresholding, regularized with a combination of dropout, swap out, zone out, block out, drop connect, noise injection, shaking, and batch normalization ("For 2D datasets, we use deep CNN for the encoder E and decoder D blocks. For the classifier C, nuisance estimator N, and adversary A, we use a multi-layer perceptron (MLP) having three layers, whose hidden nodes are doubled from the input dimension. We also use batch normalization (BN) and ReLU activation as listed in Table 3."; page 18, A.7 DNN Model Parameters).

Regarding claim 11, Demir teaches wherein the training performs updating the trainable parameters of the reconfigurable DNN blocks by using the training data such that the output of the reconfigurable DNN blocks provides smaller loss values in a combination of at least two objective functions, wherein the objective functions further include a combination of mean-square error, cross entropy, structural similarity, negative log-likelihood, absolute error, cross covariance, clustering loss, divergence, hinge loss, Huber loss, negative sampling, Wasserstein distance, and triplet loss, wherein the loss functions are weighted with a plurality of regularization coefficients adjusted according to the specified training schedules ("We can utilize adversarial censoring when Z and S should be marginally independent, e.g., such as in Fig. 1(b) and Fig. 7, in order to reinforce the learning of a representation Z that is disentangled from the nuisance variations S. This is accomplished by introducing an adversarial network that aims to maximize a parameterized approximation q(s|z) of the likelihood p(s|z), while this likelihood is also incorporated into the loss for the other modules with a negative weight."; page 7, Adversarial Regularization).

Regarding claim 13, Demir teaches wherein the datasets include sensor measurements further comprising: media data including images, pictures, movies, texts, letters, voices, music, audios, speeches ("QMNIST: A hand-written digit image MNIST with extended label information including a writer ID number"; page 16, A.6 Datasets Description); physical data including radio waves, optical signals, electrical pulses, temperatures, pressures, accelerations, speeds, vibrations and forces ("The data were collected by C = 7 sensors, i.e., electrodermal activity, temperature, three-dimensional acceleration, heart rate, and arterial oxygen level."; page 16, A.6 Datasets Description); and physiological data including heart rate, blood pressure, mass, moisture, electroencephalogram, electromyogram, electrocardiogram, mechanomyogram, electrooculogram, galvanic skin response, magnetoencephalogram, and electrocorticography ("The dataset consists of EEG data recorded from |S| = 16 healthy subjects participating in an offline P300 spelling task, where visual feedback of the inferred letter is provided to the user at the end of each trial for 1.3 seconds to monitor evoked brain responses for erroneous decisions made by the system."; page 17, A.6 Datasets Description).

Regarding claim 14, Demir teaches wherein the nuisance variations include a set of subject identifications ("We benchmark the framework on several public datasets, where we have access to subject and class labels during training, and provide analysis of its capability for subject-transfer learning with/without variational modeling and adversarial training."; page 1, Abstract), session numbers ("RSVP: An EEG-based typing interface using rapid serial visual presentation (RSVP) paradigm (Orhan et al., 2012). |S| = 10 healthy subjects participated in the experiments at three sessions performed on different days."; page 16, A.6 Datasets Description), biological states and environmental states ("Stress: A physiological dataset considering neurological stress level (Birjandtalab et al., 2016). It consists of multi-modal biosignals for |Y| = 4 discrete stress states from |S| = 20 healthy subjects, including physical/cognitive/emotional stresses as well as relaxation"; page 16, A.6 Datasets Description; Note: stress also includes environmental states), sensor states ("The data were collected by C = 7 sensors, i.e., electrodermal activity, temperature, three-dimensional acceleration, heart rate, and arterial oxygen level."; page 16, A.6 Datasets Description), locations and orientations ("Note that they cover a wide variety of data size, dimensionality, subject scale, and class levels as well as sensor modalities including image, EEG, EMG, and electrocorticography (ECoG). See more detailed information of each dataset in Appendix A.6"; page 7, 4 Experimental Evaluation), sampling rates and time ("For each stress status, a corresponding task of 5 minutes long (i.e., T = 300 time samples with 1 Hz down-sampling) was assigned to subjects for a total of 4 trials."; page 16, A.6 Datasets Description), and sensitivities ("AutoBayes offers a solid reason of how to connect multiple DNN blocks to impose conditioning and adversary censoring for the task classifier, feature encoder, decoder, nuisance indicator and adversary networks, based on an explored Bayesian graph."; page 3).

Regarding claim 15, Demir teaches wherein each of the reconfigurable DNN blocks further comprises hyperparameters specifying a set of layers having a set of artificial neuron nodes, wherein a pair of the neuron nodes from neighboring layers are mutually connected with a plurality of trainable variables and activation functions to pass a signal from the previous layers to the next layers sequentially ("AutoBayes Algorithm: The overall procedure of the AutoBayes algorithm is described in the pseudocode of Algorithm 1. The AutoBayes automatically constructs non-redundant inference factor graphs given a hypothetical Bayesian graph assumption, through the use of the Bayes-Ball algorithm. Depending on the derived conditional independency and pruned factor graphs, DNN blocks for encoder E, decoder D, classifier C, nuisance estimator N and adversary A are reasonably connected. The whole DNN blocks are trained with adversary learning in a variational Bayesian inference. Note that hyperparameters of each DNN block can be further optimized by AutoML on top of AutoBayes framework."; page 3, 3 AutoBayes; Note: see Algorithm 1 on page 4 for the procedure with the nodes).

Regarding claim 16, Demir teaches wherein the nuisance variations S are further decomposed into multiple factors of variations S1, S2, ..., SN as multiple-domain side information according to at least one of supervised, semi-supervised and unsupervised settings, wherein the latent variables are further decomposed into multiple factors of latent variables Z1, Z2, ..., ZL as disentangled feature vectors ("AutoBayes also enables learning disentangled representations, where the latent variable is split into multiple pieces to impose different relation with nuisance variation and task labels."; page 1, Abstract; and "At the core of our methodology is the consideration of graphical Bayesian models that capture the probabilistic relationship between random variables representing the data features X, task labels Y, nuisance variation labels S, and (potential) latent representations Z. The ultimate goal is to infer the task label Y from the measured data feature X, which is hindered by the presence of nuisance variations (e.g., inter-subject/session variations) that are (partially) labelled by S."; page 2, 2 Key Contributions; Note: see page 4, Algorithm 1, for multiple factors of variations S and latent variables Z).
Regarding claim 17, Demir teaches wherein the modifying of the hyperparameters employs at least one of reinforcement learning, evolutionary strategy, differential evolution, particle swarm, genetic algorithm, annealing, Bayesian optimization, hyperband, and multi-objective Lamarckian evolution, to explore different combinations of discrete and continuous hyperparameter values ("AutoML: Searching DNN models with hyperparameter optimization has been intensively investigated in a framework called AutoML (Ashok et al., 2017; Brock et al., 2017; Cai et al., 2017; He et al., 2018; Miikkulainen et al., 2019; Real et al., 2017; 2020; Stanley & Miikkulainen, 2002; Zoph et al., 2018). The automated methods include architecture search (Zoph et al., 2018; Real et al., 2017; He et al., 2018; Real et al., 2020), learning rule design (Bayer et al., 2009; Jozefowicz et al., 2015), and augmentation exploration (Cubuk et al., 2019; Park et al., 2019). Most work used either evolutionary optimization or reinforcement learning framework to adjust hyperparameters or to construct network architecture from pre-selected building blocks. The recent AutoML-Zero (Real et al., 2020) considers an extension to preclude human knowledge and insights for fully automated designs from scratch."; page 12, A.1 Related Work).

Regarding claim 18, Demir teaches wherein the set of hyperparameters comprises a set of training schedules including an adaptive control of learning rates, regularization weights, factorization permutations, and a policy to prune less-priority links, by using a belief propagation to measure a discrepancy between the training data and the validation data ("Model Implementation: All models were trained with a minibatch size of 32 and using the Adam optimizer with an initial learning rate of 0.001. The learning rate is halved whenever the validation loss plateaus."; page 7; and "Apply the Bayes-Ball algorithm on B to build a conditional independency list I"; Algorithm 1, line 5).

Claim 19 is claim 1 in the form of a method and is rejected for the same reasons as claim 1 stated above. Dependent claim 20 is claims 3 and 4 in the form of a method and is rejected for the same reasons as claims 3 and 4 stated above. For the rejection of the limitations specifically pertaining to the method of claim 19, see the rejection of claim 19 above.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Demir et al. (AutoBayes: Automated Bayesian Graph Exploration for Nuisance-Robust Inference), hereinafter Demir, in view of Pandey et al. (Target-Independent Domain Adaptation for WBC Classification using Generative Latent Search), hereinafter Pandey. Claim 12 is rejected over Demir and Pandey with the incorporation of claim 1.

Regarding claim 12, Demir teaches wherein the gradient method employs a combination of at least two of stochastic gradient descent ("AutoBayes may automatically construct autoencoder architecture when latent variables are involved, e.g., for the model E in Fig. 5(e). For this case, Z represents a stochastic node to marginalize out for X reconstruction and Y inference, and hence VAE will be required. In contrast to vanilla autoencoders, VAE uses variational inference by assuming a marginal distribution"; page 14). Demir does not teach adaptive momentum, Ada gradient, Ada bound, Nesterov accelerated gradient, and root-mean-square propagation for optimizing trainable parameters of the reconfigurable DNN blocks. However, Pandey teaches these elements ("The latent vector is optimized using a gradient-based optimization procedure, performed for K (a hyper-parameter) iterations over the latent space of the VAE for every target image. The gradient based optimization is implemented with Nesterov Accelerated Gradient method with a momentum of 0.5."; page 5, B. Inference through Latent Search). It would have been obvious before the effective filing date to combine the stochastic node of Demir with the Nesterov Accelerated Gradient of Pandey to effectively solve optimization problems (Pandey, page 5, B. Inference through Latent Search). Demir and Pandey are analogous art because they both concern domain disentanglement.
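The Pandey feature relied on for claim 12, Nesterov accelerated gradient with momentum 0.5 applied over a latent vector, maps onto a stock optimizer call. A minimal sketch with a placeholder objective (the objective and sizes are illustrative, not Pandey's):

```python
import torch

z = torch.zeros(16, requires_grad=True)              # stand-in VAE latent vector
opt = torch.optim.SGD([z], lr=0.01, momentum=0.5, nesterov=True)

target = torch.randn(16)
for _ in range(100):                                 # K iterations over the latent space
    loss = ((z - target) ** 2).sum()                 # placeholder reconstruction objective
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```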
Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID H TRAN, whose telephone number is (703) 756-1525. The examiner can normally be reached M-F, 9:30 am - 5:30 pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID H TRAN/
Examiner, Art Unit 2147

/VIKER A LAMARDO/
Supervisory Patent Examiner, Art Unit 2147

Prosecution Timeline

Feb 01, 2022
Application Filed
Nov 06, 2025
Non-Final Rejection — §102, §103, §112
Feb 19, 2026
Applicant Interview (Telephonic)
Feb 19, 2026
Examiner Interview Summary
Feb 24, 2026
Response Filed
Mar 25, 2026
Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579404
PROCESSOR FOR NEURAL NETWORK, PROCESSING METHOD FOR NEURAL NETWORK, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM
Granted Mar 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on the most recent grant.


Prosecution Projections

3-4
Expected OA Rounds
14%
Grant Probability
38%
With Interview (+23.2%)
4y 2m
Median Time to Grant
Moderate
PTA Risk
Based on 14 resolved cases by this examiner. Grant probability derived from career allow rate.
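As a sanity check on these figures, assuming the tool combines them additively: 2 granted / 14 resolved ≈ 14.3%, displayed as 14%; and 14.3% + 23.2 points of interview lift ≈ 37.5%, consistent with the 38% shown for With Interview.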
