Prosecution Insights
Last updated: April 19, 2026
Application No. 17/360,613

MECHANISTIC MODEL PARAMETER INFERENCE THROUGH ARTIFICIAL INTELLIGENCE

Status: Non-Final OA (§103)
Filed: Jun 28, 2021
Examiner: DEVORE, CHRISTOPHER DILLON
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 5 (Non-Final)
Grant Probability: 50% (Moderate)
OA Rounds: 5-6
To Grant: 4y 1m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 50% (grants 50% of resolved cases; 5 granted / 10 resolved; -5.0% vs TC avg)
Interview Lift: +41.7% (strong), comparing resolved cases with and without an interview
Avg Prosecution: 4y 1m (typical timeline); 33 currently pending
Total Applications: 43 (career history, across all art units)

Statute-Specific Performance

§101: 30.1% (-9.9% vs TC avg)
§103: 39.0% (-1.0% vs TC avg)
§102: 7.7% (-32.3% vs TC avg)
§112: 21.4% (-18.6% vs TC avg)
Tech Center averages are estimates. Based on career data from 10 resolved cases.

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/02/2026 has been entered.

Response to Arguments

Remarks page 7, Applicant contends: Claim 16 is amended to overcome the claim objection.

Response: Claim 16 appears clarified; the claim objection is therefore withdrawn.

Remarks pages 7-8, Applicant contends: Claims 1 and 11 have been amended, and the currently recited prior art does not teach the elements as amended herein.

Response: Applicant's arguments with respect to claims 1 and 11 have been considered but are moot because the new grounds of rejection contain elements that have not been previously examined and do not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Remarks page 9, Applicant contends: Claim 16, as amended, is not taught by the current art.

Response: Applicant's arguments with respect to claim 16 have been considered but are moot because the new grounds of rejection contain elements that have not been previously examined and do not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections – 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-8, 10-11, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Meeds et al (US 20200233920), referred to as Meeds in this document, in view of Wang et al ("Deep generative modeling for mechanistic-based learning and design of metamaterial systems"), referred to as Wang in this document, further in view of Huang et al ("Augmented Normalizing Flows"), referred to as Huang in this document, and further in view of Sun et al ("A Convolutional Neural Network Model Based on Improved Softplus Activation Function"), referred to as Sun in this document.
Regarding Claim 1:

Meeds teaches: A system, comprising: a memory that stores computer executable components; and a processor, operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components comprise:

([Meeds 0207]: "According to another aspect, there is provided computer system comprising storage and one or more processors, the storage storing code arranged to run on at least one of the processors, wherein the code is configured so as when run on the at least one processor to perform operations in accordance with any of the methods disclosed herein.")

[A system, comprising: a memory that stores computer executable components; and a processor, operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components comprise] a machine learning component that: integrates a mechanistic model of a biological system with an artificial intelligence model for identification of hidden mechanistic causes of observed data, wherein the mechanistic model of one or more mechanistic models is a biophysical model of the biological system

([Meeds 0162]: "The mechanistic model describes the time-evolution of the response of double receiver devices to HSL signals C6 and C12, vector u in (1). [a machine learning component that: integrates a mechanistic model of a biological system… wherein the mechanistic model of one or more mechanistic models is a biophysical model of the biological system] The latent variables x in (1) are the colony density c, the intracellular concentrations of each expressed protein (luxR, lasR, RFP, CFP, YFP) and variables for autofluorescence, which are modelled as concentration of intracellular material fluorescent at 480 nm (F480) and 530 nm (F530). A system of differential equations was derived from chemical reactions (Dalchau et al., 2019), but several assumptions were made to simplify the model, including the removal of mRNA species.")

([Meeds 0004]: "Using a machine learning algorithm in the form of a variational autoencoder (VAE) would address some of these issues. A variational autoencoder is a form of machine learning algorithm comprising an encoder and a decoder, each comprising a neural network. The encoder receives input observations and encodes them into a compressed representation called a latent vector [with an artificial intelligence model for identification]. The decoder is arranged to decode the latent vector back into values in the real-world feature space. In the learning phase, the values output by the decoder are compared to the observations in the in experience data input to the encoder, and the neural nets of the encoder and decoder are trained to minimize a measure of overall difference between the input observations and the output of the decoder.")

([Meeds 0080]: "The advantage of a white or mechanistic approach is that it enables an interpretation [of hidden mechanistic causes of observed data, as the system is mechanistic and thus learns mechanistic causes] to be placed on the learned parts of the equation. If the modelled form of the equation is accurate, it may also lead to more accurate results.")

identifies a causal relationship in the mechanistic model via a machine learning architecture that employs a parameter space of the mechanistic model as a latent space of a variational autoencoder (VAE)

([Meeds 0004]: "Using a machine learning algorithm in the form of a variational autoencoder (VAE) would address some of these issues. A variational autoencoder is a form of machine learning algorithm comprising an encoder and a decoder, each comprising a neural network. The encoder receives input observations and encodes them into a compressed representation called a latent vector. The decoder is arranged to decode the latent vector back into values in the real-world feature space. [a machine learning architecture that employs a parameter space of the mechanistic model as a latent space of a variational autoencoder (VAE)] In the learning phase, the values output by the decoder are compared to the observations in the in experience data input to the encoder, and the neural nets of the encoder and decoder are trained to minimize a measure of overall difference between the input observations and the output of the decoder.")

Paragraphs 75 to 77 note different box approaches, from black to grey to white. These approaches are variations of the decoder for the VAE.

([Meeds 0075]: "In some embodiments, the decoder 404 comprises only one big neural network (the second neural network 405) modelling the whole right-hand side f of the differential equation. This is referred to herein as the "black box" approach.")

([Meeds 0080]: "The advantage of a white or mechanistic approach is that it enables an interpretation to be placed on the learned parts of the equation. If the modelled form of the equation is accurate, it may also lead to more accurate results.") [identifies a causal relationship in the mechanistic model]

wherein the machine learning component comprises the VAE that employs the one or more mechanistic models as a decoder node

Paragraphs 75 to 77 note different box approaches, from black to grey to white. These approaches are variations of the decoder for the VAE.

([Meeds 0075]: "In some embodiments, the decoder 404 comprises only one big neural network (the second neural network 405) modelling the whole right-hand side f of the differential equation. This is referred to herein as the "black box" approach.")

([Meeds 0080]: "The advantage of a white or mechanistic approach [wherein the machine learning component comprises the VAE that employs the one or more mechanistic models as a decoder node] is that it enables an interpretation to be placed on the learned parts of the equation. If the modelled form of the equation is accurate, it may also lead to more accurate results.")

Meeds does not explicitly teach: and learned distributions sampled within generative adversarial networks and a normalizing flow, and a plurality of neural network layers that implement the normalizing flow and the VAE comprises one or more bijector nodes…, the output of at least one of the neural network layers coupled to at least one of the bijector nodes, wherein the one or more bijector nodes comprises one or more rotation transformations and incorporates one or more softplus functions

Meeds does not explicitly depict bijector nodes, but Meeds does teach aspects of prior distributions, such as paragraph 104 noting "a mean-field Gaussian prior distribution". This is noted for possible relevance.
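For illustration only, the following minimal sketch (Python/NumPy) shows the architectural pattern Meeds is cited for: a VAE whose latent space is the parameter space of a mechanistic model and whose decoder node is the mechanistic model itself. The toy decay model dx/dt = -k*x, the linear encoder, and every identifier below are illustrative assumptions, not the claimed invention or any reference's implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 5.0, 20)  # observation times

    def mechanistic_decoder(log_k, x0=1.0):
        # Decoder node = the mechanistic model: closed-form solution of dx/dt = -k*x
        return x0 * np.exp(-np.exp(log_k) * t)

    # Toy linear "encoder" for q(log_k | y): maps an observed trajectory to the
    # mean and log-variance of a distribution over the mechanistic parameter.
    W_mu = 0.1 * rng.normal(size=t.size)
    W_logvar = 0.1 * rng.normal(size=t.size)

    def encoder(y):
        return W_mu @ y, W_logvar @ y

    # One forward pass with the reparameterization trick: the sampled latent IS a
    # mechanistic parameter, so inference over the latent space is parameter
    # inference for the mechanism.
    y_obs = mechanistic_decoder(np.log(0.7)) + 0.05 * rng.normal(size=t.size)
    mu, logvar = encoder(y_obs)
    log_k = mu + np.exp(0.5 * logvar) * rng.normal()
    y_hat = mechanistic_decoder(log_k)
    print(f"sampled log_k = {log_k:.3f}, reconstruction MSE = {np.mean((y_obs - y_hat) ** 2):.4f}")

In such a setup, training would maximize the usual evidence lower bound over the encoder weights only, since the decoder is the fixed mechanism rather than a learned network.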
Wang teaches: and learned distributions sampled within generative adversarial networks

([Wang Introduction page 2]: "In contrast, deep generative models, such as generative adversarial networks (GAN) [34] and variational autoencoder (VAE) [35], aim to learn the underlying structure [and learned distributions sampled within generative adversarial networks] of a large dataset to enable the generation of new designs from a low-dimensional latent space. In the area of material design, deep generative models had been applied to the microstructure characterization and reconstruction of nanomaterials and alloys [36], [37], design of material microstructure morphologies [38], heat conduction materials [39], and design of photonic/phononic metamaterials [40], [41], [42], [43]. Despite using different neural network architectures, these applications follow a similar design framework by using the latent vectors of the generative model as reduced-dimensional design variables for metamaterials. Combined with a trained predictive model, optimization on the latent space is performed to efficiently explore the high-dimensional or intractable geometric design spaces.")

Wang notes that both VAEs and GANs are utilized for the predictive or compressive nature of latent or learned distributions. Wang also notes the reconstructive nature and the use with an already trained model ([Wang Introduction page 2]: "Combined with a trained predictive model, optimization on the latent space is performed to efficiently explore the high-dimensional or intractable geometric design spaces."). Figure 4 of the current application and Figure 1 of Wang show the same premise of encoder input and decoder input needing to be similar, showing that the reference has a similar premise.

One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Meeds and Wang to utilize generative adversarial networks. Meeds and Wang are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Meeds and Wang to incorporate generative adversarial networks, as GANs can be used to optimize the latent space for efficiently exploring high-dimensional or intractable geometric design spaces ([Wang Introduction page 2]: "Despite using different neural network architectures, these applications follow a similar design framework by using the latent vectors of the generative model as reduced-dimensional design variables for metamaterials. Combined with a trained predictive model, optimization on the latent space is performed to efficiently explore the high-dimensional or intractable geometric design spaces.").

Huang teaches: and a normalizing flow, and a plurality of neural network layers that implement the normalizing flow

([Huang Introduction page 2]: "Theoretically, we show that the family of [Augmented Normalizing Flows] [and a normalizing flow] [and a plurality of neural network layers that implement the normalizing flow] with additive coupling can universally transform arbitrary data distribution into a standard Gaussian prior, augmented with a degenerate deterministic variable. To the best of our knowledge, this is the first attempt in understanding how expressivity can be improved via composing flow layers rather than widening the flow (Huang et al., 2018).")

One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Meeds and Huang to include autoregressive or normalizing flow algorithms for transforming to a prior distribution. Meeds and Huang are of the same field of endeavor, as they are both in machine learning. One of ordinary skill would have been motivated to combine Meeds and Huang in order to improve expressivity without widening the flow ([Huang Introduction page 2]: "Theoretically, we show that the family of ANFs with additive coupling can universally transform arbitrary data distribution into a standard Gaussian prior, augmented with a degenerate deterministic variable. To the best of our knowledge, this is the first attempt in understanding how expressivity can be improved via composing flow layers rather than widening the flow (Huang et al., 2018).").

and the VAE comprises one or more bijector nodes…, the output of at least one of the neural network layers coupled to at least one of the bijector nodes

([Huang Background 1. Invertible Generative Models]: "Assume y ~ N(0, I). Assume the data is generated via a bijective mapping x = f_θ(y). Then the probability density function of f_θ(y) evaluated at x can be written as… Equivalently, one can parameterize the inverse transformation x → g_θ(x) with invertible mapping g_θ, [and the VAE comprises one or more bijector nodes…, the output of at least one of the neural network layers coupled to at least one of the bijector nodes] and define the generative transformation as f_θ = g_θ^(-1). Much of the design effort has been dedicated to ensuring (1) the invertibility of the transformation g, and (2) efficiency in computing the log-determinant of the Jacobian in Equation 2.")

wherein the one or more bijector nodes comprises one or more rotation transformations

([Huang B.3 page 15]: "We apply a split operator to this last layer to obtain a 'shift' coefficient and 'log scale' coefficient for affine transformation [wherein the one or more bijector nodes comprises one or more rotation transformations; this quote shows affine transformations are a known example of transformations].") That affine transformations are known to include rotations is supported by [Wikipedia "Affine transformation" page 1]: "Examples of affine transformations include translation, scaling, homothety, similarity, reflection, rotation, shear mapping, and compositions of them in any combination and sequence."

One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Meeds and Huang to include a bijector node or equivalent mapping for transforming to a prior distribution. Meeds and Huang are of the same field of endeavor, as they are both in machine learning. One of ordinary skill would have been motivated to combine Meeds and Huang in order to improve expressivity ([Huang C. Extended related work and future direction]: "The term Normalizing Flow was originally coined by Tabak et al. (2010); Tabak & Turner (2013) where it was used for density estimation. Differentiable bijective models were first introduced to the deep learning community as likelihood-based generative models by Rippel & Adams (2013); Dinh et al. (2014), and as an inference machine by Rezende & Mohamed (2015). Most development within this line of research is dedicated to improving the expressivity of the bijective mapping while maintaining computational tractability of the log-determinant of the Jacobian. Each family of flows can be characterized by the 'trick' used to achieve this, e.g."), and the paper's purpose is shown to be just that ([Huang Abstract]: "In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood.").

Sun teaches: and incorporates one or more softplus functions

([Sun 2.2 Softplus page 3]: "The Softplus activation function [and incorporates one or more softplus functions] is an approximate smoothed version of the ReLU activation function proposed by Glorot et al. [8] in 2011. It is non-linear and has a continuous differentiable function in the domain, and the change is relatively flat. Compared with the Sigmoid activation function, the principle of biological neuron signal activation is more consistent, and the disadvantages of the ReLU activation function due to forced sparsity are also avoided. The function curve of the Softplus activation function is indicated by the black dotted line in Fig. 1.")

One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Meeds and Sun to incorporate softplus functions. Meeds and Sun are in the same field of endeavor of machine learning. One of ordinary skill would have been motivated to combine Meeds and Sun in order to utilize the advantages of softplus, such as more consistent neuron signal activation and avoidance of some disadvantages of ReLU ([Sun 2.2 Softplus page 3]: "The Softplus activation function is an approximate smoothed version of the ReLU activation function proposed by Glorot et al. [8] in 2011. It is non-linear and has a continuous differentiable function in the domain, and the change is relatively flat. Compared with the Sigmoid activation function, the principle of biological neuron signal activation is more consistent, and the disadvantages of the ReLU activation function due to forced sparsity are also avoided. The function curve of the Softplus activation function is indicated by the black dotted line in Fig. 1."). The use of activation functions, such as softplus functions, would be a conventional modification, as Huang notes the use of activation functions in Huang's systems ([Huang B.3 page 15]: "We also apply activation normalization (Kingma & Dhariwal, 2018) with data-dependent initialization that standardizes the transformed feature, after each encoding transform and each decoding transform."); thus, with the motivation to utilize softplus, the activation function in Huang or in the modified system of Meeds can be the softplus function.
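As a purely illustrative aside on the bijector limitations just mapped (rotation transformations, softplus), the sketch below composes three simple bijectors: a rotation, a shift-and-scale (affine) layer, and a softplus, carrying standard Gaussian latent samples onto strictly positive values of the kind usable as mechanistic rate parameters. The chain and all constants are assumptions for exposition, not any party's implementation.

    import numpy as np

    rng = np.random.default_rng(1)

    def rotation(z, theta=np.pi / 6):
        # Rotation: an orthogonal, volume-preserving bijector
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        return z @ R.T

    def affine(z, shift, log_scale):
        # Shift-and-scale layer: invertible for any finite log_scale
        return z * np.exp(log_scale) + shift

    def softplus(z):
        # Softplus: a smooth bijection from R onto (0, inf)
        return np.log1p(np.exp(z))

    z = rng.standard_normal((5, 2))  # multivariate Gaussian base samples
    params = softplus(affine(rotation(z),
                             shift=np.array([0.5, -0.2]),
                             log_scale=np.array([0.3, 0.1])))
    print(params)  # every entry is strictly positive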
Regarding Claim 2:

The system of claim 1 is taught by Meeds, Wang, Huang, Sun.

Meeds teaches: wherein the mechanistic model is a decoder of the variational autoencoder

Paragraphs 75 to 77 note different box approaches, from black to grey to white. These approaches are variations of the decoder for the VAE.

([Meeds 0075]: "In some embodiments, the decoder 404 comprises only one big neural network (the second neural network 405) modelling the whole right-hand side f of the differential equation. This is referred to herein as the "black box" approach.")

([Meeds 0080]: "The advantage of a white or mechanistic approach [wherein the mechanistic model is a decoder of the variational autoencoder] is that it enables an interpretation to be placed on the learned parts of the equation. If the modelled form of the equation is accurate, it may also lead to more accurate results.")

Regarding Claim 3:

The system of claim 1 is taught by Meeds, Wang, Huang, Sun.

Meeds teaches: wherein the variational autoencoder determines a conditional probability associated with the parameter space based on an output of the mechanistic model.

([Meeds 0153]: "Despite surface similarities, the modeling regime and motivations of the prior work is markedly different from embodiments of the invention in that a block-conditional variational distribution over the parameters of a system of ODEs (in the white box case) is learned [wherein the variational autoencoder determines a conditional probability associated with the parameter space based on an output of the mechanistic model, as the white box is noted by paragraph 77 to be a mechanistic decoder], or over a hierarchical factorization of the latent variables (in the black box case, guided by domain expertise. The variational distribution learned is over the parameters of this system, rather than the latent state itself.")

Regarding Claim 4:

The system of claim 1 is taught by Meeds, Wang, Huang, Sun.

Meeds teaches: wherein the machine learning architecture approximates a distribution of the parameter space that is consistent with a single output of the mechanistic model

([Meeds 0086]: "Some embodiments provide an efficient method for learning ODE model parameters from experimental data that also supports grey-box modelling, such that ODE models of previously unknown biological mechanisms can be inferred from biological experiments in a systematic manner. The method relies on reparameterized variational inference, and performs scalable Bayesian learning by leveraging fully differentiable inference through a variational autoencoder. This enables fast parameter inference [wherein the machine learning architecture approximates a distribution of the parameter space that is consistent with a single output of the mechanistic model] while parsimoniously trading off between model fit and generalisability. The generator may be a modified-Euler solution to an ODE whose parameters are samples from the variational distribution. Some embodiments involve backpropagation directly through the ODE solver during learning. Within the same framework, prescribed white-box ODE models are interchangeably compared with carefully constrained black-box ODE models—and as such could be viewed more as grey-box models—that use neural-network sub-components to define ODE RHS's. By doing so, this may support scientists wishing to improve model performance while maintain a useful degree of interpretability over their models. Scientists can therefore focus on scaling inference and experimental design, rather than the precise details of mechanistic models, accelerating research in their specific domain.")

Regarding Claim 5:

The system of claim 1 is taught by Meeds, Wang, Huang, Sun.

Meeds teaches: further comprising: a training component that trains the variational autoencoder by sampling an output of the mechanistic model as a training input for the variational autoencoder, wherein the parameter space is known.

([Meeds 0004]: "In the learning phase, the values output by the decoder are compared to the observations in the in experience data input to the encoder, [a training component that trains the variational autoencoder by sampling an output of the mechanistic model as a training input for the variational autoencoder, wherein the parameter space is known] and the neural nets of the encoder and decoder are trained to minimize a measure of overall difference between the input observations and the output of the decoder.")

The decoder is mentioned to be mechanistic in paragraph 77.

Regarding Claim 6:

The system of claim 1 is taught by Meeds, Wang, Huang, Sun.

Meeds teaches: further comprising: a training component that trains the variational autoencoder

([Meeds 0004]: "The decoder is arranged to decode the latent vector back into values in the real-world feature space. In the learning phase, the values output by the decoder are compared to the observations in the in experience data input to the encoder, and the neural nets of the encoder and decoder are trained [training component that trains the variational autoencoder] to minimize a measure of overall difference between the input observations and the output of the decoder. Hence the VAE learns to encode observations into a compressed latent vector and to decode back again.")

Meeds does not explicitly teach: by constructing a joint probability in the form of two machine learning networks

Huang teaches: by constructing a joint probability in the form of two machine learning networks (joint probability as two machine learning networks is interpreted for examination as a joint probability between two machine learning networks)

Figure 1 of Huang (reproduced in the record) [by constructing a joint probability in the form of two machine learning networks]

An additional note for ensuring proof of joint probability in Huang ([Huang 2. Variational Autoencoders]: "VAEs allow one to embed the data in another space (usually of lower dimensionality), and we generate via an arbitrarily parameterized mapping. However, the log likelihood of the data is no longer tractable, so we can only maximize an approximate log likelihood. The performance of the model highly depends on the choice of the encoding distribution and the decoding distribution, as they are closely related to the tightness of the lower bound (Cremer et al 2018).") in reference to the current disclosure ([Current Disclosure 0063]: "The training component 202 can train just the encoder distribution P_{x|y'}(x|y'), by constructing a joint probability in accordance with Equation 7 below. P_{x,y'}(x, y') = P_{x|y'}(x|y') P_{y'}(y') For example, the joint probability can be in the form of two deep learning networks, where the log likelihood of the network parameters can be maximized for samples from the prior parameter distribution and correspondingly generated from Y'.").

One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Meeds and Huang to utilize a joint probability with two machine learning networks. Meeds and Huang are of the same field of endeavor, as they are both in machine learning. One of ordinary skill would have been motivated to combine Meeds and Huang in order to improve the structure of the distribution ([Huang 3. Augmented Maximum Likelihood]: "The benefit of maximizing the joint likelihood is that it allows us to make use of the augmented state space to induce structure on the marginal distribution of x in the original input space.").
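To make the claim 6 mapping concrete, here is a minimal sketch of a joint probability built from two learned components, log p(x, y') = log p(x | y') + log p(y'), with one-layer linear-Gaussian stand-ins for the two networks. The factorization follows the disclosure's Equation 7 as reconstructed above; all other names and values are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)

    def gaussian_logpdf(v, mu, log_sigma):
        return (-0.5 * ((v - mu) / np.exp(log_sigma)) ** 2
                - log_sigma - 0.5 * np.log(2 * np.pi))

    # "Network" 1: models the marginal p(y') over mechanistic-model outputs.
    w1 = 0.1 * rng.normal(size=2)
    def log_p_y(y):
        return gaussian_logpdf(y, mu=w1[0], log_sigma=w1[1])

    # "Network" 2: models the conditional p(x | y') over latent parameters.
    w2 = 0.1 * rng.normal(size=3)
    def log_p_x_given_y(x, y):
        return gaussian_logpdf(x, mu=w2[0] * y + w2[1], log_sigma=w2[2])

    # Joint log-likelihood: training would maximize this over (w1, w2) for
    # (x, y') pairs sampled from the prior and the generated outputs.
    x, y = 0.3, 1.2
    print(log_p_x_given_y(x, y) + log_p_y(y))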
Regarding Claim 7:

The system of claim 1 is taught by Meeds, Wang, Huang, Sun.

Meeds does not explicitly teach: wherein the latent space has a multivariate Gaussian distribution, and wherein the machine learning architecture includes a bijector node that transforms the multivariate Gaussian distribution to a prior distribution of parameters of the mechanistic model.

Huang teaches: wherein the latent space has a multivariate Gaussian distribution, and wherein the machine learning architecture includes a bijector node that transforms the multivariate Gaussian distribution to a prior distribution of parameters of the mechanistic model.

([Huang Background 2. Variational Autoencoders]: "Learning and inference can be jointly achieved by drawing a stochastic estimate of the gradient of the ELBO via reparameterization… Conventionally, qᵩ(z|x) is a multivariate Gaussian distribution [wherein the latent space has a multivariate Gaussian distribution] with diagonal covariance.")

([Huang Background 1. Invertible Generative Models]: "Assume y ~ N(0, I). Assume the data is generated via a bijective mapping x = f_θ(y). Then the probability density function of f_θ(y) evaluated at x can be written as… Equivalently, one can parameterize the inverse transformation x → g_θ(x) with invertible mapping g_θ, [wherein the machine learning architecture includes a bijector node that transforms the multivariate Gaussian distribution to a prior distribution of parameters of the mechanistic model] and define the generative transformation as f_θ = g_θ^(-1). Much of the design effort has been dedicated to ensuring (1) the invertibility of the transformation g, and (2) efficiency in computing the log-determinant of the Jacobian in Equation 2.")

The motivation to combine with Huang is the same motivation from claim 1 for combining with Huang to use bijector nodes.

Regarding Claim 8:

The system of claim 1 is taught by Meeds, Huang, Sun.

Meeds does not explicitly teach: wherein the machine learning architecture employs an autoregressive or normalizing flow algorithm that transforms a base distribution of latent parameters to a prior distribution of mechanistic model parameters.

Huang teaches: wherein the machine learning architecture employs an autoregressive or normalizing flow algorithm that transforms a base distribution of latent parameters to a prior distribution of mechanistic model parameters.

([Huang Introduction page 2]: "Theoretically, we show that the family of [Augmented Normalizing Flows] with additive coupling can universally transform arbitrary data distribution into a standard Gaussian prior, [wherein the machine learning architecture employs an autoregressive or normalizing flow algorithm that transforms a base distribution of latent parameters to a prior distribution of mechanistic model parameters] augmented with a degenerate deterministic variable. To the best of our knowledge, this is the first attempt in understanding how expressivity can be improved via composing flow layers rather than widening the flow (Huang et al., 2018).")

The motivation to combine with Huang is the same motivation to combine with Huang in claim 1 for the combination involving normalizing flow.
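Claims 7 and 8 both turn on an invertible transform carrying a base Gaussian distribution to a prior over mechanistic parameters. Below is a minimal sketch of the change-of-variables computation behind such a flow, assuming a single affine step; the constants are illustrative, not drawn from the claims or references.

    import numpy as np

    rng = np.random.default_rng(3)

    SHIFT, LOG_SCALE = 0.5, 0.3  # illustrative flow parameters

    def forward(z):
        # One affine flow step: maps base samples to "prior" samples
        return z * np.exp(LOG_SCALE) + SHIFT

    def log_prob(x):
        # Change of variables: log p(x) = log N(f^{-1}(x); 0, 1) - log|det J_f|
        z = (x - SHIFT) * np.exp(-LOG_SCALE)            # inverse transform
        log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))  # standard normal log-pdf
        return log_base - LOG_SCALE                     # log|det J_f| = LOG_SCALE

    x = forward(rng.standard_normal(4))  # samples from the transformed prior
    print(log_prob(x))

Stacking many such steps, with neural networks producing the shift and log-scale coefficients, is the usual way a flow gains expressivity while keeping the log-determinant tractable.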
Regarding Claim 10:

The system of claim 9 is taught by Meeds, Wang, Huang, Sun.

Meeds teaches: wherein the parameter space characterizes observations of the biological system.

([Meeds 0085]: "Furthermore, in many cases the precise biological mechanism is only partially known, such that manually formulating an appropriate ODE model that is consistent with biological experiments remains a challenge. As a result, there is a growing need not only to learn ODE model parameters [wherein the parameter space characterizes observations of the biological system] for known biological mechanisms, but also to learn the underlying mechanisms themselves in a systematic manner.")

Regarding Claim 11: Claim 11 is analogous to claim 1.

Regarding Claim 13: The method of claim 11 is taught by Meeds, Wang, Huang, Sun. Claim 13 is analogous to claim 4.

Regarding Claim 14: The method of claim 11 is taught by Meeds, Wang, Huang, Sun. Claim 14 is analogous to claim 5.

Regarding Claim 15: The method of claim 14 is taught by Meeds, Wang, Huang, Sun. Claim 15 is analogous to claim 6.

Claims 16-19 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Meeds et al (US 20200233920), referred to as Meeds in this document, in view of Huang et al ("Augmented Normalizing Flows"), referred to as Huang in this document, further in view of Yakut et al (US 20230045548 A1), referred to as Yakut in this document, and further in view of Sun et al ("A Convolutional Neural Network Model Based on Improved Softplus Activation Function"), referred to as Sun in this document.

Regarding Claim 16:

Meeds teaches: integrate, by the processor, a mechanistic model of one or more mechanistic models with an artificial intelligence model for identification of hidden mechanistic causes of observed data, wherein the mechanistic model of one or more mechanistic models is a biophysical model of a biological system

([Meeds 0162]: "The mechanistic model describes the time-evolution of the response of double receiver devices to HSL signals C6 and C12, vector u in (1). [integrate, by the processor, a mechanistic model of one or more mechanistic models… wherein the mechanistic model of one or more mechanistic models is a biophysical model of a biological system] The latent variables x in (1) are the colony density c, the intracellular concentrations of each expressed protein (luxR, lasR, RFP, CFP, YFP) and variables for autofluorescence, which are modelled as concentration of intracellular material fluorescent at 480 nm (F480) and 530 nm (F530). A system of differential equations was derived from chemical reactions (Dalchau et al., 2019), but several assumptions were made to simplify the model, including the removal of mRNA species.")

([Meeds 0004]: "Using a machine learning algorithm in the form of a variational autoencoder (VAE) would address some of these issues. A variational autoencoder is a form of machine learning algorithm comprising an encoder and a decoder, each comprising a neural network. The encoder receives input observations and encodes them into a compressed representation called a latent vector [with an artificial intelligence model for identification]. The decoder is arranged to decode the latent vector back into values in the real-world feature space. In the learning phase, the values output by the decoder are compared to the observations in the in experience data input to the encoder, and the neural nets of the encoder and decoder are trained to minimize a measure of overall difference between the input observations and the output of the decoder.")

([Meeds 0080]: "The advantage of a white or mechanistic approach is that it enables an interpretation [of hidden mechanistic causes of observed data, as the system is mechanistic and thus learns mechanistic causes] to be placed on the learned parts of the equation. If the modelled form of the equation is accurate, it may also lead to more accurate results.")

identify, by the processor, a causal relationship in the mechanistic model via a machine learning architecture that employs a parameter space that characterizes observations of the system and of the mechanistic model as a latent space of a variational autoencoder (VAE),

([Meeds 0080]: "The advantage of a white or mechanistic approach is that it enables an interpretation to be placed on the learned parts of the equation. If the modelled form of the equation is accurate, it may also lead to more accurate results.") [identify… a causal relationship in the mechanistic model via a machine learning architecture]

Paragraphs 75 to 77 note different box approaches, from black to grey to white. These approaches are variations of the decoder for the VAE.

([Meeds 0075]: "In some embodiments, the decoder 404 comprises only one big neural network (the second neural network 405) modelling the whole right-hand side f of the differential equation. This is referred to herein as the "black box" approach.")

([Meeds 0207]: "According to another aspect, there is provided computer system comprising storage and one or more processors, the storage storing code arranged to run on at least one of the processors, wherein the code is configured so as when run on the at least one processor [by the processor] to perform operations in accordance with any of the methods disclosed herein.")

([Meeds 0004]: "Using a machine learning algorithm in the form of a variational autoencoder (VAE) would address some of these issues. A variational autoencoder is a form of machine learning algorithm comprising an encoder and a decoder, each comprising a neural network. The encoder receives input observations and encodes them into a compressed representation called a latent vector. The decoder is arranged to decode the latent vector back into values in the real-world feature space. [that employs a parameter space that characterizes observations of the system and of the mechanistic model as a latent space of a variational autoencoder] In the learning phase, the values output by the decoder are compared to the observations in the in experience data input to the encoder, and the neural nets of the encoder and decoder are trained to minimize a measure of overall difference between the input observations and the output of the decoder.")

([Meeds 0085]: "Furthermore, in many cases the precise biological mechanism is only partially known, such that manually formulating an appropriate ODE model that is consistent with biological experiments remains a challenge. As a result, there is a growing need not only to learn ODE model parameters [that characterizes observations of the system and] for known biological mechanisms, but also to learn the underlying mechanisms themselves in a systematic manner.")

and wherein one or more model parameters for the mechanistic model comprise: concentrations of therapeutic compounds, and external mechanical stimuli or electrical stimuli,

([Meeds 0162]: "The mechanistic model describes the time-evolution of the response of double receiver devices to HSL signals C6 and C12, vector u in (1). The latent variables x in (1) are the colony density c, the intracellular concentrations of each expressed protein [and wherein one or more model parameters for the mechanistic model comprise: concentrations of therapeutic compounds, and external mechanical stimuli or electrical stimuli] (luxR, lasR, RFP, CFP, YFP) and variables for autofluorescence, which are modelled as concentration of intracellular material fluorescent at 480 nm (F480) and 530 nm (F530). A system of differential equations was derived from chemical reactions (Dalchau et al., 2019), but several assumptions were made to simplify the model, including the removal of mRNA species.")

wherein the machine learning architecture comprises the VAE that employs the one or more mechanistic models as a decoder node

Paragraphs 75 to 77 note different box approaches, from black to grey to white. These approaches are variations of the decoder for the VAE.

([Meeds 0075]: "In some embodiments, the decoder 404 comprises only one big neural network (the second neural network 405) modelling the whole right-hand side f of the differential equation. This is referred to herein as the "black box" approach.")

([Meeds 0080]: "The advantage of a white or mechanistic approach [wherein the machine learning architecture comprises the VAE that employs the one or more mechanistic models as a decoder node] is that it enables an interpretation to be placed on the learned parts of the equation. If the modelled form of the equation is accurate, it may also lead to more accurate results.")

Meeds does not explicitly teach: A computer program product for autonomous model parameter inference, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, and a normalizing flow, and a plurality of neural network layers that implement the normalizing flow and the VAE comprises one or more bijector nodes… the output of at least one of the neural network layers coupled to at least one of the bijector nodes wherein the one or more bijector nodes incorporates one or more softplus function and shift and scale layers

Huang teaches: and a normalizing flow, and a plurality of neural network layers that implement the normalizing flow

([Huang Introduction page 2]: "Theoretically, we show that the family of [Augmented Normalizing Flows] [and a normalizing flow] [and a plurality of neural network layers that implement the normalizing flow] with additive coupling can universally transform arbitrary data distribution into a standard Gaussian prior, augmented with a degenerate deterministic variable. To the best of our knowledge, this is the first attempt in understanding how expressivity can be improved via composing flow layers rather than widening the flow (Huang et al., 2018).")

The motivation to combine with Huang is the same as the motivation in claim 1 to combine with Huang for normalizing flow.

and the VAE comprises one or more bijector nodes… the output of at least one of the neural network layers coupled to at least one of the bijector nodes

([Huang Background 1. Invertible Generative Models]: "Assume y ~ N(0, I). Assume the data is generated via a bijective mapping x = f_θ(y). Then the probability density function of f_θ(y) evaluated at x can be written as… Equivalently, one can parameterize the inverse transformation x → g_θ(x) with invertible mapping g_θ, [and the VAE comprises one or more bijector nodes… the output of at least one of the neural network layers coupled to at least one of the bijector nodes] and define the generative transformation as f_θ = g_θ^(-1). Much of the design effort has been dedicated to ensuring (1) the invertibility of the transformation g, and (2) efficiency in computing the log-determinant of the Jacobian in Equation 2.")

wherein the one or more bijector nodes incorporates one or more softplus function and shift and scale layers

([Huang B.3 page 15]: "We apply a split operator to this last layer to obtain a 'shift' coefficient and 'log scale' coefficient for affine transformation [and shift and scale layers; this quote shows affine transformations are a known example of transformations].") That affine transformations are known to include shift and scale is supported by [Wikipedia "Affine transformation" page 1]: "Examples of affine transformations include translation, scaling, homothety, similarity, reflection, rotation, shear mapping, and compositions of them in any combination and sequence."

The motivation to combine with Huang is the same as the motivation in claim 1 to combine with Huang for bijector nodes.

Yakut teaches: A computer program product for autonomous model parameter inference, the computer program product comprising a computer readable storage medium having program instructions embodied therewith,

([Yakut 0204]: "The present techniques may be implemented as a system, a method, and/or a computer program product [A computer program product for autonomous model parameter inference]. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon [computer program product comprising a computer readable storage medium having program instructions embodied therewith] for causing a processor to carry out aspects of the present disclosure.")

One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Meeds and Yakut to incorporate a computer readable storage medium. Meeds and Yakut are of the same field of endeavor, as they are both in machine learning. One of ordinary skill would have been motivated to combine Meeds and Yakut, as the incorporation of a computer readable storage medium enables the ability to distribute/sell the code of the product for other computers to use ([Yakut 0204]: "The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.").

Sun teaches: wherein the one or more bijector nodes incorporates one or more softplus function

([Sun 2.2 Softplus page 3]: "The Softplus activation function [wherein the one or more bijector nodes incorporates one or more softplus function] is an approximate smoothed version of the ReLU activation function proposed by Glorot et al. [8] in 2011. It is non-linear and has a continuous differentiable function in the domain, and the change is relatively flat. Compared with the Sigmoid activation function, the principle of biological neuron signal activation is more consistent, and the disadvantages of the ReLU activation function due to forced sparsity are also avoided. The function curve of the Softplus activation function is indicated by the black dotted line in Fig. 1.")

The motivation to combine with Sun is the same motivation presented to combine with Sun in claim 1. The use of activation functions, such as softplus functions, would be a conventional modification, as Huang notes the use of activation functions in Huang's systems ([Huang B.3 page 15]: "We also apply activation normalization (Kingma & Dhariwal, 2018) with data-dependent initialization that standardizes the transformed feature, after each encoding transform and each decoding transform."); thus, with the motivation to utilize softplus, the activation function in Huang or in the modified system of Meeds can be the softplus function.

Regarding Claim 17: The computer program product of claim 16 is taught by Meeds, Huang, Yakut, Sun. Claim 17 is analogous to claim 2.

Regarding Claim 18: The computer program product of claim 16 is taught by Meeds, Huang, Yakut, Sun. Claim 18 is analogous to claim 3.

Regarding Claim 19: The computer program product of claim 16 is taught by Meeds, Huang, Yakut, Sun. Claim 19 is analogous to claim 8.

Regarding Claim 21:

The computer program product of claim 16 is taught by Meeds, Huang, Yakut, Sun.

Yakut teaches: wherein a one of the one or more model parameters comprises temperature

([Yakut 0012]: "In some examples, the generative model may be configured to reproduce the relation between more than two modalities. For example, one or more condition parameters (e.g. temperature [wherein a one of the one or more model parameters comprises temperature], pressure, and flow rate) may be aggregated in a first modality. Raw material quality, which may be represented by multiple variables, may be aggregated in a second modality. The third modality may be one KPI.")

One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Meeds and Yakut to incorporate temperature as a parameter. Meeds and Yakut are of the same field of endeavor, as they are both in machine learning. One of ordinary skill would have been motivated to combine Meeds and Yakut, as the incorporation of temperature as a parameter enables the ability to distinguish whether an item, such as a chemical, is in a safe range or is deviating to an unwanted state that could be harmful to the item ([Yakut 0023]: "In other words, the one or more KPIs may comprise parameters that are measured directly using a sensor, e.g., a temperature sensor or a pressure sensor. The one or more KPIs may alternatively or additionally comprise parameters that are obtained indirectly through proxy variables. For example, while catalyst activity is not measured directly in process data, it manifests itself in reduced yield and/or conversion of the process. The one or more KPIs may be defined by a user (e.g. process operator) or by a statistical model e.g. an anomaly score measuring the distance to the "healthy" state of the equipment in a multivariate space of relevant process and/or storage condition data, such as the Hotelling T² score or the DModX distance derived from principal component analysis (PCA). Here, the healthy state may refer to the bulk of states that are typically observed during periods in the historic process and/or storage condition data that were labelled as "usual"/"unproblematic"/"good" by an expert for the production process.").
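As a brief numerical note on the softplus function relied on from Sun: softplus(x) = ln(1 + e^x) is a smooth, strictly increasing (hence invertible) approximation of ReLU whose derivative is the sigmoid. The check below is ordinary mathematics, not material from the references.

    import numpy as np

    def softplus(x):
        # softplus(x) = ln(1 + e^x), computed in a numerically stable form
        return np.logaddexp(0.0, x)

    def softplus_grad(x):
        # d/dx softplus(x) = sigmoid(x): smooth and never exactly zero
        return 1.0 / (1.0 + np.exp(-x))

    xs = np.array([-4.0, 0.0, 4.0])
    print(softplus(xs))       # ~[0.018, 0.693, 4.018]: close to ReLU away from 0
    print(softplus_grad(xs))  # gradient stays nonzero, unlike ReLU for x < 0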
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Mescheder et al ("Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks") is considered relevant art, as the reference covers the use of variational autoencoders and generative adversarial networks, which are both topics covered by the current application. Das et al (US 11174289 B1) is considered relevant art, as Das et al discusses variational autoencoders (VAEs), VAEs in relation to latent space, and examples of VAEs being used in the field of biology.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER D DEVORE, whose telephone number is (703) 756-1234. The examiner can normally be reached Monday-Friday, 7:30 am - 5 pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.D.D./
Examiner, Art Unit 2129

/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129

Prosecution Timeline

Jun 28, 2021: Application Filed
Sep 16, 2024: Non-Final Rejection — §103
Dec 17, 2024: Response Filed
Feb 03, 2025: Final Rejection — §103
Mar 27, 2025: Applicant Interview (Telephonic)
Mar 27, 2025: Examiner Interview Summary
Mar 31, 2025: Response after Non-Final Action
Apr 17, 2025: Request for Continued Examination
May 07, 2025: Response after Non-Final Action
May 21, 2025: Non-Final Rejection — §103
Aug 27, 2025: Response Filed
Aug 28, 2025: Applicant Interview (Telephonic)
Aug 28, 2025: Examiner Interview Summary
Sep 29, 2025: Final Rejection — §103
Jan 05, 2026: Response after Non-Final Action
Feb 02, 2026: Request for Continued Examination
Feb 09, 2026: Response after Non-Final Action
Mar 17, 2026: Non-Final Rejection — §103
Apr 16, 2026: Applicant Interview (Telephonic)
Apr 16, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12530603: OBTAINING AND UTILIZING FEEDBACK FOR AGENT-ASSIST SYSTEMS (2y 5m to grant; granted Jan 20, 2026)
Patent 12505355: GENERAL FORM OF THE TREE ALTERNATING OPTIMIZATION (TAO) FOR LEARNING DECISION TREES (2y 5m to grant; granted Dec 23, 2025)
Patent 12468978: Reinforcement Learning In A Processing Element Method And System Thereof (2y 5m to grant; granted Nov 11, 2025)
Patent 12412069: COOKIE SPACE DOMAIN ADAPTATION FOR DEVICE ATTRIBUTE PREDICTION (2y 5m to grant; granted Sep 09, 2025)
Study what changed to get past this examiner. Based on the 4 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 50%
With Interview: 92% (+41.7%)
Median Time to Grant: 4y 1m
PTA Risk: High
Based on 10 resolved cases by this examiner. Grant probability derived from career allow rate.
