DETAILED ACTION
This final action is in response to the amendment and remarks filed on 10/16/2025 for application 17/949,721.
Claims 1, 3-4, 7, 9, 11, 16, and 20 have been amended. Claim 8 is cancelled. Claim 21 is newly added.
Claims 1-7 and 9-21 are pending in the application. Claims 1, 11, and 16 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) filed 10/16/2025 and 1/21/2026 have been fully considered by the examiner.
Response to Amendment
The amendment filed 10/16/2025 has been entered.
Applicant’s amendment to the claims with respect to resolving claim objections and indefiniteness rejections under 35 U.S.C. 112(b) has been considered, and overcomes the objections and 112(b) rejections set forth in the Office action mailed 07/16/2025. Consequently, the previous objections and 112(b) rejections are withdrawn. In light of the amendments made to both the instant application (17/949,721) and the co-pending application at issue (17/949,710), the provisional non-statutory double patenting rejection set forth in the Office action mailed 07/16/2025 is withdrawn.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-7 and 9-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claim 1, it recites the limitation “fitting the regression to the input data”. There is insufficient antecedent basis for the term “the regression” in the claim, as the claim does not previously recite a regression. Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
For purposes of examination, the limitation is interpreted as “fitting a regression to the input data”.
Regarding claim 6, it recites the limitation “wherein the loss function fits the neural network to the dependency structure along with fitting a regression of the input data”. However, these limitations are redundant, as they largely repeat limitations already present in a parent claim (see claim 1) and provide no change in scope. A proper dependent claim must specify a further limitation of the subject matter claimed (see MPEP § 608.01(n)) – as such, the apparent redundancies in language result in a lack of clarity.
Applicant is advised to either cancel or amend the claim such that it is in proper dependent form.
Regarding claims 2-5, 7, and 9-10, they inherit the deficiencies of their parent claims. Consequently, they are also rejected under 35 U.S.C. 112(b) as being indefinite for depending on an indefinite parent claim.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-7 and 9-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Independent Claims (Claim 1, Claim 11, Claim 16):
Step 1: Claim 1 is drawn to a method, claim 11 is drawn to a method, and claim 16 is drawn to a method. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter).
Step 2A Prong 1: Claims 1, 11, and 16 each recite a judicially recognized exception of an abstract idea.
Claim 1 recites, inter alia:
obtaining an input graph for a domain [based on input data]; – This limitation amounts to observing points of data and then representing said data in a graph, which is a process of evaluation capable of being performed in the human mind or using pen and paper.
identifying a dependency structure from the input graph, wherein the dependency structure is a matrix identifying connections among features in the input data that are directly correlated to one another and the features in the input data that are conditionally independent from one another – This limitation amounts to observing and modeling relationships between points of data related to a particular topic (i.e., domain), and representing said relationships via a matrix (e.g., a 2D matrix with 0s and 1s), and therefore recites a process of evaluation capable of being performed using pen and paper. It may also be reasonably interpreted as a procedure of modeling established mathematical relationships.
generating a view of a graphical model for the domain using the input data and the dependency structure – This limitation further amounts to observing and modeling relationships between points of data related to a particular topic (i.e., domain), and therefore recites a process of evaluation capable of being performed using pen and paper.
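For illustration only, a binary matrix of the kind described in the second limitation above – marking directly correlated feature pairs with 1 and conditionally independent pairs with 0 – can be written out step by step by hand; the sketch below is a hypothetical example (the feature names and edges are not drawn from the claims or the cited art):

```python
# Hypothetical sketch of a "dependency structure": a symmetric binary
# matrix over features, where 1 marks a direct correlation (an edge in
# the input graph) and 0 marks conditional independence.
features = ["age", "income", "blood_pressure"]

# Hypothetical edges of an input graph: "income" is conditionally
# independent of "blood_pressure" (no edge between them).
edges = {("age", "income"), ("age", "blood_pressure")}

def dependency_matrix(features, edges):
    idx = {f: i for i, f in enumerate(features)}
    n = len(features)
    mat = [[0] * n for _ in range(n)]
    for a, b in edges:
        mat[idx[a]][idx[b]] = 1
        mat[idx[b]][idx[a]] = 1  # undirected: keep the matrix symmetric
    return mat

D = dependency_matrix(features, edges)
```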
Claim 11 recites, inter alia:
perform an inference task to provide an answer to a query, wherein the inference task uses an inference algorithm that answers the conditional query by predicting values of unknown nodes by iteratively updating values of unknown features until convergence – This limitation amounts to evaluating and responding to a query based on a generic algorithmic procedure of iteratively updating values of a graph to form predictions, and therefore recites a procedure capable of being performed using pen and paper.
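As a generic illustration of the kind of iterative-updating procedure this limitation invokes – holding known values fixed and repeatedly re-estimating unknown values from their neighbors until the updates converge – consider the following sketch (the neighbor-mean update rule and all values are hypothetical, not the applicant's algorithm):

```python
# Generic sketch of iterative inference on a graph: unknown node values
# are repeatedly re-estimated as the mean of their neighbors' values
# until successive updates change by less than a tolerance.
def infer_unknowns(values, neighbors, unknown, tol=1e-6, max_iter=1000):
    vals = dict(values)
    for node in unknown:
        vals.setdefault(node, 0.0)  # initialize unknown values
    for _ in range(max_iter):
        delta = 0.0
        for node in unknown:
            new = sum(vals[n] for n in neighbors[node]) / len(neighbors[node])
            delta = max(delta, abs(new - vals[node]))
            vals[node] = new
        if delta < tol:  # converged
            break
    return vals

# Hypothetical query: node "b" is unknown; its neighbors "a" and "c" are known.
result = infer_unknowns({"a": 2.0, "c": 4.0},
                        {"b": ["a", "c"]}, unknown=["b"])
```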
Claim 16 recites, inter alia:
perform a sampling task that obtains sample data points and generates new samples jointly matching a distribution of original input data, wherein the sampling task uses a sampling algorithm that selects a feature at random based on the dependency structure and uses an inference algorithm in obtaining a value of a next feature until a sample value of all the features is obtained – This limitation amounts to performing a generic algorithmic procedure of sampling, observation, and prediction to produce additional data samples from existing samples, and therefore recites a process of evaluation capable of being performed using pen and paper.
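For illustration only, a generic feature-at-a-time sampling procedure of the kind this limitation invokes – selecting a feature at random, then obtaining each next feature's value conditioned on the values already drawn until all features have values – may be sketched as follows (the conditional rule is hypothetical):

```python
import random

# Generic sketch of feature-at-a-time sampling: start from a randomly
# ordered set of features, then fill in each remaining feature using a
# simple conditional rule, until every feature has a sampled value.
def sample_all(features, conditional, seed=0):
    rng = random.Random(seed)
    order = list(features)
    rng.shuffle(order)  # feature selected at random first
    sample = {}
    for f in order:
        sample[f] = conditional(f, sample, rng)  # value of next feature
    return sample

# Hypothetical conditional: each value is Gaussian noise around the mean
# of the values drawn so far (0.0 for the first feature).
def conditional(f, drawn, rng):
    base = sum(drawn.values()) / len(drawn) if drawn else 0.0
    return base + rng.gauss(0, 1)

s = sample_all(["x", "y", "z"], conditional)
```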
Step 2A Prong 2: The following additional elements recited in claims 1, 11, and 16 do not integrate the recited judicial exceptions into a practical application.
Claim 1 additionally recites:
[obtaining graph] based on input data generated from the domain – This limitation amounts to a mere data gathering step, and is therefore insignificant extra-solution activity.
[generating] neural view of neural graphical model for the domain by learning the neural view by fitting the neural view to the dependency structure and fitting the regression to the input data by modeling the neural view as a multi-task learning framework that minimizes a loss while maintaining the dependency structure – These limitations are high-level, generic invocations of neural architecture that amount to no more than mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an existing mental process of observing and modeling relationships between data points). General invocation of learning distributions via optimizing a loss function of a “multi-task” learning framework, without significant steps of implementation, does no more than generally link the recited exception to the technological environment of neural architectures.
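At the same high level of generality, a loss "minimized while maintaining the dependency structure" can be pictured as a regression fit term plus a structure penalty; the form of the loss, the weights, and the values below are all hypothetical, offered solely to show how generic such a formulation is:

```python
# Generic sketch of a structure-constrained loss: a squared-error
# regression term plus a penalty on weight mass assigned to feature
# pairs the dependency structure marks as independent (entry 0).
def loss(pred, target, weights, structure, lam=1.0):
    fit = sum((p - t) ** 2 for p, t in zip(pred, target))  # regression fit
    penalty = sum(abs(weights[i][j])
                  for i in range(len(weights))
                  for j in range(len(weights))
                  if structure[i][j] == 0 and i != j)
    return fit + lam * penalty

structure = [[1, 1], [1, 1]]  # fully connected structure: no penalty
L0 = loss([1.0, 2.0], [1.0, 2.0], [[0, 0.5], [0.5, 0]], structure)
```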
Claim 11 additionally recites:
receiving a conditional query for a domain; and outputting a set of values for the neural graphical model based on the inference task for the answer – These limitations amount to nominal pre-solution and post-solution steps of gathering and outputting data, and are therefore insignificant extra-solution activity.
accessing a neural view of a neural graphical model of the domain, wherein the neural view represents functions of features of the domain learned by fitting the neural view to a dependency structure of an input graph for the domain; using the neural graphical model to [perform an inference task]; [query] over the neural graphical model – These limitations are high-level, generic invocations of neural architecture that amount to no more than mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an existing mental process of evaluating and responding to a query based on reasoning). General invocation of learning distributions via a neural model, without significant steps of implementation, does no more than generally link the recited exception to the technological environment of neural architectures.
Claim 16 additionally recites:
accessing a neural view of a neural graphical model of a domain, wherein the neural view represents functions of features of the domain learned by fitting the neural view to a dependency structure of an input graph for the domain; using the neural graphical model to [perform]; [data points] from the neural graphical model; [feature at random] in the neural graphical model – These limitations are high-level, generic invocations of neural architecture that amount to no more than mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an existing mental process of evaluating and responding to a query based on reasoning). General invocation of learning distributions via a neural model, without significant steps of implementation, does no more than generally link the recited exception to the technological environment of neural architectures.
and outputting a set of data samples generated by the neural graphical model based on the sampling task. – This limitation amounts to a nominal post-solution step of outputting data, and is therefore insignificant extra-solution activity.
Step 2B: The additional elements recited in claims 1, 11, and 16, viewed individually or as a combination, do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves.
Claim 1 additionally recites:
[obtaining graph] based on input data generated from the domain – Receiving data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
[generating] neural view of neural graphical model for the domain by learning the neural view by fitting the neural view to the dependency structure and fitting the regression to the input data by modeling the neural view as a multi-task learning framework that minimizes a loss while maintaining the dependency structure – Utilizing multi-task learning frameworks for graph embedding models is well-understood, routine, and conventional activity (see Zhang et al., “A Survey on Multi-Task Learning”, cited in IDS filed 1/21/2026 [Abstract and page 5601 MTL with Other Learning Paradigms]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 11 additionally recites:
receiving a conditional query for a domain; and outputting a set of values for the neural graphical model based on the inference task for the answer – Receiving and transmitting data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
accessing a neural view of a neural graphical model of the domain, wherein the neural view represents functions of features of the domain learned by fitting the neural view to a dependency structure of an input graph for the domain; using the neural graphical model to [perform an inference task]; [query] over the neural graphical model – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 16 additionally recites:
accessing a neural view of a neural graphical model of a domain, wherein the neural view represents functions of features of the domain learned by fitting the neural view to a dependency structure of an input graph for the domain; using the neural graphical model to [perform]; [data points] from the neural graphical model; [feature at random] in the neural graphical model – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
and outputting a set of data samples generated by the neural graphical model based on the sampling task. – Transmitting data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
As such, claims 1, 11, and 16 are not patent eligible.
Dependent Claims (Claims 2-7, Claims 9-10, Claims 12-15, Claims 17-21):
Dependent claims 2-7, 9-10, 12-15, and 17-21 narrow the scope of independent claims 1, 11, and 16, and thus merely narrow the recited judicial exceptions. As discussed above for the independent claims, the recited judicial exceptions are not meaningfully integrated into a practical application and do not amount to significantly more than the abstract ideas themselves. The dependent claims recite abstract idea limitations similar to those of the independent claims, providing nothing more than mathematical concepts or mental processes capable of being performed in the human mind and/or using pen and paper. They also do not recite any further additional elements that integrate the recited judicial exceptions into a practical application or amount to significantly more than the recited abstract ideas themselves. Consequently, claims 2-7, 9-10, 12-15, and 17-21 are also rejected under 35 U.S.C. 101.
Step 1: Claims 2-7 and 9-10 are drawn to a method, claims 12-15 are drawn to a method, and claims 17-21 are drawn to a method. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter).
Step 2A Prong 1: Claims 2-7, 9-10, 12-15, and 17-21 each recite a judicially recognized exception of an abstract idea.
Claim 2 recites, inter alia:
[probabilistic graphical model] with functions that represent complex distributions over the domain – This limitation invokes mathematical functions as a means of representing probabilistic relationships between data (i.e., organizing and manipulating information through mathematical correlations), and therefore recites a mathematical relationship.
Claim 3 recites the same judicial exception as claim 1.
Claim 4 recites, inter alia:
wherein functions for features of the domain are learned based on paths of the features of the domain through the one or more hidden layers of the neural network from the input layer to the output layer and the weights, wherein the weights specify functions between the features of the domain – This limitation recites learning functions (i.e., identifying mathematical relationships) via a series of mathematical transformations being performed on data (i.e., features of the domain), therefore amounting to organizing and manipulating information through mathematical correlations.
Claim 5 recites, inter alia:
optimizing the weights and the parameters of the neural network using a loss function; – This limitation amounts to performing an optimization procedure via mathematical calculations (i.e., optimization via a loss function).
and learning the functions using the weights and the parameters of the neural network – This limitation amounts to learning functions (i.e., mathematical relationships) involving parameters (i.e., variables) based on the organization and manipulation of data.
Claim 6 recites, inter alia:
wherein the loss function fits the neural network to the dependency structure along with fitting a regression of the input data – This limitation amounts to learning mathematical relationships between data (loss function fit[ting] the neural network to the dependency structure) and performing mathematical calculations (fitting a regression of the input data).
Claim 7 recites, inter alia:
updating the paths of the features of the domain through the one or more hidden layers of the neural network from the input to the output based on the functions learned – This limitation amounts to organizing and manipulating information (updating the paths of the features through the one or more hidden layers of the neural network from the input to the output) via mathematical correlations (based on the functions learned), and therefore recites a mathematical relationship.
Claim 9 recites, inter alia:
wherein the input graph is a directed input graph, an undirected input graph, or a mixed-edge input graph – This limitation amounts to merely creating a graph representation of input data (which is an existing mental process – [see Step 2A Prong 1 analysis of claim 1 on page 19]) wherein the graph contains edges (i.e., identifies connections between data points). Therefore, it recites a process of evaluation capable of being performed in the human mind or using pen and paper.
Claim 10 recites the same judicial exception as claim 1.
Claim 12 recites, inter alia:
wherein the set of output values is a set of fixed values or a set of distributions over values – This limitation amounts to making generic predictions about values and probabilities based on observing data, and therefore recites a process of evaluation capable of being performed in the human mind or using pen and paper.
Claim 13 recites, inter alia:
wherein the inference task predicts unknown values based on the neural graphical model – This limitation amounts to making generic predictions about values based on observing data, and therefore recites a process of evaluation capable of being performed in the human mind or using pen and paper.
Claim 14 recites the same judicial exception as claim 13.
Claim 15 recites, inter alia:
wherein the inference task uses a gradient-based approach to determine the unknown values in the set of values for the neural graphical model – This limitation amounts to performing mathematical functions (i.e., calculating gradients) to determine values, and therefore recites a mathematical calculation.
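For illustration only, a generic gradient-based determination of an unknown value – ordinary gradient descent on a squared-error objective – can be sketched as follows (the objective and step size are hypothetical):

```python
# Generic gradient-descent sketch: treat the unknown value x as a
# parameter and descend the gradient of f(x) = (x - 3)^2, whose
# minimum lies at x = 3.
def gradient_descent(grad, x0, lr=0.1, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # standard gradient update
    return x

# grad of (x - 3)^2 is 2 * (x - 3)
x_star = gradient_descent(lambda x: 2 * (x - 3.0), x0=0.0)
```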
Claim 17 recites, inter alia:
randomly selecting a node in the neural graphical model as a starting node; placing remaining nodes in the neural graphical model in an order relative to the starting node; and creating a value for each node of the remaining nodes in the neural graphical model based on values from neighboring nodes to each node of the remaining nodes – These limitations amount to a procedure of organizing and manipulating information through mathematical correlations between nodes (i.e., data points) in a model, and therefore recite a mathematical relationship.
Claim 18 recites, inter alia:
adding random noise to the value created for the node based on a distribution conditioned on the values from the neighboring nodes – These limitations amount to performing a mathematical calculation (adding random noise to the value created for the node) based on mathematical correlations (a distribution conditioned on the values from the neighboring nodes).
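The node-by-node procedure recited across claims 17 and 18 – a random starting node, an ordering of the remaining nodes, values created from neighboring values, and noise conditioned on those values – can be sketched generically as follows (the graph, the neighbor-mean rule, and the noise scale are hypothetical):

```python
import random

# Generic sketch: pick a random starting node, order the rest, and
# create each value from already-valued neighbors plus conditioned noise.
def create_values(nodes, neighbors, seed=0):
    rng = random.Random(seed)
    start = rng.choice(nodes)                       # random starting node
    order = [start] + [n for n in nodes if n != start]
    values = {start: rng.gauss(0, 1)}
    for node in order[1:]:
        known = [values[n] for n in neighbors[node] if n in values]
        base = sum(known) / len(known) if known else 0.0
        values[node] = base + rng.gauss(0, 0.1)     # noise conditioned on neighbors
    return values

vals = create_values(["a", "b", "c"],
                     {"a": ["b"], "b": ["a", "c"], "c": ["b"]})
```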
Claim 19 recites, inter alia:
wherein the value created for each node is from a same distribution of input data over the domain – This limitation amounts to calculating values (wherein the value is created for each node) based on mathematical correlations (from a same distribution of input data over the domain).
Claim 20 recites the same judicial exception as claim 16.
Claim 21 recites the same judicial exception as claim 11.
Step 2A Prong 2: The following additional elements recited in claims 2-5, 10, 14, and 20-21 do not integrate the recited judicial exceptions into a practical application.
Claim 2 additionally recites:
wherein the neural graphical model is a probabilistic graphical model – This limitation is a high-level, generic invocation of probabilistic neural architecture that amounts to no more than mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an existing abstract idea).
Claim 3 additionally recites:
wherein the neural view includes an input layer with features of the domain, one or more hidden layers of a neural network, weights, and an output layer with the features of the domain after being processed by the neural network – These limitations are insignificant, high-level implementation steps that merely recite components of a neural architecture, wherein the neural architecture is merely being invoked as a tool to perform an existing abstract idea. Consequently, they recite insignificant extra-solution activity.
Claim 4 additionally recites:
training the neural view of the neural graphical model using the input data; [functions are learned] during the training of the neural view – These limitations are high-level, generic invocations of neural architecture that amount to mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an abstract idea).
Claim 5 additionally recites:
initializing the weights and parameters of the neural network for the neural view – This limitation is merely an insignificant, high-level implementation step with respect to components of a neural architecture, wherein the neural architecture is merely being invoked as a tool to perform an existing abstract idea [see Step 2A Prong 2 analysis of parent claim 1 on page 20]. Consequently, it recites insignificant extra-solution activity.
Claim 10 additionally recites:
providing the neural view of the neural graphical model as output on a display – This limitation amounts to a nominal post-solution step of merely outputting/displaying data, and is not meaningfully integrated into the claim as a whole. Therefore, it recites insignificant extra-solution activity.
Claim 14 additionally recites:
wherein the inference task uses message passing to determine the unknown values in the set of values for the neural graphical model – This limitation is a high-level, generic invocation of message passing that amounts to no more than mere instructions to apply an exception (i.e., it merely invokes message passing as a tool to perform an existing mental process of making predictions).
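As a generic illustration of message passing of the kind invoked here – nodes repeatedly sending their current values to neighbors, with unknown nodes updating from the incoming messages – consider the following sketch (the mean-of-messages update rule and all values are hypothetical):

```python
# Generic message-passing sketch: each round, every node sends its
# current value to its neighbors, and unknown nodes update to the
# mean of the messages they received.
def message_pass(values, neighbors, unknown, rounds=50):
    vals = dict(values)
    for node in unknown:
        vals.setdefault(node, 0.0)  # initialize unknowns
    for _ in range(rounds):
        inbox = {n: [] for n in vals}
        for sender, targets in neighbors.items():
            for t in targets:
                inbox[t].append(vals[sender])   # send current value
        for node in unknown:
            if inbox[node]:
                vals[node] = sum(inbox[node]) / len(inbox[node])
    return vals

# Hypothetical graph: unknown node "b" between known nodes "a" and "c".
out = message_pass({"a": 1.0, "c": 5.0},
                   {"a": ["b"], "c": ["b"], "b": ["a", "c"]},
                   unknown=["b"])
```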
Claim 20 additionally recites:
wherein the neural view includes a trained neural network with an input layer with features from input data, one or more hidden layers of the neural network, optimized weights, an output layer with the features of the input data, and functions of the features of the input data after being processed by the neural network – These limitations are insignificant, high-level implementation steps that merely recite components of a neural architecture, wherein the neural architecture is merely being invoked as a tool to perform an existing abstract idea [see Step 2A Prong 2 analysis of parent claim 16 on page 21]. Consequently, they recite insignificant extra-solution activity.
Claim 21 additionally recites:
wherein the neural view allows the inference task to move forwards and backwards through the neural network in providing an answer to the conditional query – This limitation amounts to a high-level, generic invocation of neural architecture that is no more than mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an existing mental process of observing and modeling relationships between data points). General invocation of learning distributions via forward and backward propagation (move forwards and backwards through the neural network), which is typical of the operation of a neural network, without further significant steps of implementation, does no more than generally link the recited exception to the technological environment of neural architectures.
Step 2B: The additional elements recited in claims 2-5, 10, 14, and 20-21, viewed individually or as a combination, do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves.
Claim 2 additionally recites:
wherein the neural graphical model is a probabilistic graphical model – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 3 additionally recites:
wherein the neural view includes an input layer with features of the domain, one or more hidden layers of a neural network, weights, and an output layer with the features of the domain after being processed by the neural network – Using a neural architecture (i.e., a model that comprises the elements of the claimed architecture – input layer, hidden layers, neural network, weights, output layer) to perform inference (e.g., a variational autoencoder (VAE)) is well-understood, routine, and conventional activity (see Johnson, "Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", [page 3 Variational autoencoders]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 4 additionally recites:
training the neural view of the neural graphical model using the input data; [functions are learned] during the training of the neural view – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 5 additionally recites:
initializing the weights and parameters of the neural network for the neural view – Using a neural architecture (i.e., a model that comprises the elements of the claimed architecture – initializ[ed] weights and parameters) to perform inference (e.g., a variational autoencoder (VAE)) is well-understood, routine, and conventional activity (see Johnson, "Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", [page 3 Variational autoencoders]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 10 additionally recites:
providing the neural view of the neural graphical model as output on a display – Connecting a generic display device to output information in a computer system is well understood, routine, and conventional activity [see instant specification ¶ 0110] and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 14 additionally recites:
wherein the inference task uses message passing to determine the unknown values in the set of values for the neural graphical model – Merely invoking message passing as a tool to perform an existing mental process of making predictions does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 20 additionally recites:
wherein the neural view includes a trained neural network with an input layer with features from input data, one or more hidden layers of the neural network, optimized weights, an output layer with the features of the input data, and functions of the features of the input data after being processed by the neural network – Using a neural architecture (i.e., a model that comprises the elements of the claimed architecture – input layer, hidden layers, neural network, optimized weights, output layer, functions) to perform inference (e.g., a variational autoencoder (VAE)) is well-understood, routine, and conventional activity (see Johnson, "Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", [page 3 Variational autoencoders]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 21 additionally recites:
wherein the neural view allows the inference task to move forwards and backwards through the neural network in providing an answer to the conditional query – Using a neural architecture (i.e., an architecture that implements forwards and backwards propagation) to perform inference is well-understood, routine, and conventional activity (see Johnson, "Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", [page 3 Variational autoencoders]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
As such, claims 2-7, 9-10, 12-15, and 17-21 are not patent eligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-9, 11-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Johnson ("Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", available on arXiv 20 March 2016).
Regarding claim 1, Johnson teaches A method ("Our approach uses graphical models for representing structured probability distributions, and uses ideas from variational autoencoders (Kingma & Welling, 2014) for learning not only the nonlinear feature manifold but also bottom-up recognition networks to improve inference. Thus our method enables the combination of flexible deep learning feature models with structured Bayesian and even Bayesian nonparametric priors. Our approach yields a single variational inference objective in which all components of the model are learned simultaneously. Furthermore, we develop a scalable fitting algorithm that combines several advances in efficient inference, including stochastic variational inference (Hoffman et al., 2013), graphical model message passing (Koller & Friedman, 2009), and backpropagation with the reparameterization trick (Kingma & Welling, 2014)…We refer to our general approach as the structured variational autoencoder (SVAE). In this paper we illustrate the SVAE using graphical models based on switching linear dynamical systems (SLDS) (Murphy, 2012; Fox et al., 2011)" [Johnson page 2 Introduction]), comprising:
obtaining an input graph for a domain based on input data generated from the domain; (“In this section we apply the SVAE to both synthetic and real data and demonstrate its ability to learn both rich feature representations and simple latent dynamics. First, we apply a linear dynamical system (LDS) SVAE to synthetic data and illustrate some aspects of its training dynamics…. As a synthetic data example, consider a sequence of 1D images representing a dot bouncing from one side of the image to the other, as shown in the top panel of Figure 8a” [Johnson page 10 Experiments]; “Furthermore, the functions µ(yn; φ) and Σ(yn; φ) act as a recognition or inference network for observation yn, outputting a distribution over the latent variable xn using a feed-forward neural network applied to yn" [Johnson page 3 Variational autoencoders]; "The SVAE construction is general: it can admit many latent probabilistic models as well as many flexible observation models and recognition network designs. In this section we outline some of these possibilities. While the SVAE recognition network described in Section 3.2 only produces node potentials depending on single data points, in general SVAE recognition networks can output potentials on more than one node or take as input more than one data point. For example, a recognition network could output node potentials of the form φ(xt; yt, yt−1) that depend also on the previous data point, as sketched in Figure 6a, or even depend on many data points through a recurrent neural network (RNN). Recognition networks may also output factors on discrete latent variables, as sketched in Figure 6b" [Johnson page 6 Variations and extensions]; see Figure 6(a) and 6(b) [Johnson page 6]; The SVAE model can receive both synthetic (e.g., images (i.e., generated input data) of a moving dot (i.e., from a domain)) and real observations. 
The resulting recognition network (i.e., input graph) in SVAE is based on observations (i.e., input data) via a distribution over latent variables)
identifying a dependency structure from the input graph; ([Johnson page 10 Experiments] and [Johnson page 3 Variational autoencoders] and [Johnson page 6 Variations and extensions] and Figure 6(a) and 6(b) [Johnson page 6] as detailed above; The recognition network (i.e., input graph) in SVAE is based on observations (i.e., input data) via a distribution over latent variables, and also models dependencies between variables (i.e., models dependency structure) via directed edges (see directed edges in 6(a) and 6(b)) between nodes (i.e., is a directed input graph). In light of the instant specification [¶ 0034-0036], the “input graph” can be reasonably interpreted as a graph created based on the input data, and also as being a model of “dependency structure” of the input data, and a “directed input graph” can be reasonably interpreted as a graph with directed edges), wherein the dependency structure is a matrix identifying connections among features in the input data that are directly correlated to one another and the features in the input data that are conditionally independent from one another (“Specifically, we choose each node potential to be a Gaussian factor in which the precision matrix J(yt; φ) and potential vector h(yt; φ) depend on the corresponding observation yt through an MLP,
[equation 28]
using the notation from Section 2.2. These local recognition networks allow us to fit a regression from each observation yt to a probabilistic guess at the corresponding latent state xt” [Johnson page 6 Variational family and CRF recognition networks]; By definition, precision (i.e., inverse covariance – see Souly et al., “Scene Labeling Using Sparse Precision Matrix” [page 3652 Graphical Lasso and Sparse Precision Matrix]) matrices (i.e., dependency structures) drawn from the recognition networks capture the presence or absence of dependency between features) and
generating a neural view of a neural graphical model for the domain by learning the neural view using the input data using the dependency structure ([Johnson page 3 Variational autoencoders] and [Johnson page 6 Variations and extensions] and see Figure 6(a) and 6(b) [Johnson page 6], as detailed above; The recognition network in SVAE represents the probability distribution over the latent variables and can also model dependencies between variables (i.e., be a dependency structure). SVAE can also be illustrated via graphical models (i.e., views of a neural model) based on an SLDS framework (e.g., see Figure 6(a) and 6(b) [Johnson page 6]). In light of the instant specification, a “neural view” can be reasonably interpreted as merely a representation of the model and its associated functions, thereby functionally equivalent to the model itself) by fitting the neural view to the dependency structure and fitting the regression to the input data by modeling the neural view as a multi-task learning framework that minimizes a loss while maintaining the dependency structure (“Our approach yields a single variational inference objective in which all components of the model are learned simultaneously. Furthermore, we develop a scalable fitting algorithm that combines several advances in efficient inference, including stochastic variational inference (Hoffman et al., 2013), graphical model message passing (Koller & Friedman, 2009), and backpropagation with the reparameterization trick (Kingma & Welling, 2014). Thus our algorithm can leverage conjugate exponential family structure where it exists to efficiently compute natural gradients with respect to some variational parameters, enabling effective second-order optimization (Martens, 2015), while using backpropagation to compute gradients with respect to all other parameters” [Johnson page 2 Introduction]; “Using
[symbol]
to denote the variational parameters for the factor
[symbol]
, the variational parameters are then the encoder parameters
[symbol]
and the decoder parameters
[symbol]
, and the objective is
[equation]
” [Johnson page 3 Variational autoencoders]; “However, to enable flexible modeling of images and other complex features, we allow the dependence to be a more general nonlinear model. In particular, we consider
[equation]
to be MLPs with parameters
[symbol]
as in Eqs. (10)-(13) of Section 2.2,
[equation]
” [Johnson page 4 A switching linear dynamical system with nonlinear observations]; “The mean field objective in terms of the global variational parameters
[symbol]
is then
[equation]
” [Johnson page 6 Variational family and CRF recognition networks]; The disclosed variational objective [equation 30] combines training on both a reconstructive (i.e., regression) loss term and a structural loss term (the framework jointly optimizes for both data representation and graph structure))
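For context on the examiner's reliance on the precision matrix as the claimed matrix of feature connections: in a Gaussian graphical model, a zero off-diagonal entry of the precision (inverse covariance) matrix marks two features as conditionally independent given the rest, while a nonzero entry marks a direct correlation. The following sketch illustrates only this standard property; the matrix values and variable names are hypothetical and are not taken from Johnson:

```python
import numpy as np

# Hypothetical precision matrix for three features (x0, x1, x2).
# J[0, 2] == 0 encodes that x0 and x2 are conditionally
# independent given x1; nonzero off-diagonal entries mark
# directly correlated feature pairs.
J = np.array([[2.0, -0.8,  0.0],
              [-0.8, 2.0, -0.8],
              [0.0, -0.8,  2.0]])

cov = np.linalg.inv(J)  # covariance is the inverse of the precision

# The dependency structure is the off-diagonal sparsity pattern of J.
adjacency = (np.abs(J) > 1e-9) & ~np.eye(3, dtype=bool)
print(adjacency.astype(int))  # chain pattern: x0-x1 and x1-x2 connected, x0-x2 not
```

Note that cov[0, 2] is nonzero even though J[0, 2] is zero: x0 and x2 are marginally correlated through x1 yet conditionally independent, which is exactly the distinction the claim language draws.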
Regarding claim 11, Johnson teaches A method ([Johnson page 2 Introduction], as detailed above in claim 1; “Thus the variational distribution q(xn | yn) acts like a stochastic encoder from an observation to a distribution over latent variables, while the forward model p(yn | xn) acts as a stochastic decoder from a latent variable value to a distribution over observations. Furthermore, the functions µ(yn; φ) and Σ(yn; φ) act as a recognition or inference network for observation yn, outputting a distribution over the latent variable xn using a feed-forward neural network applied to yn. See Figure 3 for graphical models of the VAE” [Johnson page 3 Variational autoencoders]), comprising:
receiving a conditional query for a domain; (“Here we show how to compute stochastic gradients of the SVAE mean field objective using the results of a model inference subroutine. The algorithm is summarized in Algorithm 1…we compute gradients with respect to the recognition network parameters φ. Both the first and second terms of (31) depend on φ, the first term through the sample xˆ (n) (φ) and the second term through the KL divergence KL(φ) ≜ KL(q(θ, z(n) , x(n) ) ‖ p(θ, z(n) , x(n) )). Thus we must differentiate through the procedures that the model inference subroutine uses to compute these quantities. As [Algorithm 1] described in Section 4.2, performing this differentiation efficiently for the SLDS corresponds to backpropagation through message passing” [Johnson pages 7-8 SVAE algorithm]; see Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8]; The algorithm receives “first term through the sample xˆ (n) (φ)” as a query. In light of the instant specification (“One example query is a conditional query. The inference task 40 is given a Xi (a value of a node, one of the features 34) of the neural graphical model 16 and predicts the most likely values of the other nodes (features) in the neural graphical model 16” [¶ 0054]), the gradient computation algorithm in SVAE (Algorithm 1) can be interpreted as leveraging the model (and associated “view”, as detailed in claim 1 above) to perform an inference task to provide an answer to a conditional query)
accessing a neural view of a neural graphical model of the domain, wherein the neural view represents functions of features of the domain learned by fitting the neural view to a dependency structure of an input graph for the domain; ([Johnson pages 7-8 SVAE algorithm] and Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8] as detailed above;
[Algorithm 1 excerpt]
in Algorithm 1 [Johnson page 8]; The algorithm accesses the recognition network (whose parameters, as detailed in claim 1 above, are optimized (i.e., fitted) via variational training objective to represent the probability distribution over the latent variables and model dependencies between variables))
using the neural graphical model to perform an inference task to provide an answer to the query (“Here we describe a structured mean field family with which we can perform variational inference in the posterior distribution of the generative model from Section 3.1. This mean field family illustrates how an SVAE can leverage not only graphical model and exponential family structure but also learn bottom-up inference networks. As we show in Section 4, these structures allow us to compose several efficient inference algorithms including SVI, message passing, backpropagation, and the reparameterization trick” [Johnson page 5 Variational family and CRF recognition networks]; “Here we show how to compute stochastic gradients of the SVAE mean field objective using the results of a model inference subroutine. The algorithm is summarized in Algorithm 1…we compute gradients with respect to the recognition network parameters φ. Both the first and second terms of (31) depend on φ, the first term through the sample xˆ (n) (φ) and the second term through the KL divergence KL(φ) ≜ KL(q(θ, z(n) , x(n) ) ‖ p(θ, z(n) , x(n) )). Thus we must differentiate through the procedures that the model inference subroutine uses to compute these quantities. As [Algorithm 1] described in Section 4.2, performing this differentiation efficiently for the SLDS corresponds to backpropagation through message passing” [Johnson pages 7-8 SVAE algorithm]; see Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8];
The algorithm receives “first term through the sample xˆ (n) (φ)” as a query, and accesses the recognition network (trained on observations yn to output latent distribution xn – see [Johnson page 3 Variational autoencoders] detailed above) to output a set of gradients (i.e., values) with respect to network parameters in response to the query) wherein the inference task uses an inference algorithm that answers the conditional query over the neural graphical model by predicting values of unknown nodes by iteratively updating values of unknown features until convergence (see
[Algorithm 1 excerpt]
in Algorithm 1 Computing gradients of the SVAE objective and
[Algorithm 2 excerpt]
in Algorithm 2 Model inference subroutine for the SLDS [Johnson page 8]); and
outputting a set of values for the neural graphical model based on the inference task for the answer (see
[Algorithm 1 excerpt]
in Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8]).
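As a point of comparison for the claim language “iteratively updating values of unknown features until convergence,” that concept can be sketched generically as a fixed-point (Gauss-Seidel) update of each unknown node's conditional mean given its neighbors. This sketch illustrates the claim wording only; it is not Johnson's Algorithm 1, and the function and variable names are hypothetical:

```python
import numpy as np

def conditional_query(J, known):
    """Answer a conditional query over a Gaussian model with
    precision matrix J: fix the observed features in `known`
    ({index: value}) and iteratively update each unknown node
    from its neighbors until the updates converge."""
    n = J.shape[0]
    x = np.zeros(n)
    for i, v in known.items():
        x[i] = v
    unknown = [i for i in range(n) if i not in known]
    for _ in range(1000):
        delta = 0.0
        for i in unknown:
            # Gaussian conditional mean: x_i = -(1/J_ii) * sum_{j != i} J_ij x_j
            new = -(J[i] @ x - J[i, i] * x[i]) / J[i, i]
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        if delta < 1e-10:
            break
    return x

# Example: chain-structured precision; observe x0 and x2, infer x1.
J = np.array([[2.0, -0.8, 0.0], [-0.8, 2.0, -0.8], [0.0, -0.8, 2.0]])
x = conditional_query(J, {0: 1.0, 2: 1.0})  # x[1] converges to 0.8
```

Each pass updates only the unknown nodes, leaving the queried (known) values fixed, which mirrors the claimed division between given and predicted features.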
Regarding claim 16, Johnson teaches A method ([Johnson page 2 Introduction] and [Johnson page 3 Variational autoencoders], as detailed above in claim 11) comprising:
accessing a neural view of a neural graphical model of a domain, wherein the neural view represents functions of features of the domain learned by fitting the neural view to a dependency structure of an input graph for the domain; (“Here we describe the model inference subroutine used by the SVAE algorithm….Specifically, using the notation of Section 2.2, in the special case of the VAE the inference subroutine computes KL(q(x | y) ‖ p(x)) in closed form and generates samples x ∼ q(x | y) used to approximate Eq(x) ln p(y | x)… Specifically, the inference subroutine must first optimize the local mean field factors q(z1:T ) and q(x1:T ), then compute and return a sample xˆ1:T ∼ q(x1:T )… The algorithm is summarized in Algorithm 2” [Johnson page 8 Model inference subroutine]; see Algorithm 2 Model inference subroutine for the SLDS [Johnson page 8];
[Algorithm 2 excerpt]
in Algorithm 2 [Johnson page 8]; The recognition network (which, as detailed in claim 1 above, is optimized (i.e., fitted) via a variational training objective to represent the probability distribution over the latent variables and model dependencies between variables) is accessed as input and used in the model inference subroutine)
using the neural graphical model to perform a sampling task (“Here we describe the model inference subroutine used by the SVAE algorithm….Specifically, using the notation of Section 2.2, in the special case of the VAE the inference subroutine computes KL(q(x | y) ‖ p(x)) in closed form and generates samples x ∼ q(x | y) used to approximate Eq(x) ln p(y | x)… Specifically, the inference subroutine must first optimize the local mean field factors q(z1:T ) and q(x1:T ), then compute and return a sample xˆ1:T ∼ q(x1:T )… The algorithm is summarized in Algorithm 2” [Johnson page 8 Model inference subroutine]; see Algorithm 2 Model inference subroutine for the SLDS [Johnson page 8]; The recognition network is accessed as input and used in the model inference subroutine to output samples (see “return sample xˆ” in line 13 of Algorithm 2)) that obtains sample data points from the neural graphical model and generates new samples jointly matching a distribution of original input data, wherein the sampling task uses a sampling algorithm that selects a feature at random in the neural graphical model based on the dependency structure (see
[Algorithm 2 excerpt]
in Algorithm 2 [Johnson page 8]) and uses an inference algorithm in obtaining a value of a next feature until a sample value of all the features is obtained (see
[Algorithm 2 excerpt]
in Algorithm 2 [Johnson page 8]); and
outputting a set of samples generated by the neural graphical model based on the sampling task (see
[Algorithm 2 excerpt]
in Algorithm 2 [Johnson page 8]).
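For comparison with the claimed sampling task (select a starting feature, then obtain each next feature's value until all features have a sampled value, with the joint sample matching the model's distribution), the pattern can be sketched as ancestral sampling along a chain. This is a generic illustration of the claim language, not Johnson's Algorithm 2; the chain parameters a and q are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_chain(T, a=0.9, q=0.1):
    """Draw one joint sample x_1..x_T from a linear-Gaussian chain
    x_t ~ N(a * x_{t-1}, q): sample the first feature, then obtain
    each next feature's value conditioned on its neighbor until a
    value for every feature has been obtained."""
    x = np.empty(T)
    x[0] = rng.normal(0.0, 1.0)          # starting feature
    for t in range(1, T):                # remaining features, in order
        x[t] = rng.normal(a * x[t - 1], np.sqrt(q))
    return x

# Draw a batch of joint samples from the same chain distribution.
samples = np.array([sample_chain(50) for _ in range(200)])
```

Because every draw follows the same conditionals, the collection of samples jointly matches the chain's distribution, which is the role the claim assigns to the sampling task.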
Regarding claim 2, Johnson teaches the limitations of parent claim 1 and wherein the neural graphical model is a probabilistic graphical model (“In this section we develop an SVAE generative model and corresponding variational family. To be concrete we focus on a particular generative model for time series based on a switching linear dynamical system (SLDS) (Murphy, 2012; Fox et al., 2011), which illustrates how the SVAE can incorporate both discrete and continuous latent variables with rich probabilistic dependence” [Johnson page 4 Generative model and variational family]; The SVAE model is based on an SLDS framework, wherein SLDS is a type of probabilistic graphical model) with functions that represent complex distributions over the domain ("The SVAE construction is general: it can admit many latent probabilistic models as well as many flexible observation models and recognition network designs. In this section we outline some of these possibilities. While the SVAE recognition network described in Section 3.2 only produces node potentials depending on single data points, in general SVAE recognition networks can output potentials on more than one node or take as input more than one data point. For example, a recognition network could output node potentials of the form φ(xt; yt, yt−1) that depend also on the previous data point, as sketched in Figure 6a, or even depend on many data points through a recurrent neural network (RNN). Recognition networks may also output factors on discrete latent variables, as sketched in Figure 6b" [Johnson page 6 Variations and extensions]; SVAE (and its associated functions) can model dependencies between variables, including data points that “even depend on many [other] data points” (i.e., complex distributions)).
Regarding claim 3, Johnson teaches the limitations of parent claim 1 and wherein the neural view includes an input layer with features of the domain, one or more hidden layers of a neural network, weights, and an output layer with the features of the domain after being processed by the neural network (“That is, using the fact that the optimal factor q(x1:T ) is Markov according to a chain graph, we write it in terms of pairwise potentials and node potentials as [equation 27] where the node potential ψ(xt; yt, φ) is a function of the observation yt. Specifically, we choose each node potential to be a Gaussian factor in which the precision matrix J(yt; φ) and potential vector h(yt; φ) depend on the corresponding observation yt through an MLP, [equation 28] using the notation from Section 2.2” [Johnson pages 5-6 Variational family and CRF recognition networks]; “2.2 Variational autoencoders…Given a high-dimensional dataset y = {yn}Nn=1 such as a collection of images, the VAE models each observation yn in terms of a low-dimensional latent variable xn and a nonlinear observation model with parameters ϑ, [equations 8-9] where µ(xn; ϑ) and Σ(xn; ϑ) might depend on xn through a multilayer perceptron (MLP) with L layers [equations 10-13]…Thus the variational distribution q(xn | yn) acts like a stochastic encoder from an observation to a distribution over latent variables,” [Johnson page 3 Variational autoencoders]; The SVAE model can take a set of observations and use a multi-layer perceptron (MLP) (i.e., type of neural network) comprising L layers (i.e., implicitly comprising input layer, hidden layer(s), and output layer) and its associated functions (including activation functions taking weights/biases of MLP neurons as input – see equation 10 function hℓ(xn) = f(Wℓhℓ−1(xn) + bℓ) to generate a latent variable representation)
Regarding claim 4, Johnson teaches the limitations of parent claim 3 and training the neural view of the neural graphical model using the input data, wherein functions for features of the domain are learned during the training of the neural view based on paths of the features of the domain through the one or more hidden layers of the neural network from the input layer to the output layer and the weights, wherein the weights specify functions between the features of the domain ([Johnson pages 5-6 Variational family and CRF recognition networks] and [Johnson page 3 Variational autoencoders], as detailed above in claim 3; “We trained the LDS SVAE on 80 random image sequences each of length 50, using one sequence per update, and show the model’s future predictions given a prefix of a longer sequence. We used MLP image and recognition models each with one hidden layer of 50 units” [Johnson page 11 Image of a moving dot]; see Figure 8(a) and 8(b) on [Johnson page 12]; "(a) Predictions after 200 training epochs…Predictions after 1100 training epochs" [Johnson page 12]; The SVAE model implicitly trains and updates parameters across iterations/epochs. By nature, a neural network (e.g., MLP) inherently performs a sequence of mathematical transformations on input data (i.e., features of the domain) by passing through the one or more hidden layers of the neural network (i.e., paths of the features) to the output layer, thereby learning functions for said features via the transformations)
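The layer recursion quoted above from Johnson's equation 10, hℓ(xn) = f(Wℓhℓ−1(xn) + bℓ), can be written out directly as a forward pass. The following sketch assumes tanh as the nonlinearity f and uses random placeholder weights (both are the examiner's assumptions, not specified by the claim mapping); the single hidden layer of 50 units mirrors the cited moving-dot experiment:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_forward(x, weights, biases, f=np.tanh):
    """Eq. (10)-style recursion: h_l = f(W_l @ h_{l-1} + b_l),
    with h_0 = x as the input layer and the final h_L as output.
    The weights W_l specify the learned functions between features."""
    h = x
    for W, b in zip(weights, biases):
        h = f(W @ h + b)
    return h

# Hypothetical shapes: 4 input features, one hidden layer of 50 units
# (as in Johnson's moving-dot experiment), 2 output features.
sizes = [4, 50, 2]
weights = [rng.normal(size=(m, n)) * 0.1 for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

out = mlp_forward(rng.normal(size=4), weights, biases)
```

The path of each input feature to the output runs through the hidden layer's weight matrix, which is the sense in which the mapping reads "paths of the features" onto the MLP.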
Regarding claim 5, Johnson teaches the limitations of parent claim 4 and wherein training the neural view of the neural graphical model ([Johnson pages 5-6 Variational family and CRF recognition networks] and [Johnson page 3 Variational autoencoders], as detailed above in claim 3; [Johnson page 11 Image of a moving dot] and Figure 8(a) and 8(b) on [Johnson page 12], as detailed above in claim 4) further comprises:
initializing the weights and parameters of the neural network for the neural view; (“Furthermore, the functions µ(yn; φ) and Σ(yn; φ) act as a recognition or inference network for observation yn, outputting a distribution over the latent variable xn using a feed-forward neural network applied to yn" [Johnson page 3 Variational autoencoders]; "These local recognition networks allow us to fit a regression from each observation yt to a probabilistic guess at the corresponding latent state xt….There are many alternative designs for such recognition networks; for example, the node potential on xt need not depend only on the observation yt. See Section 3.3." [Johnson page 6 Variational family and CRF recognition networks]; Initializing weights and parameters, optimizing parameters via a loss function, and learning functions of input data via mathematical transformations (and associated weights/parameters) across layers are inherent to the training of a neural network (e.g., MLP). Further, in addition to fitting a regression from observation to latent state, the recognition network also can be trained (i.e., use a loss function) to model dependencies between observations (i.e., approximate the dependency structure of the input data))
optimizing the weights and the parameters of the neural network using a loss function; ([Johnson page 3 Variational autoencoders] and [Johnson page 6 Variational family and CRF recognition networks] as detailed above; Optimizing parameters via a loss function is inherent to the training of a neural network (e.g., MLP)) and
learning the functions using the weights and the parameters of the neural network ([Johnson page 3 Variational autoencoders] and [Johnson page 6 Variational family and CRF recognition networks] as detailed above; Learning functions of input data via mathematical transformations (and associated weights/parameters) across layers is inherent to the training of a neural network (e.g., MLP))
Regarding claim 6, Johnson teaches the limitations of parent claim 5 and wherein the loss function fits the neural network to the dependency structure along with fitting a regression of the input data ([Johnson page 6 Variational family and CRF recognition networks] as detailed above in claim 5; In addition to fitting a regression from observation to latent state, the recognition network also can be trained (i.e., use a loss function) to model dependencies between observations (i.e., approximate the dependency structure of the input data)).
Regarding claim 7, Johnson teaches the limitations of parent claim 5 and updating the paths of the features of the domain through the one or more hidden layers of the neural network from the input to the output based on the functions learned ([Johnson pages 5-6 Variational family and CRF recognition networks] and [Johnson page 3 Variational autoencoders], as detailed above in claim 3; [Johnson page 11 Image of a moving dot] and see Figure 8(a) and 8(b) on [Johnson page 12] as detailed above in claim 4; The SVAE model implicitly trains and updates parameters across iterations/epochs)
Regarding claim 9, Johnson teaches the limitations of parent claim 1 and wherein the input graph is a directed input graph, an undirected input graph, or a mixed-edge input graph (see Figure 6(a) and 6(b) [Johnson page 6]; The recognition network (i.e., input graph) in SVAE is based on observations (i.e., input data) via a distribution over latent variables, and also models dependencies between variables (i.e., models dependency structure) via directed edges (see directed edges in 6(a) and 6(b)) between nodes (i.e., is a directed input graph). In light of the instant specification [¶ 0034-0036], a “directed input graph” can be reasonably interpreted as an input graph with directed edges).
Regarding claim 12, Johnson teaches the limitations of parent claim 11 and wherein the set of output values is a set of fixed values or a set of distributions over values (”The algorithm is summarized in Algorithm 1…we compute gradients with respect to the recognition network parameters φ” [Johnson page 7 SVAE algorithm]; see Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8]; The algorithm outputs gradients of recognition network parameters, which are fixed inputs – see “recognition network parameters φ” as Input in Algorithm 1)
Regarding claim 13, Johnson teaches the limitations of parent claim 11 and wherein the inference task predicts unknown values based on the neural graphical model ([Johnson page 5 Variational family and CRF recognition networks] and [Johnson pages 7-8 SVAE algorithm] and see Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8] as detailed above in claim 11; An inference task inherently estimates, or predicts, unknown values (e.g., gradients))
Regarding claim 14, Johnson teaches the limitations of parent claim 13 and wherein the inference task uses message passing to determine the unknown values in the set of values for the neural graphical model (“As [Algorithm 1] described in Section 4.2, performing this differentiation efficiently for the SLDS corresponds to backpropagation through message passing” [Johnson pages 7-8 SVAE algorithm])
Regarding claim 15, Johnson teaches the limitations of parent claim 13 and wherein the inference task uses a gradient-based approach to determine the unknown values in the set of values for the neural graphical model ([Johnson page 5 Variational family and CRF recognition networks] and [Johnson pages 7-8 SVAE algorithm] and see Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8] as detailed above in claim 11)
Regarding claim 17, Johnson teaches the limitations of parent claim 16 and wherein the sampling task further comprises:
randomly selecting a node in the neural graphical model as a starting node; (“Message passing can also be used to draw samples or compute the log partition function efficiently, as we describe in Section 4.2.2" [Johnson page 9 Optimizing Local Mean Field Factors]; “After optimizing the local variational factors, the model inference subroutine uses the optimized factors to draw samples, compute expected sufficient statistics, and compute a KL divergence. The results of these inference computations, which we detail here, are then used to compute gradients of the SVAE objective as described in Section 4.1. To draw S samples {xˆ(s)}Ss=1 where xˆ(s)1:T are i.i.d. ∼ q(x1:T), we can perform message passing in q(x1:T ), which is a Gaussian distribution that is Markov with respect to a chain graph. Given two neighboring nodes i and j we define the message from i to j as [equation 38] where k is the other neighbor of node i. We can pass messages backward once, running a Kalman filter, and then sample forward S times. That is, after computing the messages mt+1→t(xt) for t = T − 1, . . . , 1, we compute the marginal distribution on x1 as [equation 39] and sample xˆ1 ∼ q(x1). Iterating, given a sample of xˆt−1, we sample xˆt from the conditional distributions [equation 40] thus constructing a joint sample xˆ1:T ∼ q(x1:T )” [Johnson page 9 Samples, Expected Statistics, and KL]; see Figure 13(b) Generated random samples from a VAE fit to mouse depth video on [Johnson page 15]; Samples can be drawn via a message passing procedure that randomly selects a starting node i and determines values for the remaining nodes in the joint sample xˆ1:T by iterating through neighboring nodes in an order relative to starting node i)
placing remaining nodes in the neural graphical model in an order relative to the starting node; ([Johnson page 9 Optimizing Local Mean Field Factors] and [Johnson page 9 Samples, Expected Statistics, and KL] and Figure 13(b) Generated random samples from a VAE fit to mouse depth video on [Johnson page 15] as detailed above; As is typical of the message passing procedure, after selecting starting node i, values for the remaining nodes in the joint sample xˆ1:T are determined by iterating through neighboring nodes in an order relative to starting node i) and
creating a value for each node of the remaining nodes in the neural graphical model based on values from neighboring nodes to each node of the remaining nodes ([Johnson page 9 Optimizing Local Mean Field Factors] and [Johnson page 9 Samples, Expected Statistics, and KL] and Figure 13(b) Generated random samples from a VAE fit to mouse depth video on [Johnson page 15] as detailed above; As is typical of the message passing procedure, after selecting starting node i, values for the remaining nodes in the joint sample xˆ1:T are determined by iterating through neighboring nodes in an order relative to starting node i)
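The quoted procedure (one backward message-passing sweep per equation 38, then forward sampling via equations 39-40) can be sketched on a discrete-state chain, where the sum-product messages reduce to finite sums. This simplification is offered for illustration only and is not Johnson's Gaussian/Kalman implementation; the potentials here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def backward_forward_sample(phi, psi):
    """Eq. (38)-(40) pattern on a discrete chain: one backward
    message-passing sweep, then forward sampling of a joint draw.
    phi: (T, K) node potentials; psi: (K, K) pairwise potential."""
    T, K = phi.shape
    m = np.ones((T, K))                      # m[t] = message t+1 -> t
    for t in range(T - 2, -1, -1):
        m[t] = psi @ (phi[t + 1] * m[t + 1])
    x = np.empty(T, dtype=int)
    p = phi[0] * m[0]                        # marginal on x_1 (eq. 39)
    x[0] = rng.choice(K, p=p / p.sum())
    for t in range(1, T):
        p = psi[x[t - 1]] * phi[t] * m[t]    # conditional x_t | x_{t-1} (eq. 40)
        x[t] = rng.choice(K, p=p / p.sum())
    return x

# Uniform (hypothetical) potentials over a 6-node chain with 3 states.
x = backward_forward_sample(np.ones((6, 3)), np.ones((3, 3)))
```

Each node's value is created from its already-sampled neighbor and the backward message summarizing the rest of the chain, matching the claim's "values from neighboring nodes" language.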
Regarding claim 19, Johnson teaches the limitations of parent claim 17 and wherein the value created for each node is from a same distribution of input data over the domain (“Iterating, given a sample of xˆt−1, we sample xˆt from the conditional distributions [equation 40] thus constructing a joint sample xˆ1:T ∼ q(x1:T )” [Johnson page 9 Samples, Expected Statistics, and KL])
Regarding claim 20, Johnson teaches the limitations of parent claim 16 and wherein the neural view includes a trained neural network with an input layer with features from input data, one or more hidden layers of the neural network, optimized weights, an output layer with the features of the input data, and functions of the features of the input data after being processed by the neural network ([Johnson page 8 Model inference subroutine] and see Algorithm 2 Model inference subroutine for the SLDS [Johnson page 8] as detailed above in claim 16; “That is, using the fact that the optimal factor q(x1:T ) is Markov according to a chain graph, we write it in terms of pairwise potentials and node potentials as [equation 27] where the node potential ψ(xt; yt, φ) is a function of the observation yt. Specifically, we choose each node potential to be a Gaussian factor in which the precision matrix J(yt; φ) and potential vector h(yt; φ) depend on the corresponding observation yt through an MLP, [equation 28] using the notation from Section 2.2. These local recognition networks allow us to fit a regression from each observation yt to a probabilistic guess at the corresponding latent state xt” [Johnson pages 5-6 Variational family and CRF recognition networks]; The SVAE model can take a set of observations and use a multi-layer perceptron (MLP) (i.e., type of neural network) comprising L layers (i.e., implicitly comprising input layer, hidden layer(s), and output layer) and its associated functions (including activation functions taking weights/biases of MLP neurons as input – see equation 10 function hℓ(xn) = f(Wℓhℓ−1(xn) + bℓ) to generate a latent variable representation.
Further, the recognition network (trained on observations yn to output latent distribution xn – see [Johnson page 3 Variational autoencoders] detailed above) is accessed as input and used in the model inference subroutine (see “Input: Variational dynamics parameter ηeθ of q(θ), node potentials {ψ(xt; yt)} T t=1 from recognition network” in lines 1-2 of Algorithm 2) to output samples (see “return sample xˆ” in line 13 of Algorithm 2))
Regarding claim 21, Johnson teaches the limitations of parent claim 11 and wherein the neural view allows the inference task to move forwards and backwards through the neural network in providing an answer to the conditional query (“As [Algorithm 1] described in Section 4.2, performing this differentiation efficiently for the SLDS corresponds to backpropagation through message passing” [Johnson pages 7-8 SVAE algorithm]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Johnson, as applied to claim 1 above, in view of Lin (Pub. No. US 20220222049 A1, “Visual Programming for Deep Learning”, filed 05/06/2020).
Regarding claim 10, Johnson teaches the limitations of parent claim 1 and providing the neural view of the neural graphical model, as detailed above in claim 1.
However, Johnson does not explicitly teach providing the neural view of the neural graphical model as output on a display.
In the same field of endeavor, Lin teaches a method of using graphical neural architecture to model relationships between data (“A computer-implemented method comprises presenting a visual representation of an artificial neural network, the visual representation comprising graphical elements representing layers of the artificial neural network” [Lin ¶ 0002]; “The intermediate representation may correspond to the visual representation 300 and indicate the parameters and name for each graphical element... The intermediate calculation chart may describe the artificial neural network as a directed graph independent of the deep learning frameworks. There may not be a one-to-one correspondence between the nodes and the layers of the deep learning framework. For example, one node may correspond to multiple layers in the deep learning framework or multiple nodes may correspond to one layer in the deep learning framework” [Lin ¶ 0033-0034]) that provides a neural view of a neural graphical model as output on a display (“The computing device 100 may be used for implementing a visual programming scheme in accordance with implementations of the present disclosure. For example, the computing device 100 may display, on a monitor, a visual representation of an artificial neural network, such as various layers and connections between the layers” [Lin ¶ 0019]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated providing the neural view of the neural graphical model as output on a display, as taught by Lin, into Johnson because they are both directed towards using graphical neural architecture to model relationships between data. Given that graphical user interfaces are well-known in the art, displaying the generated neural view of the neural graphical model as output on a display would not only further contribute to interpretability of the model for a given user, but would also, as further taught by Lin [¶ 0019, ¶ 0022], potentially enable drag-and-drop manipulation of network elements at the user’s convenience.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Johnson, as applied to claim 17 above, in view of Rezende et al., ("Stochastic Backpropagation and Approximate Inference in Deep Generative Models", available arXiv 30 May 2014), hereinafter Rezende.
Regarding claim 18, Johnson teaches the limitations of parent claim 17.
However, Johnson does not explicitly teach wherein creating the value for each node further comprises: adding random noise to the value created for the node based on a distribution conditioned on values from the neighboring nodes.
In the same field of endeavor, Rezende teaches a method of modeling structural relationships between observed data in an embedding space (“These efforts, combined with the demand for accurate probabilistic inferences and fast simulation, lead us to seek generative models that are i) deep, since hierarchical architectures allow us to capture complex structure in the data, ii) allow for fast sampling of fantasy data from the inferred model, and iii) are computationally tractable and scalable to high-dimensional data” [Rezende page 1 Introduction]) that add[s] random noise to [a] value created for [a] node based on a distribution conditioned on values from the neighboring nodes (“Deep latent Gaussian models (DLGMs) are a general class of deep directed graphical models that consist of Gaussian latent variables at each layer of a processing hierarchy. The model consists of L layers of latent variables. To generate a sample from the model, we begin at the top-most layer (L) by drawing from a Gaussian distribution. The activation h_l at any lower layer is formed by a non-linear transformation of the layer above h_{l+1}, perturbed by Gaussian noise. This generative process is described as follows: [see equations 1-4]” [Rezende page 2 Deep Latent Gaussian Models]; “We regularize the recognition model by introducing additional noise, specifically, bit-flip or drop-out noise at the input layer and small additional Gaussian noise to samples from the recognition model” [Rezende page 4 Free Energy Objective]; Gaussian noise (i.e., noise drawn from a Gaussian distribution of latent variables) is added to output from the recognition model when generating samples).
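For illustration only, and forming no part of the record: the DLGM generative process quoted from Rezende (draw the top layer from a Gaussian, then form each lower layer as a non-linear transformation of the layer above, perturbed by Gaussian noise) may be sketched as follows. The transforms, noise scales, and dimensions are hypothetical, and this sketch keeps the layer dimension fixed for simplicity.

```python
import math
import random

def sample_dlgm(transforms, noise_stds, dim_top, rng=None):
    # Sketch of Rezende's DLGM ancestral sampling:
    # top layer h_L ~ N(0, I); each lower layer is a non-linear transform of
    # the layer above plus Gaussian noise (equations 1-4 of Rezende).
    rng = rng or random.Random(0)
    h = [rng.gauss(0.0, 1.0) for _ in range(dim_top)]  # draw top-most layer
    for T_l, std in zip(transforms, noise_stds):
        # Elementwise transform of the layer above, perturbed by noise
        # drawn from a Gaussian distribution.
        h = [T_l(v) + rng.gauss(0.0, std) for v in h]
    return h

# Hypothetical 2-layer descent with tanh transforms and small noise scales
sample = sample_dlgm([math.tanh, math.tanh], [0.1, 0.05], dim_top=3)
```

Each lower-layer value is thus conditioned on the corresponding value of the layer above, with randomness injected at every step of the hierarchy, matching the examiner's mapping of the cited passage to the claimed noise-addition step.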
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated adding random noise to the value created for the node based on a distribution conditioned on values from the neighboring nodes, as taught by Rezende, into Johnson because they are both directed towards modeling structural relationships between observed data in an embedding space. Given that regularization of deep learning models to prevent overfitting is a well-known concept in the art, incorporating the regularization techniques taught by Rezende would improve the capabilities of the SVAE model to provide an accurate approximation of the posterior distribution, and thereby perform accurate inference for unseen data (“To allow for the best possible inference, the specification of the recognition model must be flexible enough to provide an accurate approximation of the posterior distribution…We regularize the recognition model… We found that such regularisation is essential and without it the recognition model is unable to provide accurate inferences for unseen data points” [Rezende page 4 Free Energy Objective]).
Response to Arguments
The remarks filed 10/16/2025 have been fully considered.
Applicant’s remarks [Remarks pages 9-11] traversing the subject matter eligibility rejections under 35 U.S.C. 101 set forth in the office action mailed 07/16/2025, in view of claims 1-7 and 9-21 as amended, have been considered but are not persuasive.
Applicant alleges that the amended independent claims recite more than high-level, generic invocations, and overall recite elements that, both individually and as a whole, are directed to a practical application. Applicant cites to the specification [¶ 0030, 0031] to explain how the recited computational models result in improvements to model runtime.
The examiner respectfully disagrees. As further explained in the rejection set forth above, the newly added limitations still recite high-level features that are either common to implementation of conventional neural architectures or merely expand on previously recited exceptions. Further, "claiming the improved speed or efficiency inherent with applying the abstract idea on a computer" does not integrate a judicial exception into a practical application or provide an inventive concept (see MPEP § 2106.05(f)). The new limitations still do not provide adequate detail to clearly set forth an improvement to conventional technology or a technical field (such as an improvement to the underlying model itself).
Applicant has not presented further arguments with respect to the dependent claims. As such, amended claims 1-7 and 9-21 stand rejected under 35 U.S.C. 101.
Applicant’s remarks [Remarks pages 11-13] traversing the anticipation rejections under 35 U.S.C. 102 and obviousness rejections under 35 U.S.C. 103 set forth in the office action mailed 07/16/2025, in view of claims 1-7 and 9-21 as amended, have been considered but are not persuasive. Applicant is directed towards the 102 and 103 rejections set forth above, which explain how Johnson is interpreted to still teach the limitations at issue. Applicant has not presented further arguments with respect to the dependent claims. As such, amended claims 1-7 and 9-21 stand rejected under 35 U.S.C. 102 and/or 35 U.S.C. 103.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Souly et al. (“Scene Labeling Using Sparse Precision Matrix”, published 2016) discloses a means of utilizing an estimation of a precision matrix (i.e., inverse covariance matrix) to capture dependency between variables and find graph structure.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY M BALAKRISHNAN whose telephone number is (571) 272-0455. The examiner can normally be reached 10am-5pm EST Mon-Thurs.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER WELCH can be reached on (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/V.M.B./
Examiner, Art Unit 2143
/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143