DETAILED ACTION
This final action is in response to the amendment and remarks filed on 09/30/2025 for application 17/949,710.
Claims 1, 3, 8-12, 16-17, and 20 have been amended.
Claims 1-20 remain pending in the application. Claims 1, 12, and 17 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) filed 07/03/2025 and 08/14/2025 have been considered by the examiner.
Response to Amendment
The amendment filed 09/30/2025 has been entered.
Applicant’s amendment to the claims with respect to resolving the claim objections and the indefiniteness rejections under 35 U.S.C. 112(b) has been considered and overcomes the objections and 112(b) rejections set forth in the Office action mailed 07/02/2025. Consequently, the previous objections and 112(b) rejections are withdrawn.
In light of the amendments made to both the instant application (17/949,710) and the co-pending application at issue (17/949,721), the provisional non-statutory double patenting rejection set forth in the Office action mailed 07/02/2025 is withdrawn.
Claim Objections
Claim 20 is objected to because of the following informality:
In claim 20, the limitation “a decoder that transforms the output embedding at the output layer to an input data” should read “a decoder that transforms the output embedding at the output layer to an input data space” (consistent with corresponding claims 3 and 16) to improve clarity.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Independent Claims (Claim 1, Claim 12, Claim 17):
Step 1: Claims 1, 12, and 17 are each drawn to a method. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter).
Step 2A Prong 1: Claims 1, 12, and 17 each recite a judicially recognized exception of an abstract idea.
Claim 1 recites, inter alia:
identifying a dependency structure for the input data; and generating a view of a graphical model for the domain using the dependency structure – This limitation amounts to observing and modeling relationships between points of data related to a particular topic (i.e., domain), and therefore recites a process of evaluation capable of being performed in the human mind or using pen and paper (e.g., identifying a relationship between an image of a cat and a label text “cat”).
Claim 12 recites, inter alia:
perform an inference task to provide an answer to the conditional query – This limitation amounts to evaluating and responding to a query based on reasoning, and therefore recites a process of evaluation capable of being performed in the human mind or using pen and paper.
Claim 17 recites, inter alia:
perform a sampling task – This limitation amounts to performing a generic procedure to produce data samples, and therefore recites a process of evaluation capable of being performed in the human mind or using pen and paper.
Step 2A Prong 2: The following additional elements recited in claims 1, 12, and 17 do not integrate the recited judicial exceptions into a practical application.
Claim 1 additionally recites:
receiving input data generated from a domain – This limitation amounts to a mere data gathering step, and is therefore insignificant extra-solution activity.
wherein the input data includes a combination of different data types – This limitation amounts to merely specifying a data source or type of data to be manipulated, and is therefore insignificant extra-solution activity.
neural [view]; neural [graphical model], wherein the neural graphical model is a probabilistic graphical model that handles complex distributions over the domain and captures a distribution defined by the input data – These limitations merely invoke generic computational models (a neural graphical model, a probabilistic graphical model) as tools to “handle and process distributions”, i.e., to perform an existing mental process of observing and modeling relationships between data points.
[capturing a distribution by] a projection module transforming the input data into a compressed vector representation that the neural graphical model uses in learning parameters of the distribution – This limitation does no more than recite a pre-processing step of transforming the data into an encoded representation to enable further analysis, and therefore recites insignificant extra-solution activity.
using the compressed vector representation by a learning algorithm in training the neural view of the neural graphical model to learn functional dependencies jointly of the dependency structure by sharing parameters across tasks and jointly optimizing regression and structure loss in a multi-task learning framework – Because the claim invokes the neural graphical model as a mere tool to perform an existing process, this limitation does no more than further link the recited exception to the technological environment of neural architectures via generic invocations of parameter optimization via loss functions.
Claim 12 recites substantially similar additional elements to those recited in claim 1, and further recites:
receiving a conditional query for a domain; and outputting a set of values for the neural graphical model based on the inference task for the answer – These limitations amount to nominal pre-solution and post-solution steps of gathering and outputting data, and are therefore insignificant extra-solution activity.
accessing a neural view of a neural graphical model trained on input data; using the neural graphical model to [perform] – These limitations are high-level, generic invocations of neural architecture that amount to no more than mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an existing mental process of evaluating and responding to a query based on reasoning).
wherein the input data includes a combination of different data types of the input data; – This limitation amounts to merely specifying a data source or type of data to be manipulated, and is therefore insignificant extra-solution activity.
Claim 17 recites substantially similar additional elements to those recited in claim 1, and further recites:
accessing a neural view of a neural graphical model trained on input data for a domain; using the neural graphical model to [perform] – These limitations are high-level, generic invocations of neural architecture that amount to no more than mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an existing mental process of producing data samples).
wherein the input data includes a combination of different data types of the input data – This limitation amounts to merely specifying a data source or type of data to be manipulated, and is therefore insignificant extra-solution activity.
and outputting a set of data samples generated by the neural graphical model based on the sampling task. – This limitation amounts to a nominal post-solution step of outputting data, and is therefore insignificant extra-solution activity.
Step 2B: The additional elements recited in claims 1, 12, and 17, viewed individually or as a combination, do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves.
Claim 1 additionally recites:
receiving input data generated from a domain – Receiving data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
wherein the input data includes a combination of different data types; – Combining data from different modalities to perform inference (i.e., multimodal data fusion) is well-understood, routine, and conventional activity (see Zhang et al., "Multimodal Intelligence: Representation Learning, Information Fusion, and Applications", [page 478 Introduction], [pages 481-483 Fusion]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
neural [view]; neural [graphical model]; wherein the neural graphical model is a probabilistic graphical model that handles complex distributions over the domain and captures a distribution defined by the input data – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
[capturing a distribution by] a projection module transforming the input data into a compressed vector representation that the neural graphical model uses in learning parameters of the distribution – Using encoders to project graph data into a latent vector space is well-understood, routine, and conventional activity (see Mahdavi et al., “Dynamic Joint Variational Graph Autoencoders” [pages 385-387 Introduction and Related work]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
using the compressed vector representation by a learning algorithm in training the neural view of the neural graphical model to learn functional dependencies jointly of the dependency structure by sharing parameters across tasks and jointly optimizing regression and structure loss in a multi-task learning framework – Utilizing multi-task learning frameworks for graph embedding models is well-understood, routine, and conventional activity (see Zhang et al., “A Survey on Multi-Task Learning” [Abstract and page 5601 MTL with Other Learning Paradigms]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 12 recites substantially similar additional elements to those recited in claim 1, and further recites:
receiving a conditional query for a domain; and outputting a set of values for the neural graphical model based on the inference task for the answer – Receiving and transmitting data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
accessing a neural view of a neural graphical model trained on input data; using the neural graphical model to [perform] – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
wherein the input data includes a combination of different data types of the input data; – Combining data from different modalities to perform inference (i.e., multimodal data fusion) is well-understood, routine, and conventional activity (see Zhang et al., "Multimodal Intelligence: Representation Learning, Information Fusion, and Applications", [page 478 Introduction], [pages 481-483 Fusion]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 17 recites substantially similar additional elements to those recited in claim 1, and further recites:
accessing a neural view of a neural graphical model trained on input data for a domain; using the neural graphical model to [perform] – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
wherein the input data includes a combination of different data types of the input data – Combining data from different modalities to perform inference (i.e., multimodal data fusion) is well-understood, routine, and conventional activity (see Zhang et al., "Multimodal Intelligence: Representation Learning, Information Fusion, and Applications", [page 478 Introduction], [pages 481-483 Fusion]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
and outputting a set of data samples generated by the neural graphical model based on the sampling task. – Transmitting data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
As such, claims 1, 12, and 17 are not patent eligible.
Dependent Claims (Claims 2-11, Claims 13-16, Claims 18-20):
Dependent claims 2-11, 13-16, and 18-20 narrow the scope of independent claims 1, 12, and 17, and thus merely narrow the recited judicial exceptions. As discussed with respect to the independent claims, the recited judicial exceptions are not meaningfully integrated into a practical application and do not amount to significantly more than the recited abstract ideas themselves. The dependent claims recite abstract idea limitations similar to those recited in the independent claims, amounting to no more than mathematical concepts or mental processes capable of being performed in the human mind and/or using pen and paper. The dependent claims also do not recite any further additional elements that integrate the recited judicial exceptions into a practical application or amount to significantly more than the recited abstract ideas themselves. Consequently, claims 2-11, 13-16, and 18-20 are also rejected under 35 U.S.C. 101.
Step 1: Claims 2-11, 13-16, and 18-20 are each drawn to a method. Therefore, each of these claims falls under one of the four categories of statutory subject matter (process/method, machine/apparatus, manufacture/product, or composition of matter).
Step 2A Prong 1: Claims 2-11, 13-16, and 18-20 each recite a judicially recognized exception of an abstract idea.
Claim 2 recites the same judicial exception as claim 1.
Claim 3 recites, inter alia:
functions represent[ing] the complex distributions over the domain – This limitation invokes mathematical functions as a means of representing probabilistic relationships between data (i.e., organizing and manipulating information through mathematical correlations), and therefore recites a mathematical relationship.
transforms the input data to an embedding – This limitation recites a generic transformation of data, and therefore recites either a process of evaluation capable of being performed in the human mind or using pen and paper, or a mathematical relationship (i.e., manipulation of information through mathematical correlations).
wherein the weights are applied to each connection between the input layer, hidden layers of the neural network, and an output layer – This limitation amounts to reciting mathematical relationships between elements of layers of a neural network (i.e., mathematical correlations between neurons).
transforms the embedding at the output layer to an input data space – This limitation recites a transformation of data to a mathematical space, and therefore recites a mathematical relationship (i.e., manipulation of information through mathematical correlations).
Claim 4 recites, inter alia:
wherein the embedding is a vector representation of the input data – This limitation recites a transformation of data to a vector representation, and therefore recites a mathematical relationship (i.e., manipulation of information through mathematical correlations).
Claim 5 recites, inter alia:
wherein the embedding encodes different properties of the input data as a vector of numbers – This limitation recites a transformation of data to a vector representation, and therefore recites a mathematical relationship (i.e., manipulation of information through mathematical correlations).
Claim 6 recites the same judicial exception as claim 3.
Claim 7 recites the same judicial exception as claim 3.
Claim 8 recites, inter alia:
updating the dependency structure – This limitation amounts to merely updating a relationship model based on continued observation of relationships between data points, and therefore recites a process of evaluation capable of being performed in the human mind or using pen and paper.
Claim 9 recites, inter alia:
wherein the dependency structure identifies features in the input data that are directly correlated to one another and identifies the features in the input data that are conditionally independent from one another given other features – This limitation amounts to organizing information in a relationship model (i.e., dependency structure) based on mathematical correlations between data points, and therefore recites a mathematical relationship.
Claim 10 recites the same judicial exception as claim 1.
Claim 11 recites, inter alia:
wherein functions for features of the domain are learned using a loss function comprising regression loss from fit to the input data and structure loss computed as a distance from the dependency structure – This limitation amounts to learning mathematical relationships between data (functions for features of the domain are learned) via mathematical calculations (using a loss function comprising regression loss from fit to the input data, structure loss computed as a distance from a desired dependency structure).
Claim 13 recites the same judicial exception as claim 12.
Claim 14 recites, inter alia:
wherein the inference task predicts unknown values and the set of output values is a set of fixed values or a set of distributions over values – This limitation amounts to making generic predictions about values and probabilities based on observing data, and therefore recites a process of evaluation capable of being performed in the human mind or using pen and paper.
Claim 15 recites, inter alia:
[wherein the inference task uses] a gradient-based approach to determine the unknown values in the set of values for the neural graphical model – This limitation amounts to performing mathematical functions (i.e., calculating gradients) to determine values, and therefore recites a mathematical calculation.
Claim 16 recites, inter alia:
compresses the input data to an embedding – This limitation recites a generic transformation of data, and therefore recites either a process of evaluation capable of being performed in the human mind or using pen and paper, or a mathematical relationship (i.e., manipulation of information through mathematical correlations).
wherein the weights are applied to each connection between the input layer, the layers within the neural network, and an output layer – This limitation amounts to reciting mathematical relationships between elements of layers of a neural network (i.e., mathematical correlations between neurons).
transforms the output embedding at the output layer to an input data space – This limitation recites a transformation of data to a mathematical space, and therefore recites a mathematical relationship (i.e., manipulation of information through mathematical correlations).
Claim 18 recites the same judicial exception as claim 17.
Claim 19 recites, inter alia:
randomly selecting a node in the neural graphical model as a starting node; placing remaining nodes in the neural graphical model in an order relative to the starting node; and creating a value for each node of the remaining nodes in the neural graphical model based on values from neighboring nodes to each node of the remaining nodes by adding random noise to the value created for the node based on a distribution conditioned on the values from the neighboring nodes – These limitations amount to a procedure of organizing and manipulating information through mathematical correlations between nodes (i.e., data points) in a model, and therefore recites a mathematical relationship.
Claim 20 recites, inter alia:
compresses the input data to an embedding – This limitation recites a generic transformation of data, and therefore recites either a process of evaluation capable of being performed in the human mind or using pen and paper, or a mathematical relationship (i.e., manipulation of information through mathematical correlations).
wherein the weights are applied to each connection between the input layer, the layers within the neural network, and an output layer – This limitation amounts to reciting mathematical relationships between elements of layers of a neural network (i.e., mathematical correlations between neurons).
transforms the output embedding at the output layer to an input data – This limitation recites a transformation of data to a mathematical space, and therefore recites a mathematical relationship (i.e., manipulation of information through mathematical correlations).
Step 2A Prong 2: The following additional elements recited in claims 2-3, 6-8, 10-11, 13-16, and 18-20 do not integrate the recited judicial exceptions into a practical application.
Claim 2 additionally recites:
wherein the input data includes real number values, categorical feature values, text input, medical entities, tabular data, time series data, images, captions, objects, videos, audio data, words, phrases, sentences, documents, webpages, or e-mail messages – This limitation amounts to specifying a data source or type of data to be manipulated, and is therefore insignificant extra-solution activity.
Claim 3 additionally recites:
the neural view of the neural graphical model further comprises: an input layer with features of the domain; an encoder; a neural network with multiple layers; weights; bias terms and activation functions; the output layer with the embedding; and a decoder – These limitations are insignificant, high-level implementation steps that merely recite components of a neural architecture, wherein the neural architecture is merely being invoked as a tool to perform an existing abstract idea [see Step 2A Prong 2 analysis of parent claim 1 on page 16]. Consequently, they recite insignificant extra-solution activity.
Claim 6 additionally recites:
wherein a number of nodes in the input layer is based on output units of the encoder – This limitation amounts to an insignificant implementation step with regard to transmitting data through an encoder, wherein the encoder is merely being invoked as a tool to perform an abstract idea. Therefore, it recites insignificant extra-solution activity.
Claim 7 additionally recites:
wherein a first input data type has a first number of nodes in the input layer and a second input data type has a second number of nodes in the input layer different from the first number of nodes – This limitation amounts to an insignificant implementation step with regard to transmitting data through an encoder, wherein the encoder is merely being invoked as a tool to perform an abstract idea. Therefore, it recites insignificant extra-solution activity.
Claim 8 additionally recites:
[updating dependency structure] of the neural view for the number of nodes and corresponding connections between features of the domain – This limitation is a high-level, generic invocation of neural architecture that amounts to mere instructions to apply an exception (i.e., merely invoking neural architecture and its corresponding elements as a tool to perform an abstract idea).
Claim 10 additionally recites:
training the neural view of the neural graphical model – This limitation is a high-level, generic invocation of neural architecture that amounts to mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an abstract idea).
[training] using the combination of different data types of the input data – This limitation amounts to merely specifying a data source or type of data to be manipulated, and is therefore insignificant extra-solution activity.
Claim 11 additionally recites:
[functions are learned] during the training of the neural view – This limitation is a high-level, generic invocation of neural architecture that amounts to mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an abstract idea).
Claim 13 additionally recites:
wherein the input data includes real number values, categorical feature values, text input, medical entities, tabular data, time series data, images, captions, objects, videos, audio data, words, phrases, sentences, documents, webpages, or e-mail messages – This limitation amounts to specifying a data source or type of data to be manipulated, and is therefore insignificant extra-solution activity.
Claim 14 additionally recites:
[predicts unknown values] based on the neural graphical model – This limitation is a high-level, generic invocation of neural architecture that amounts to mere instructions to apply an exception (i.e., merely invoking neural architecture as a tool to perform an abstract idea).
Claim 15 additionally recites:
wherein the inference task uses message passing to determine the unknown values in the set of values for the neural graphical model – This limitation amounts to a high-level, generic invocation of message passing that amounts to no more than mere instructions to apply an exception (i.e., merely invokes message passing as a tool to perform an existing mental process of making predictions).
Claim 16 additionally recites:
wherein the neural view includes: an input layer with features of the domain; an encoder; a neural network with multiple layers; weights; the output layer with an output embedding; and a decoder – These limitations are insignificant, high-level implementation steps that merely recite components of a neural architecture, wherein the neural architecture is merely being invoked as a tool to perform an existing abstract idea [see Step 2A Prong 2 analysis of parent claim 12 on pages 16-17]. Consequently, they recite insignificant extra-solution activity.
Claim 18 additionally recites:
wherein the input data includes real number values, categorical feature values, text input, medical entities, tabular data, time series data, images, captions, objects, videos, audio data, words, phrases, sentences, documents, webpages, or e-mail messages – This limitation amounts to specifying a data source or type of data to be manipulated, and is therefore insignificant extra-solution activity.
Claim 19 additionally recites:
[a node] in the neural graphical model; [remaining nodes] in the neural graphical model – These limitations are high-level, generic invocations of neural architecture that amount to mere instructions to apply an exception (i.e., merely invoking neural architecture and its corresponding elements as a tool to perform an abstract idea).
Claim 20 additionally recites:
wherein the neural view includes: an input layer with features of the domain; an encoder; a neural network with multiple layers; weights; the output layer with an output embedding; and a decoder – These limitations are insignificant, high-level implementation steps that merely recite components of a neural architecture, wherein the neural architecture is merely being invoked as a tool to perform an existing abstract idea [see Step 2A Prong 2 analysis of parent claim 17 on page 17]. Consequently, they recite insignificant extra-solution activity.
Step 2B: The additional elements recited in claims 2-3, 6-8, 10-11, 13-16, and 18-20, viewed individually or as a combination, do not provide an inventive concept or otherwise amount to significantly more than the recited abstract ideas themselves.
Claim 2 additionally recites:
wherein the input data includes real number values, categorical feature values, text input, medical entities, tabular data, time series data, images, captions, objects, videos, audio data, words, phrases, sentences, documents, webpages, or e-mail messages – Combining data from different modalities (e.g., images and text) to perform inference (i.e., multimodal data fusion) is well-understood, routine, and conventional activity (see Zhang et al., "Multimodal Intelligence: Representation Learning, Information Fusion, and Applications", [page 478 Introduction], [pages 481-483 Fusion]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 3 additionally recites:
wherein the neural graphical model is a probabilistic graphical model, and the neural view of the neural graphical model further comprises: an input layer with features of the domain; an encoder; a neural network with multiple layers; weights; bias terms and activation functions; the output layer with the embedding; and a decoder – Using an encoder-decoder model (i.e., a model that comprises the elements of the claimed neural architecture – input layer, encoder, neural network, weights, bias terms and activation functions, output layer, decoder) to perform probabilistic inference (e.g., a variational autoencoder (VAE)) is well-understood, routine, and conventional activity (see Johnson et al., "Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", [page 3 Variational autoencoders]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 6 additionally recites:
wherein a number of nodes in the input layer is based on output units of the encoder – Transmitting data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 7 additionally recites:
wherein a first input data type has a first number of nodes in the input layer and a second input data type has a second number of nodes in the input layer different from the first number of nodes – Transmitting data over a network is well-understood, routine, and conventional activity (see MPEP § 2106.05(d); “Receiving or transmitting data over a network”) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 8 additionally recites:
[updating dependency structure] of the neural view for the number of nodes and corresponding connections between features – Merely invoking neural architecture and its corresponding elements as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 10 additionally recites:
training the neural view of the neural graphical model – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
[training] using a combination of different data types of the input data – Combining data from different modalities to perform inference (i.e., multimodal data fusion) is well-understood, routine, and conventional activity (see Zhang et al., "Multimodal Intelligence: Representation Learning, Information Fusion, and Applications", [page 478 Introduction], [pages 481-483 Fusion]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 11 additionally recites:
[functions are learned] during the training of the neural view – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 13 additionally recites:
wherein the input data includes real number values, categorical feature values, text input, medical entities, tabular data, time series data, images, captions, objects, videos, audio data, words, phrases, sentences, documents, webpages, or e-mail messages – Combining data from different modalities (e.g., images and text) to perform inference (i.e., multimodal data fusion) is well-understood, routine, and conventional activity (see Zhang et al., "Multimodal Intelligence: Representation Learning, Information Fusion, and Applications", [page 478 Introduction], [pages 481-483 Fusion]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 14 additionally recites:
[predicts unknown values] based on the neural graphical model – Merely invoking neural architecture as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 15 additionally recites:
wherein the inference task uses message passing to determine the unknown values in the set of values for the neural graphical model – Merely invoking message passing as a tool to perform an existing mental process of making predictions does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 16 additionally recites:
wherein the neural view includes: an input layer with features of the domain; an encoder; a neural network with multiple layers; weights; the output layer with an output embedding; and a decoder – Using an encoder-decoder model (i.e., a model that comprises the elements of the claimed neural architecture – input layer, encoder, neural network, weights, output layer, decoder) to perform probabilistic inference (e.g., a variational autoencoder (VAE)) is well-understood, routine, and conventional activity (see Johnson et al., "Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", [page 3 Variational autoencoders]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 18 additionally recites:
wherein the input data includes real number values, categorical feature values, text input, medical entities, tabular data, time series data, images, captions, objects, videos, audio data, words, phrases, sentences, documents, webpages, or e-mail messages – Combining data from different modalities to perform inference (i.e., multimodal data fusion) is well-understood, routine, and conventional activity (see Zhang et al., "Multimodal Intelligence: Representation Learning, Information Fusion, and Applications", [page 478 Introduction], [pages 481-483 Fusion]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 19 additionally recites:
[a node] in the neural graphical model; [remaining nodes] in the neural graphical model – Merely invoking neural architecture and its corresponding elements as a tool to perform an abstract idea does not provide an inventive concept or significantly more to the recited abstract idea.
Claim 20 additionally recites:
wherein the neural view includes: an input layer with features of the domain; an encoder; a neural network with multiple layers; weights; the output layer with an output embedding; and a decoder – Using an encoder-decoder model (i.e., a model that comprises the elements of the claimed neural architecture – input layer, encoder, neural network, weights, output layer, decoder) to perform probabilistic inference (e.g., a variational autoencoder (VAE)) is well-understood, routine, and conventional activity (see Johnson et al., "Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", [page 3 Variational autoencoders]) and therefore does not provide an inventive concept or significantly more to the recited abstract idea.
As such, claims 2-11, 13-16, and 18-20 are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Johnson et al. ("Structured VAEs: Composing Probabilistic Graphical Models and Variational Autoencoders", available arXiv 20 March 2016), hereinafter Johnson, in view of Chang et al. ("Heterogeneous Network Embedding via Deep Architectures", published August 2015), hereinafter Chang.
Regarding claim 1, Johnson teaches A method ("Our approach uses graphical models for representing structured probability distributions, and uses ideas from variational autoencoders (Kingma & Welling, 2014) for learning not only the nonlinear feature manifold but also bottom-up recognition networks to improve inference. Thus our method enables the combination of flexible deep learning feature models with structured Bayesian and even Bayesian nonparametric priors. Our approach yields a single variational inference objective in which all components of the model are learned simultaneously. Furthermore, we develop a scalable fitting algorithm that combines several advances in efficient inference, including stochastic variational inference (Hoffman et al., 2013), graphical model message passing (Koller & Friedman, 2009), and backpropagation with the reparameterization trick (Kingma & Welling, 2014)…We refer to our general approach as the structured variational autoencoder (SVAE). In this paper we illustrate the SVAE using graphical models based on switching linear dynamical systems (SLDS) (Murphy, 2012; Fox et al., 2011)" [Johnson page 2 Introduction]) comprising:
receiving input data generated from a domain; ("In this section we apply the SVAE to both synthetic and real data and demonstrate its ability to learn both rich feature representations and simple latent dynamics. First, we apply a linear dynamical system (LDS) SVAE to synthetic data and illustrate some aspects of its training dynamics…. As a synthetic data example, consider a sequence of 1D images representing a dot bouncing from one side of the image to the other, as shown in the top panel of Figure 8a” [Johnson page 10 Experiments]; The SVAE can receive both synthetic (e.g., images (i.e., generated input data) of a moving dot (i.e., from a domain)) and real data).
identifying a dependency structure for the input data; (“While these images are simple, the task of modeling such sequences captures many salient aspects of the SVAE: to make predictions far into the future with coherent uncertainty estimates, we can use an LDS SVAE to find a low-dimensional latent state space representation, along with a nonlinear image model and a simple model of dynamics in the latent state space" [Johnson pages 10-11 Image of a moving dot]; After receiving input data, the SVAE can output a low-dimensional latent space representation, using a combination of probabilistic graphical model (i.e., LDS/SLDS) and variational autoencoder (i.e., VAE) techniques to learn both feature representation and model dynamics (i.e., dependency structure) in the latent space)
and generating a neural view of a neural graphical model for the domain using the dependency structure (“Furthermore, the functions µ(yn; φ) and Σ(yn; φ) act as a recognition or inference network for observation yn, outputting a distribution over the latent variable xn using a feed-forward neural network applied to yn" [Johnson page 3 Variational autoencoders]; "There are many alternative designs for such recognition networks; for example, the node potential on xt need not depend only on the observation yt. See Section 3.3" [Johnson page 6 Variational family and CRF recognition networks]; "The SVAE construction is general: it can admit many latent probabilistic models as well as many flexible observation models and recognition network designs. In this section we outline some of these possibilities. While the SVAE recognition network described in Section 3.2 only produces node potentials depending on single data points, in general SVAE recognition networks can output potentials on more than one node or take as input more than one data point. For example, a recognition network could output node potentials of the form φ(xt; yt, yt−1) that depend also on the previous data point, as sketched in Figure 6a, or even depend on many data points through a recurrent neural network (RNN). Recognition networks may also output factors on discrete latent variables, as sketched in Figure 6b" [Johnson page 6 Variations and extensions]; The recognition network in SVAE represents the probability distribution over the latent variables and can also model dependencies between variables (i.e., be a dependency structure). SVAE can also be illustrated via graphical models (i.e., views of a neural model) based on an SLDS framework (e.g., see Figure 6(a) and 6(b) [Johnson page 6]). 
In light of the instant specification, a “neural view” can be reasonably interpreted as merely a representation of the model and its associated functions, thereby functionally equivalent to the model itself), wherein the neural graphical model is a probabilistic graphical model (“To be concrete we focus on a particular generative model for time series based on a switching linear dynamical system (SLDS) (Murphy, 2012; Fox et al., 2011), which illustrates how the SVAE can incorporate both discrete and continuous latent variables with rich probabilistic dependence” [Johnson page 4 Generative model and variational family]; The SVAE model is based on an SLDS framework, wherein SLDS is a type of probabilistic graphical model) that handles complex distributions over the domain and captures a distribution defined by the input data via a latent representation that the neural graphical model uses in learning parameters of the distribution (“A key insight of the variational autoencoder is to use a conditional variational density q(xn | yn), where the parameters of the variational distribution on xn depend on the corresponding data point yn….Thus the variational distribution q(xn | yn) acts like a stochastic encoder from an observation to a distribution over latent variables” [Johnson page 3 Variational autoencoders]; "The SVAE construction is general: it can admit many latent probabilistic models as well as many flexible observation models and recognition network designs. In this section we outline some of these possibilities. While the SVAE recognition network described in Section 3.2 only produces node potentials depending on single data points, in general SVAE recognition networks can output potentials on more than one node or take as input more than one data point.
For example, a recognition network could output node potentials of the form φ(xt; yt, yt−1) that depend also on the previous data point, as sketched in Figure 6a, or even depend on many data points through a recurrent neural network (RNN). Recognition networks may also output factors on discrete latent variables, as sketched in Figure 6b" [Johnson page 6 Variations and extensions]; SVAE (and its associated functions) can model, via latent variables, dependencies between variables, including data points that “even depend on many [other] data points” (i.e., complex distributions)),
using the latent representation of the input data by a learning algorithm in training the neural view of the neural graphical model to learn functional dependencies jointly of the dependency structure by sharing parameters across tasks and jointly optimizing regression and structure loss in a multi-task learning framework (“Our approach yields a single variational inference objective in which all components of the model are learned simultaneously. Furthermore, we develop a scalable fitting algorithm that combines several advances in efficient inference, including stochastic variational inference (Hoffman et al., 2013), graphical model message passing (Koller & Friedman, 2009), and backpropagation with the reparameterization trick (Kingma & Welling, 2014). Thus our algorithm can leverage conjugate exponential family structure where it exists to efficiently compute natural gradients with respect to some variational parameters, enabling effective second-order optimization (Martens, 2015), while using backpropagation to compute gradients with respect to all other parameters” [Johnson page 2 Introduction]; “Using [symbol] to denote the variational parameters for the factor [symbol], the variational parameters are then the encoder parameters [symbol] and the decoder parameters [symbol], and the objective is [equation]” [Johnson page 3 Variational autoencoders]; “However, to enable flexible modeling of images and other complex features, we allow the dependence to be a more general nonlinear model. In particular, we consider [expression] to be MLPs with parameters [symbol] as in Eqs. (10)-(13) of Section 2.2, [equation]” [Johnson page 4 A switching linear dynamical system with nonlinear observations]; “The mean field objective in terms of the global variational parameters [expression] is then [equation]” [Johnson page 6 Variational family and CRF recognition networks]; The disclosed variational objective [equation 30] combines training on both reconstructive (i.e., regressive) and structural loss parameters (framework jointly optimizes for both data representation and graph structure))
However, Johnson does not explicitly teach a domain, wherein the input data includes a combination of different data types of the input data and a projection module transforming the input data into a compressed vector representation.
In the same field of endeavor, Chang teaches a method of modeling structural relationships between observed data in an embedding space (“In this paper, we design a deep embedding algorithm for networked data. A highly nonlinear multilayered embedding function is used to capture the complex interactions between the heterogeneous data in a network. Our goal is to create a multi-resolution deep embedding function, that reflects both the local and global network structures, and makes the resulting embedding useful for a variety of data mining tasks. In particular, we demonstrate that the rich content and linkage information in a heterogeneous network can be captured by such an approach, so that similarities among cross-modal data can be measured directly in a common embedding space” [Chang Abstract]) wherein the input data includes a combination of different data types of the input data ("...we present a novel idea on network representation learning, termed Heterogeneous Network Embedding (HNE), which jointly considers both the content as well as the relational information. HNE maps different heterogeneous objects into a unified latent space so that objects from different spaces can be directly compared...Along this line of research, we utilize the network’s inter-connections to design a proper loss function that enforces the similarities between different objects in the embedded feature space such that they are consistent with network links. The idea of a network-preserved embedding is illustrated in figure 2" [Chang page 2 Introduction]; see Figure 2: The flowchart of the proposed Heterogeneous Network Embedding (HNE) framework [Chang page 3]; The HNE can receive heterogeneous data (i.e., different data types) and map the data into a shared embedding space for further comparison/inference) and a projection module transforming the input data into a compressed vector representation (“Furthermore, each node is summarized by unique content information. 
In particular, images are given as a squared tensor format as Xi ∈ R^(dI×dI×3) for every vi ∈ VI, while texts are represented by a dT-dimensional feature vector as zj ∈ R^(dT) for all vj ∈ VT” [Chang page 3 Heterogeneous networks]; “Assume that the raw content Xi associated with an image node can be transformed to a d′I-dimensional vector representation as xi. The conversion of the raw input data into this d′I-dimensional vector representation will be described in the following subsection. A naive approach to do so is by stacking each column of an image as a vector or through feature machines [7, 23]. It is worth pointing out that, the values of d′I and dT need not be the same, because images and text are defined in terms of completely different sets of features” [Chang page 4 Latent Embedding in Networks]; “Each input image X ∈ R^(dI×dI×3) is represented as a 4096-dimensional vector through a series of nonlinear operations in both the training and testing phases” [Chang page 5 The Deep Architecture]; Both text and image input data can be transformed (i.e., embedded) into a low-dimensional vector representation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the input data includ[ing] a combination of different data types of the input data and a projection module transforming the input data into a compressed vector representation as taught by Chang into Johnson because they are both directed towards modeling structural relationships between observed data in an embedding space. Given that multimodal data fusion (i.e., combining different data types) is a well-known concept in the art, a person of ordinary skill would recognize the value of incorporating the embedding techniques of Chang to expand applicability of the SVAE model to modeling dependencies across cross-modal data (e.g., images and text), such that downstream inference tasks could then be performed for multi-modal sources (e.g., domains such as social media) ("Unfortunately, many networked data sources (e.g. Facebook, YouTube, Flickr and Twitter) cannot be naturally represented as vectorized inputs…A significant amount of research has been accomplished on various topics, such as collective classification [37], community detection [44], link prediction [47], social recommendation [35], targeted advertising [22], and so on. The development of a unified representation for networked data in embedded vector form is of great importance to encode both content and links. The basic assumption is that, once the vectorized representation is obtained, the network mining tasks can be readily solved by off-the-shelf machine learning algorithms. Nevertheless, feature learning of networked data is not a trivial task because the data possesses many unique characteristics, such as its size, dynamic nature, noise and heterogeneity…To address the aforementioned challenges, we present a novel idea on network representation learning, termed Heterogeneous Network Embedding (HNE)…” [Chang pages 1-2 Introduction]).
Regarding claim 12, it is a method claim which largely corresponds to the method of claim 1, which is already taught by the combination of Johnson and Chang as detailed above. Johnson further teaches A method ([Johnson page 2 Introduction], as detailed above in claim 1; “Thus the variational distribution q(xn | yn) acts like a stochastic encoder from an observation to a distribution over latent variables, while the forward model p(yn | xn) acts as a stochastic decoder from a latent variable value to a distribution over observations. Furthermore, the functions µ(yn; φ) and Σ(yn; φ) act as a recognition or inference network for observation yn, outputting a distribution over the latent variable xn using a feed-forward neural network applied to yn. See Figure 3 for graphical models of the VAE” [Johnson page 3 Variational autoencoders]) comprising:
receiving a conditional query for a domain;
accessing a neural view of a neural graphical model trained on input data;
using the neural graphical model to perform an inference task to provide an answer to the conditional query; and
outputting a set of values for the neural graphical model based on the inference task for the answer (“Here we describe a structured mean field family with which we can perform variational inference in the posterior distribution of the generative model from Section 3.1. This mean field family illustrates how an SVAE can leverage not only graphical model and exponential family structure but also learn bottom-up inference networks. As we show in Section 4, these structures allow us to compose several efficient inference algorithms including SVI, message passing, backpropagation, and the reparameterization trick” [Johnson page 5 Variational family and CRF recognition networks]; “Here we show how to compute stochastic gradients of the SVAE mean field objective using the results of a model inference subroutine. The algorithm is summarized in Algorithm 1…we compute gradients with respect to the recognition network parameters φ. Both the first and second terms of (31) depend on φ, the first term through the sample x̂(n)(φ) and the second term through the KL divergence KL(φ) ≜ KL(q(θ, z(n), x(n)) ‖ p(θ, z(n), x(n))). Thus we must differentiate through the procedures that the model inference subroutine uses to compute these quantities. As described in Section 4.2, performing this differentiation efficiently for the SLDS corresponds to backpropagation through message passing” [Johnson pages 7-8 SVAE algorithm]; see Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8]; In light of the instant specification (“One example query is a conditional query.
The inference task 40 is given a Xi (a value of a node, one of the features 34) of the neural graphical model 16 and predicts the most likely values of the other nodes (features) in the neural graphical model 16” [¶ 0062]), the gradient computation algorithm in SVAE (Algorithm 1) can be interpreted as performing an inference task to provide an answer to a conditional query; the algorithm receives the “first term through the sample x̂(n)(φ)” as a query, and accesses the recognition network (trained on observations yn to output latent distribution xn – see [Johnson page 3 Variational autoencoders] detailed above) to output a set of gradients (i.e., values) with respect to network parameters in response to the query).
Regarding claim 17, it is a method claim which largely corresponds to the method of claim 1, which is already taught by the combination of Johnson and Chang as detailed above. Johnson further teaches A method ([Johnson page 2 Introduction] and [Johnson page 3 Variational autoencoders], as detailed above in claim 12), comprising:
accessing a neural view of a neural graphical model trained on input data for a domain;
using the neural graphical model to perform a sampling task; and
outputting a set of data samples generated by the neural graphical model based on the sampling task (“Here we describe the model inference subroutine used by the SVAE algorithm….Specifically, using the notation of Section 2.2, in the special case of the VAE the inference subroutine computes KL(q(x | y) ‖ p(x)) in closed form and generates samples x ∼ q(x | y) used to approximate E_q(x) ln p(y | x)… Specifically, the inference subroutine must first optimize the local mean field factors q(z1:T) and q(x1:T), then compute and return a sample x̂1:T ∼ q(x1:T)… The algorithm is summarized in Algorithm 2” [Johnson page 8 Model inference subroutine]; see Algorithm 2 Model inference subroutine for the SLDS [Johnson page 8]; The recognition network (trained on observations yn to output latent distribution xn – see [Johnson page 3 Variational autoencoders] detailed above) is accessed as input and used in the model inference subroutine (see “Input: Variational dynamics parameter η̃θ of q(θ), node potentials {ψ(xt; yt)}, t = 1, …, T, from recognition network” in lines 1-2 of Algorithm 2) to output samples (see “return sample x̂” in line 13 of Algorithm 2)).
Regarding claim 2, the combination of Chang and Johnson teaches the limitations of parent claim 1, and Chang further teaches wherein the input data includes real number values, categorical feature values, text input, medical entities, tabular data, time series data, images, captions, objects, videos, audio data, words, phrases, sentences, documents, webpages, or e-mail messages ("An example of a heterogeneous network is illustrated in the left-hand side of Figure 2, which contains two object and three link types. For further ease in understanding, we will assume object types of image (I) and text (T)" [Chang page 3 Heterogeneous Networks]; The data modeled in the HNE framework can be of multiple different data types (e.g., images and text)).
Regarding claims 13 and 18, they are dependent claims that recite substantially similar limitations to those recited in claim 2. The combination of Johnson and Chang also teaches the limitations of parent claims 12 and 17, as detailed above. Consequently, claims 13 and 18 are rejected for the same reasons as claim 2.
Regarding claim 3, the combination of Chang and Johnson teaches the limitations of parent claim 1, and Johnson further teaches wherein functions represent the complex distributions over the domain, ("The SVAE construction is general: it can admit many latent probabilistic models as well as many flexible observation models and recognition network designs. In this section we outline some of these possibilities. While the SVAE recognition network described in Section 3.2 only produces node potentials depending on single data points, in general SVAE recognition networks can output potentials on more than one node or take as input more than one data point. For example, a recognition network could output node potentials of the form φ(xt; yt, yt−1) that depend also on the previous data point, as sketched in Figure 6a, or even depend on many data points through a recurrent neural network (RNN). Recognition networks may also output factors on discrete latent variables, as sketched in Figure 6b" [Johnson page 6 Variations and extensions]; SVAE (and its associated functions) can model dependencies between variables, including data points that “even depend on many [other] data points” (i.e., complex distributions)) and the neural view of the neural graphical model further comprises:
an input layer with features of the domain;
an encoder that transforms the input data to an embedding;
a neural network with multiple layers;
weights, wherein the weights are applied to each connection between the input layer, hidden layers of the neural network, and an output layer;
bias terms and activation functions;
the output layer with the embedding; (“That is, using the fact that the optimal factor q(x1:T) is Markov according to a chain graph, we write it in terms of pairwise potentials and node potentials as [equation 27] where the node potential ψ(xt; yt, φ) is a function of the observation yt. Specifically, we choose each node potential to be a Gaussian factor in which the precision matrix J(yt; φ) and potential vector h(yt; φ) depend on the corresponding observation yt through an MLP, [equation 28] using the notation from Section 2.2” [Johnson pages 5-6 Variational family and CRF recognition networks]; “2.2 Variational autoencoders…Given a high-dimensional dataset y = {yn} (n = 1, …, N) such as a collection of images, the VAE models each observation yn in terms of a low-dimensional latent variable xn and a nonlinear observation model with parameters ϑ, [equations 8-9] where µ(xn; ϑ) and Σ(xn; ϑ) might depend on xn through a multilayer perceptron (MLP) with L layers [equations 10-13]…Thus the variational distribution q(xn | yn) acts like a stochastic encoder from an observation to a distribution over latent variables,” [Johnson page 3 Variational autoencoders]; The SVAE model can take a set of observations and use, as an encoder, a multi-layer perceptron (MLP) (i.e., a type of neural network) comprising L layers (i.e., implicitly comprising input layer, hidden layer(s), and output layer) and its associated functions (including activation functions taking weights/biases of MLP neurons as input – see equation 10 function hℓ(xn) = f(Wℓhℓ−1(xn) + bℓ)) to generate a latent variable representation (i.e., embedding))
and a decoder that transforms the embedding at the output layer to an input data space. (“Thus the variational distribution q(xn | yn) acts like a stochastic encoder from an observation to a distribution over latent variables, while the forward model p(yn | xn) acts as a stochastic decoder from a latent variable value to a distribution over observations” [Johnson page 3 Variational autoencoders]; “First, Section 3.1 describes the generative model, which illustrates the combination of a graphical model expressing latent structure with a flexible neural net to generate observations…In particular, we consider µ(xt; ϑ) and Σ(xt; ϑ) to be MLPs with parameters ϑ as in Eqs. (10)-(13) of Section 2.2, [equation 23]. By allowing a nonlinear emission model, each low-dimensional state xt can be mapped into a high-dimensional observation yt, and hence the model can represent high-dimensional data in terms of structured dynamics on a low-dimensional manifold”; The generative model (i.e., decoder) can take in latent variables (i.e., embeddings) and map them to observation (i.e., input data) space).
Regarding claims 16 and 20, they are dependent claims that recite substantially similar limitations to those recited in claim 3. The combination of Johnson and Chang also teaches the limitations of parent claims 12 and 17, as detailed above. Consequently, claims 16 and 20 are rejected for the same reasons as claim 3.
Regarding claim 4, the combination of Johnson and Chang teaches the limitations of parent claim 3. While Johnson does not explicitly teach wherein the embedding is a vector representation of the input data (The examiner notes that Johnson could be reasonably interpreted as implicitly teaching the embedding being a vector representation, given that vector elements of node potentials (i.e., embeddings) depend on a corresponding observation yt (i.e., input data) (“Specifically, we choose each node potential to be a Gaussian factor in which the precision matrix J(yt; φ) and potential vector h(yt; φ) depend on the corresponding observation yt through an MLP” [Johnson page 6 Variational family and CRF recognition networks]). For the sake of completeness, Chang is incorporated to explicitly teach a vector representation), Chang further teaches wherein the embedding is a vector representation of the input data (“Furthermore, each node is summarized by unique content information. In particular, images are given as a squared tensor format as Xi P RdIˆdIˆ3 for every vi P VI , while texts are represented by a dT -dimensional feature vector as zj P RdT for all vj P VT” [Chang page 3 Heterogeneous networks]; “Assume that the raw content Xi associated with an image node can be transformed to a d1 I -dimensional vector representation as xi. The conversion of the raw input data into this d1 I -dimensional vector representation will be described in the following subsection. A naive approach to do so is by stacking each column of an image as a vector or through feature machines [7, 23]. 
It is worth pointing out that, the values of d′I and dT need not be the same, because images and text are defined in terms of completely different sets of features” [Chang page 4 Latent Embedding in Networks]; “Each input image X ∈ R^(dI×dI×3) is represented as a 4096-dimensional vector through a series of nonlinear operations in both the training and testing phases” [Chang page 5 The Deep Architecture]; Both text and image input data can be transformed (i.e., embedded) into a vector representation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein the embedding is a vector representation of the input data as taught by Chang into Johnson because they are both directed towards modeling structural relationships between observed data in an embedding space. Given that multimodal data fusion (i.e., combining different data types) is a well-known concept in the art, and that Chang teaches forming unified representations of multimodal data via vector embedding (“The development of a unified representation for networked data in embedded vector form is of great importance to encode both content and links. The basic assumption is that, once the vectorized representation is obtained, the network mining tasks can be readily solved by off-the-shelf machine learning algorithms” [Chang page 1 Introduction]), a person of ordinary skill would recognize the value of incorporating the vector embedding techniques of Chang to expand applicability of the SVAE model to modeling dependencies across cross-modal data (e.g., images and text), such that downstream inference tasks could then be performed for multi-modal sources (e.g., domains such as social media) ([Chang pages 1-2 Introduction], as detailed above in claim 1).
Regarding claim 5, the combination of Johnson and Chang teaches the limitations of parent claim 3. While Johnson does not explicitly teach wherein the embedding encodes different properties of the input data as a vector of numbers, (see examiner note for claim 4 above [page 49]), Chang further teaches wherein the embedding encodes different properties of the input data as a vector of numbers ([Chang page 3 Heterogeneous networks] and [Chang page 4 Latent Embedding in Networks] and [Chang page 5 The Deep Architecture], as detailed above in claim 4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein the embedding encodes different properties of the input data as a vector of numbers as taught by Chang into Johnson because they are both directed towards modeling structural relationships between observed data in an embedding space. Given that multimodal data fusion (i.e., combining different data types) is a well-known concept in the art, and that Chang teaches forming unified representations of multimodal data via vector embedding ([Chang page 1 Introduction], as detailed above in claim 4), a person of ordinary skill would recognize the value of incorporating the vector embedding techniques of Chang to expand applicability of the SVAE model to modeling dependencies across cross-modal data (e.g., images and text), such that downstream inference tasks could then be performed for multi-modal sources (e.g., domains such as social media) ([Chang pages 1-2 Introduction], as detailed above in claim 1).
Regarding claim 6, the combination of Johnson and Chang teaches the limitations of parent claim 3, and Johnson further teaches wherein a number of nodes in the input layer is based on output units of the encoder (“φ denotes a set of MLP parameters. Thus the variational distribution q(xn | yn) acts like a stochastic encoder from an observation to a distribution over latent variables, while the forward model p(yn | xn) acts as a stochastic decoder from a latent variable value to a distribution over observations. Furthermore, the functions µ(yn; φ) and Σ(yn; φ) act as a recognition or inference network for observation yn, outputting a distribution over the latent variable xn using a feed-forward neural network applied to yn” [Johnson page 3 Variational autoencoders]; “…the node potential ψ(xt; yt, φ) is a function of the observation yt. Specifically, we choose each node potential to be a Gaussian factor in which the precision matrix J(yt; φ) and potential vector h(yt; φ) depend on the corresponding observation yt through an MLP, [equation 28], using the notation from Section 2.2. These local recognition networks allow us to fit a regression from each observation yt to a probabilistic guess at the corresponding latent state xt” [Johnson page 6 Variational family and CRF recognition networks]; In light of the instant specification, the “number of nodes in the input layer [being] based on output units of the encoder” can be interpreted as the number of observations that are input into the encoder (i.e., via nodes of the input layer of the encoder) corresponding with the embeddings outputted by the encoder (i.e., via nodes/units of the output layer of the encoder). In the MLP encoder, each node potential corresponds to an observation, wherein each observation corresponds to a latent variable).
Regarding claim 7, the combination of Johnson and Chang teaches the limitations of parent claim 6, and Johnson further teaches wherein features of an input data type ha[ve] a corresponding number of nodes in the input layer (“Given a high-dimensional dataset y = {y_n}_{n=1}^N such as a collection of images, the VAE models each observation yn in terms of a low-dimensional latent variable xn and a nonlinear observation model with parameters ϑ, [equations 8-9] where µ(xn; ϑ) and Σ(xn; ϑ) might depend on xn through a multilayer perceptron (MLP) with L layers [equations 10-13]” [Johnson page 3 Variational autoencoders]; The observations yn (i.e., input data) are input to a multilayer perceptron (MLP) with L layers (i.e., implicitly comprising an input layer corresponding to n number of features/dimensions of observations)). Chang further teaches a first input data type and a second input data type hav[ing] a second number of features different from the first input data type (“It is worth pointing out that, the values of d′I and dT need not be the same, because images and text are defined in terms of completely different sets of features” [Chang page 4 Latent Embedding In Networks]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated a first input data type and a second input data type hav[ing] a second number of features different from the first input data type as taught by Chang into Johnson because they are both directed towards modeling structural relationships between observed data in an embedding space. Given that multimodal data fusion (i.e., combining different data types) is a well-known concept in the art, and that the MLP encoder of SVAE implicitly comprises an input layer corresponding to the number of features of the input data type based on the inherent characteristics of neural networks, a person of ordinary skill would recognize the value of incorporating the vector embedding techniques of Chang to expand applicability of the SVAE model to modeling dependencies across cross-modal data (e.g., images and text), such that downstream inference tasks could then be performed for multi-modal sources (e.g., domains such as social media) ([Chang pages 1-2 Introduction], as detailed above in claim 1).
Regarding claim 8, the combination of Johnson and Chang teaches the limitations of parent claim 6, and Johnson further teaches updating the dependency structure of the neural view for the number of nodes and corresponding connections between the features of the domain (see Figure 8(a) and 8(b) on [Johnson page 12]; "(a) Predictions after 200 training epochs…Predictions after 1100 training epochs" [Johnson page 12]; The SVAE model and associated recognition network (i.e., dependency structure) implicitly update across iterations/epochs).
Regarding claim 9, the combination of Johnson and Chang teaches the limitations of parent claim 1, and Johnson further teaches wherein the dependency structure identifies features in the input data that are directly correlated to one another and identifies the features in the input data that are conditionally independent from one another given other features (“The SVAE construction is general: it can admit many latent probabilistic models as well as many flexible observation models and recognition network designs. In this section we outline some of these possibilities. While the SVAE recognition network described in Section 3.2 only produces node potentials depending on single data points, in general SVAE recognition networks can output potentials on more than one node or take as input more than one data point. For example, a recognition network could output node potentials of the form φ(xt; yt, yt−1) that depend also on the previous data point, as sketched in Figure 6a, or even depend on many data points through a recurrent neural network (RNN). Recognition networks may also output factors on discrete latent variables, as sketched in Figure 6b” [Johnson page 6 Variations and extensions]; see Figure 6 and Figure 7 on [Johnson page 6]; In light of the instant specification, “features of the input data” can be interpreted as referring to “features of the domain”, i.e., observations. Given that the inherent nature of a probabilistic graphical model is to model a probability distribution, the recognition network of SVAE models direct correlations and conditional independence of nodes and their associated observations (e.g., presence and/or absence of an edge between nodes)).
Regarding claim 10, the combination of Johnson and Chang teaches the limitations of parent claim 1. Johnson further teaches training the neural view of the neural graphical model using the input data (“We trained the LDS SVAE on 80 random image sequences each of length 50, using one sequence per update, and show the model’s future predictions given a prefix of a longer sequence. We used MLP image and recognition models each with one hidden layer of 50 units” [Johnson page 11 Image of a moving dot]), and Chang further teaches the combination of different data types of the input data ([Chang page 2 Introduction] and Figure 2: The flowchart of the proposed Heterogeneous Network Embedding (HNE) framework [Chang page 3], as detailed above in claim 1).
Regarding claim 11, the combination of Johnson and Chang teaches the limitations of parent claim 10, and Johnson further teaches wherein functions for features of the domain are learned during the training of the neural view using a loss function comprising regression loss from fit to the input data and structure loss computed as a distance from the dependency structure (“These local recognition networks allow us to fit a regression from each observation yt to a probabilistic guess at the corresponding latent state xt….There are many alternative designs for such recognition networks; for example, the node potential on xt need not depend only on the observation yt. See Section 3.3.” [Johnson page 6 Variational family and CRF recognition networks]; In addition to fitting regression from observation to latent state, the recognition network also can be trained (i.e., use a loss function) to model dependencies between observations (i.e., approximate dependency structure of the input data)).
Regarding claim 14, the combination of Johnson and Chang teaches the limitations of parent claim 12, and Johnson further teaches wherein the inference task predicts unknown values based on the neural graphical model and the set of output values is a set of fixed values or a set of distributions over values (“The algorithm is summarized in Algorithm 1…we compute gradients with respect to the recognition network parameters φ” [Johnson page 7 SVAE algorithm]; see Algorithm 1 Computing gradients of the SVAE objective [Johnson page 8]; The algorithm outputs gradients of recognition network parameters, which are fixed inputs – see “recognition network parameters φ” as Input in Algorithm 1).
Regarding claim 15, the combination of Johnson and Chang teaches the limitations of parent claim 14, and Johnson further teaches wherein the inference task uses message passing to determine the unknown values in the set of values for the neural graphical model or a gradient-based approach to determine the unknown values in the set of values for the neural graphical model (“As [Algorithm 1] described in Section 4.2, performing this differentiation efficiently for the SLDS corresponds to backpropagation through message passing” [Johnson pages 7-8 SVAE algorithm]).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Johnson and Chang, as applied to claim 17 above, further in view of Rezende et al. (“Stochastic Backpropagation and Approximate Inference in Deep Generative Models”, available on arXiv 30 May 2014), hereinafter Rezende.
Regarding claim 19, the combination of Johnson and Chang teaches the limitations of parent claim 17, and Johnson further teaches wherein the sampling task further comprises:
randomly selecting a node in the neural graphical model as a starting node;
placing remaining nodes in the neural graphical model in an order relative to the starting node; and
creating a value for each node of the remaining nodes in the neural graphical model based on values from neighboring nodes to each node of the remaining nodes (“Message passing can also be used to draw samples or compute the log partition function efficiently, as we describe in Section 4.2.2” [Johnson page 9 Optimizing Local Mean Field Factors]; “After optimizing the local variational factors, the model inference subroutine uses the optimized factors to draw samples, compute expected sufficient statistics, and compute a KL divergence. The results of these inference computations, which we detail here, are then used to compute gradients of the SVAE objective as described in Section 4.1. To draw S samples {x̂^(s)}_{s=1}^S where x̂^(s)_{1:T} ∼ q(x_{1:T}) i.i.d., we can perform message passing in q(x_{1:T}), which is a Gaussian distribution that is Markov with respect to a chain graph. Given two neighboring nodes i and j we define the message from i to j as [equation 38] where k is the other neighbor of node i. We can pass messages backward once, running a Kalman filter, and then sample forward S times. That is, after computing the messages m_{t+1→t}(x_t) for t = T − 1, . . . , 1, we compute the marginal distribution on x_1 as [equation 39] and sample x̂_1 ∼ q(x_1). Iterating, given a sample of x̂_{t−1}, we sample x̂_t from the conditional distributions [equation 40], thus constructing a joint sample x̂_{1:T} ∼ q(x_{1:T})” [Johnson page 9 Samples, Expected Statistics, and KL]; see Figure 13(b) Generated random samples from a VAE fit to mouse depth video on [Johnson page 15]; Samples can be drawn via a message passing procedure of randomly selecting a starting node i and determining values for remaining nodes in joint sample x̂_{1:T} via iterating through neighboring nodes in an order relative to starting node i).
However, the combination does not explicitly teach creating a value for each node by adding random noise to the value created for the node based on a distribution conditioned on the values from the neighboring nodes.
In the same field of endeavor, Rezende teaches a method of modeling structural relationships between observed data in an embedding space (“These efforts, combined with the demand for accurate probabilistic inferences and fast simulation, lead us to seek generative models that are i) deep, since hierarchical architectures allow us to capture complex structure in the data, ii) allow for fast sampling of fantasy data from the inferred model, and iii) are computationally tractable and scalable to high-dimensional data” [Rezende page 1 Introduction]) that creat[es] a value for each node by adding random noise to the value created for the node based on a distribution conditioned on the values from the neighboring nodes (“Deep latent Gaussian models (DLGMs) are a general class of deep directed graphical models that consist of Gaussian latent variables at each layer of a processing hierarchy. The model consists of L layers of latent variables. To generate a sample from the model, we begin at the top-most layer (L) by drawing from a Gaussian distribution. The activation hl at any lower layer is formed by a non-linear transformation of the layer above hl+1, perturbed by Gaussian noise. This generative process is described as follows: [see equations 1-4]" [Rezende page 2 Deep Latent Gaussian Models]; "We regularize the recognition model by introducing additional noise, specifically, bit-flip or drop-out noise at the input layer and small additional Gaussian noise to samples from the recognition model" [Rezende page 4 Free Energy Objective]; Gaussian noise (i.e., noise drawn from a Gaussian distribution of latent variables) is added to output from the recognition model when generating samples).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated creating a value for each node by adding random noise to the value created for the node based on a distribution conditioned on the values from the neighboring nodes as taught by Rezende into Johnson because they are both directed towards modeling structural relationships between observed data in an embedding space. Given that regularization of deep learning models to prevent overfitting is a well-known concept in the art, incorporating the regularization techniques taught by Rezende would improve capabilities of the SVAE model to provide an accurate approximation of the posterior distribution, and thereby perform accurate inference for unseen data (“To allow for the best possible inference, the specification of the recognition model must be flexible enough to provide an accurate approximation of the posterior distribution…We regularize the recognition model… We found that such regularisation is essential and without it the recognition model is unable to provide accurate inferences for unseen data points” [Rezende page 4 Free Energy Objective]).
Response to Arguments
The remarks filed 09/30/2025 have been fully considered.
Applicant’s remarks traversing the subject matter eligibility rejections under 35 U.S.C. 101 set forth in the office action mailed 07/02/2025, in view of claims 1-20 as amended, have been considered but are not persuasive.
Applicant alleges that the amended independent claims recite more than high-level, generic invocations, and overall recite elements that, both individually and as a whole, are directed to a practical application. Applicant cites to the specification [¶ 0056-0057] to explain how the recited computational models result in improvements to model runtime.
The examiner respectfully disagrees. As further explained in the rejection set forth above, the newly added limitations still recite high-level features that are common to conventional neural architectures. Further, "claiming the improved speed or efficiency inherent with applying the abstract idea on a computer" does not integrate a judicial exception into a practical application or provide an inventive concept (see MPEP § 2106.05(f)). The new limitations still do not provide adequate detail to clearly set forth an improvement to conventional technology or a technical field (such as an improvement to the underlying model itself).
Applicant has not presented further arguments with respect to the dependent claims. As such, amended claims 1-20 stand rejected under 35 U.S.C. 101.
Applicant’s remarks [Remarks pages 11-13] traversing the obviousness rejections under 35 U.S.C. 103 set forth in the office action mailed 07/02/2025, in view of claims 1-20 as amended, have been considered but are not persuasive. Applicant is directed towards the 103 rejection set forth above, which explains how the combination of Johnson and Chang is interpreted to still teach or suggest the limitations at issue. Applicant has not presented further arguments with respect to the dependent claims. As such, amended claims 1-20 stand rejected under 35 U.S.C. 103.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Wang et al. (“Multi-Task Learning Based Network Embedding”, published 01/14/2020) discloses a multi-task learning framework wherein network structure information is encoded into a continuous low-dimensional embedding space such that geometric relationships among the vectors can reflect the relationships of nodes in the original network.
Zhang et al. (“A Survey on Multi-Task Learning”, published 31 Mar 2021) discloses a survey of multi-task learning paradigms being combined with other ML concepts, including graphical models.
Mahdavi et al. (“Dynamic Joint Variational Graph Autoencoders”, published 2020) discloses a DynVGAE framework that can learn both local structures and temporal evolutionary patterns in a dynamic network.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY M BALAKRISHNAN whose telephone number is (571) 272-0455. The examiner can normally be reached 10am-5pm EST Mon-Thurs.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER WELCH can be reached on (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/V.M.B./
Examiner, Art Unit 2143
/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143