Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
2. This Office Action is sent in response to Applicant’s Communication received on 11/5/2025 for application number 17/829,142.
Response to Amendments
3. The Amendment filed 11/5/2025 has been entered. Claims 1, 6, 7, 12 and 16-17 have been amended. Claims 1-20 remain pending in the application.
Response to Arguments
Step 2A, Prong 1:
Applicant argues that the claim does not recite any of the judicial exceptions enumerated in the 2019 PEG. For instance, the claim does not recite any mathematical relationships, formulas, or calculations. While some of the limitations may be based on mathematical concepts, the mathematical concepts are not recited in the claims. Further, the claim does not recite a mental process because the steps are not practically performed in the human mind. Finally, the claim does not recite any method of organizing human activity such as a fundamental economic concept or managing interactions between people. Thus, the claim is eligible because it does not recite a judicial exception.
The Examiner respectfully disagrees. The claim recites a “Probabilistic Graphical Model … comprises parameters defining dependencies between variables; and performing an embedding training … to obtain an enhanced Probabilistic Graphical Model comprising parameters defining dependencies …”. These limitations set forth mathematical/statistical modeling and parameter learning/optimization, which fall within the mathematical concepts grouping of abstract ideas, even though the particular formulas are not spelled out. Accordingly, claim 1 recites a judicial exception (mathematical concepts) under Step 2A, Prong One. Thus, for the reasons stated above, claim 1 is not closely analogous to Example 39 with respect to Step 2A, Prong One.
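For illustration of why such limitations fall within the mathematical concepts grouping, the following minimal Python sketch (all variable names and probability values are hypothetical and are not drawn from the claims, the specification, or the cited art) shows how “parameters defining dependencies between variables” reduce to conditional probability values that are manipulated mathematically:

    # Hypothetical two-variable PGM: P(class) and P(var1 | class) as
    # conditional probability tables. The table entries ARE the
    # "parameters defining dependencies between variables".
    p_class = {"c0": 0.6, "c1": 0.4}
    p_var1_given_class = {
        "c0": {"v0": 0.9, "v1": 0.1},  # dependency parameters when class = c0
        "c1": {"v0": 0.2, "v1": 0.8},  # dependency parameters when class = c1
    }

    def joint(class_val, var1_val):
        # The joint probability factorizes along the dependency structure:
        # P(class, var1) = P(class) * P(var1 | class)
        return p_class[class_val] * p_var1_given_class[class_val][var1_val]

    print(joint("c0", "v0"))  # 0.54

Learning or adjusting such table entries is a purely mathematical operation on probability values.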
Claim Rejections:
Applicant argues that Zhang relates solely to Neural Networks and makes no mention of Probabilistic Graphical Models.
Examiner respectfully disagrees and notes that Zhang teaches a machine learning model architecture (including variational autoencoders and Bayesian learning techniques) in which weights and latent variables are modeled as probabilistic distributions (para. [0049]). Thus, Zhang teaches a probabilistic model with dependency parameters, which falls within the claim’s scope under the broadest reasonable interpretation (BRI).
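As a hedged illustration of Zhang’s paragraph [0049] (the sketch below is the Examiner’s schematic, not Zhang’s implementation; names and values are hypothetical), a Bayesian neural network weight modeled as a Gaussian distribution is itself characterized by dependency parameters:

    # A single BNN weight modeled as a probabilistic distribution,
    # parameterized by its mean and standard deviation (per Zhang [0049]).
    import random

    mu, sigma = 0.0, 1.0                      # distribution parameters
    weight_sample = random.gauss(mu, sigma)   # the weight is a random sample
    # "Learning the weights" then amounts to tuning (mu, sigma) per weight.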
Applicant argues that Bayesian Neural Networks (BNN) are not Bayesian Networks (a member of the PGM family). BNNs are Neural Networks that are trained via Bayesian Learning processes. The BNNs do not support general inference over any variable in the model, like Bayesian Networks and other PGMs do.
Examiner respectfully disagrees and notes that the claim recites only a probabilistic model with variables and dependency parameters. The claim does not exclude neural-network-based probabilistic models. BNNs, as taught in Zhang, perform probabilistic modeling of weights and activations, and are trained using Bayesian methods that infer uncertainty and conditional dependence across variables. These features satisfy the functional requirements of the claimed model.
Applicant argues that Zhang adds new output/query variables while the present invention extends the model via input features.
Examiner respectfully disagrees and notes that claim 1 recites “extending said Probabilistic Graphical Model to comprise one or more Extension variables, each said Extension variable corresponding to the outputs of said machine learning model”. This language does not restrict the added variables to input features, nor does it preclude extending the model with variables corresponding to output predictions from a separate model. In fact, the claim language broadly encompasses any added variables that correspond to machine learning model outputs, which includes what Zhang describes by extending the decoder with a new head and generating new parameterized outputs from a context-aware model. Therefore, the distinction between input and output variable extension is not a limiting feature of the claim.
Applicant argues that Zhang does not disclose embedding training of an extended PGM. However, Zhang provides multiple teachings that align with this limitation. Zhang describes generating context vectors via set encoders and neural networks to encode the observed context in a latent embedding space. The model is augmented with new heads and new parameters for new features. Embedding representations are used to optimize model behavior via likelihood-based training (para. [0088-0097]).
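As a non-limiting sketch of the training flow described in Zhang’s paragraphs [0088-0097] (the shapes, the function names, and the mean-pooling stand-in for Zhang’s set encoder are the Examiner’s assumptions for illustration only, not Zhang’s actual code):

    import numpy as np

    rng = np.random.default_rng(0)

    def set_encoder(context_points):
        # Encode an observed context set into a latent embedding
        # (mean pooling stands in for Zhang's set encoder).
        return context_points.mean(axis=0)

    def hypernetwork(embedding, out_dim):
        # Map the context embedding to new head parameters (weights, bias).
        W = np.outer(np.tanh(embedding), np.ones(out_dim))
        b = np.zeros(out_dim)
        return W, b

    context = rng.normal(size=(8, 4))      # 8 observed points, 4 features
    embedding = set_encoder(context)       # latent embedding of the context
    W_new, b_new = hypernetwork(embedding, out_dim=1)  # new head parameters
    print(W_new.shape, b_new.shape)        # (4, 1) (1,)
    # Likelihood-based training would then adjust the hypernetwork so the
    # new head assigns high likelihood to held-out values of the new feature.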
Accordingly, the arguments are not persuasive, and the rejection is maintained.
Claim Interpretation - 35 USC § 112(f)
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim Rejections - 35 USC § 101
4. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to the abstract idea without significantly more.
Step 1, the claims are directed to the statutory categories of a process, machine, and manufacture.
Step 2A Prong 1, Claim 1 recites, in part, a Probabilistic Graphical Model comprising a set of variables comprising a first set of Observable variables (Var1, Var2, Var3, ..., VarN), and a class variable, whereby said probabilistic model comprises parameters defining dependencies between the variables of said set of variables, extending said Probabilistic Graphical Model to comprise one or more Extension variables (VarVarX1, VarX2 ..., VarXN), each said Extension variable corresponding to the outputs of said model, and performing an embedding training of said extended Probabilistic Graphical Model on the basis of an embedding training set of data, said embedding training set comprising first data (D1.1) of data from said specified context (C1) and an inferred model output (O1.2) inferred by said model from third data (D1.2) from context C1 corresponding to said second set of Observable variables (VarA, VarB ...VarZ), whereby third data (D1.2) is sampled from said context (C1) together with said first data (D1.1), to obtain an enhanced Probabilistic Graphical Model comprising parameters defining dependencies between said Observable variables, said class variable and each said Extension variable. These operations (model construction, statistical inference, and probabilistic parameter learning) are all abstract mathematical manipulations. That is, other than reciting “computer” in the context of the claims, the limitations are directed to the “Mathematical Concept” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
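For analysis purposes only, the recited steps can be restated as the following self-contained toy computation (every name, distribution, and value below is hypothetical and is not Applicant’s disclosed implementation), which makes plain that each step is a mathematical operation:

    import random

    random.seed(0)

    # (a) Obtain a PGM: parameters defining P(Class) and P(Var1 | Class).
    p_class = {"c0": 0.5, "c1": 0.5}
    p_var1 = {"c0": 0.8, "c1": 0.3}

    # A separately trained ML model over a second set of observables (VarA...).
    def ml_model(var_a):
        return 1 if var_a > 0.5 else 0     # inferred output O1.2

    # (b) Extend the PGM with an Extension variable VarX1 tied to the ML output.
    p_varx1 = {"c0": 0.5, "c1": 0.5}       # new dependency parameters to learn

    # (c) Embedding training: sample D1.1 and D1.2 together from context C1,
    # infer O1.2 = ml_model(D1.2), then fit P(VarX1 | Class) by counting
    # (maximum-likelihood estimation).
    counts = {"c0": [0, 0], "c1": [0, 0]}
    for _ in range(1000):
        cls = random.choice(["c0", "c1"])
        d12 = random.random() + (0.3 if cls == "c1" else 0.0)  # D1.2 sample
        o12 = ml_model(d12)                                    # inferred output
        counts[cls][o12] += 1
    for cls in p_varx1:
        n0, n1 = counts[cls]
        p_varx1[cls] = n1 / (n0 + n1)      # learned dependency parameter

    print(p_varx1)  # enhanced PGM parameters linking Class and VarX1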
Step 2A Prong 2, this judicial exception is not integrated into a practical application. In particular, the claims recite the additional element of “computer”. The computer components in the claim are recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP § 2106.04(a)(2), subsection III.C. The claims also recite the additional elements of a “machine learning model” and “obtaining/sampling training data”. The steps of obtaining/sampling training data amount to data gathering activity recited at a high level of generality in service of the abstract idea, which is insignificant extra-solution activity. Thus, the additional elements in the claims are merely used as tools to implement the abstract idea.
Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “computer”, “machine learning model”, and “obtaining/sampling training data” to perform the steps of the claims amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. See MPEP § 2106.05(b) and (g). The claim is not patent eligible.
Claims 2-20 provide further limitations to the abstract idea rejected above; however, they do not recite any additional elements that would amount to a practical application of, or significantly more than, the abstract idea.
Claims 19 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 19 recites a “computer program” comprising instructions. The claim fails to define any structure or hardware. Accordingly, the recited “computer program” of Claim 19 is software per se and is not a “process,” a “machine,” a “manufacture,” or a “composition of matter,” as defined in 35 U.S.C. 101.
Claim 20 recites “a computer-readable medium” comprising instructions. Paragraph [0111] of the specification describes that a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device, and that the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Accordingly, the recited “computer-readable medium” encompasses both physical media and transitory media. Under the broadest reasonable interpretation, the limitation “computer-readable medium” is therefore broader than the disclosed storage media and covers signals, waves, and other forms of transmission media that carry instructions. As such, the limitation “computer-readable medium” is not limited to physical articles or objects which constitute a manufacture within the meaning of 35 U.S.C. 101, and the claim is not limited to statutory subject matter and is therefore non-statutory.
Claim Rejections - 35 USC § 102
5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
6. Claims 1-4, 8, 10-11, 15, and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang et al. (U.S. Patent Application Pub. No. US 20220147818 A1).
Claim 1: Zhang teaches a method of building a computer implemented data classifier for classifying data from a specified context (C1) (i.e. FIG. 2 illustrates an example computing apparatus 200 for implementing an artificial intelligence (AI) algorithm including a machine-learning (ML) model; para. [0040, 0046, 0071, 0073]), said method comprising the steps of: obtaining a Probabilistic Graphical Model comprising a set of variables comprising a first set of Observable variables (Var1, Var2, Var3, ..., VarN), and a class variable (i.e. Each weight could simply be a scalar value. Alternatively, as shown in FIG. 1B, at some or all of the nodes 104 in the network 101, the respective weight may be modelled as a probabilistic distribution such as a Gaussian. In such cases the neural network 101 is sometimes referred to as a Bayesian neural network. Optionally, the value input/output on each of some or all of the edges 106 may each also be modelled as a respective probabilistic distribution. For any given weight or edge, the distribution may be modelled in terms of a set of samples of the distribution, or a set of parameters parameterizing the respective distribution, e.g. a pair of parameters specifying its center point and width (e.g. in terms of its mean μ and standard deviation σ or variance σ2). The value of the edge or weight may be a random sample from the distribution. The learning of the weights may comprise tuning one or more of the parameters of each distribution; para. [0005, 0049]), whereby said probabilistic model comprises parameters defining dependencies between the variables of said set of variables (i.e. The input feature vector X comprises a plurality of elements xd, each representing a different feature d=0, 1, 2, . . . etc. E.g. in the example of image recognition, each element of the feature vector X may represent a respective pixel value; para. [0052-0054], these paragraphs describe an architecture using nodes, weighted edges, and probabilistic connections, which meets the description of a PGM), obtaining a machine learning model that is trained on second training data (D2) (i.e. The network learns by operating on data input at the input layer, and adjusting the weights applied by some or all of the nodes based on the input data; para. [0006, 0010], describes training ML models on datasets), comprising a second set of Observable variables (VarA, VarB ...VarZ) (i.e. The auxiliary model 700 is trained such that the predicted new parameters generate accurate values of the new features. A discussion of said training is provided below. For now, suffice is to say that the primary model is trained first, the representation vectors are then extracted from the trained primary model and supplied to the auxiliary model 700, and then the auxiliary model 700 is trained to predict the new parameters. The new parameters are then supplied to the primary model. The existing parameters of the primary model remain unchanged; para. [0015, 0064]), extending said Probabilistic Graphical Model to comprise one or more Extension variables (VarVarX1, VarX2 ..., VarXN), each said Extension variable corresponding to outputs of said machine learning model (i.e. we apply a CHN to a partial variational autoencoder (P-VAE) as an exemplar model. This is a flexible autoencoder model that is able to accurately work with and impute missing values in data points, allowing us to model sparsely-observed data such as that found in recommender systems.
For each new feature n, we augment the P-VAE's decoder with a new decoder head consisting of an additional column of decoder weights wn and an additional decoder bias term bn which extend the model's output to the new feature, so that θn={wn, bn}. See FIG. 13 for an illustration. Where multiple baselines are considered at meta-test time, these are all applied to the same trained PVAE model to ensure a fair comparison between methods; para. [0064, 0091, 0105], adding new variables/features to an existing probabilistic model), and performing an embedding training of said extended Probabilistic Graphical Model on the basis of an embedding training set of data (i.e. We adopt a meta-learning approach to training the CHN, treating each new feature as an individual task with the aim of producing a model that can “learn how to learn”. We assume a base model pθ0(xU|xO) is trained on the data observed before the adaptation stages. The base model is then frozen during CHN training. To implement the training strategy, in the experiments we divide the dataset into three disjoint sets of features (see FIG. 12): a ‘training’ set for base model training in the first stage, a ‘meta-training’ set for CHN meta-learning in the second stage, and a meta-test set for CHN evaluation in the third stage; para. [0092, 0093, 0096, 0098], computing context embeddings and optimizing likelihoods using the extended model), said embedding training set comprising first training data (D1.1) of data from said specified context (C1) and an inferred machine learning model output (O1.2) inferred by said machine learning model from third training data (D1.2) from context C1 corresponding to said second set of Observable variables (VarA, VarB ...VarZ), whereby third training data (D1.2) is sampled from said context (C1) together with said first training data (D1.1) (i.e. For each feature, sample kn data points in which this feature is observed to form the context set; para. [0075, 0077, 0094], combining observed data with inferred model outputs), to obtain an enhanced Probabilistic Graphical Model comprising parameters defining dependencies (i.e. These parameters are then used to make predictions for all of the target set values for the new features; para. [0091, 0095, 0099], the enhanced model with new dependency parameters) between said Observable variables, said class variable and each said Extension variable (i.e. For each feature n in [notation image omitted], sample kn data points in which this feature is observed to form the context set [notation image omitted], and reveal the associated feature values to the model. In our experiments we sample kn ~ Uniform[0, ..., 32] to ensure that a single CHN can perform well across a range of context set sizes; para. [0094-0096]).
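For clarity of record, the following numpy sketch illustrates the decoder-head extension described in Zhang’s paragraphs [0064] and [0091] (the shapes and names are the Examiner’s assumptions for illustration, not Zhang’s code):

    import numpy as np

    rng = np.random.default_rng(1)
    latent_dim, old_features = 4, 10
    W_old = rng.normal(size=(latent_dim, old_features))  # frozen decoder weights
    b_old = np.zeros(old_features)

    # New decoder head for feature n: theta_n = {w_n, b_n}
    w_n = rng.normal(size=(latent_dim, 1))
    b_n = np.zeros(1)

    W_ext = np.hstack([W_old, w_n])       # extended decoder weights
    b_ext = np.concatenate([b_old, b_n])

    z = rng.normal(size=latent_dim)       # latent code
    outputs = z @ W_ext + b_ext           # predicts old features + new feature
    print(outputs.shape)                  # (11,)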
Claim 2: Zhang teaches the method of claim 1. Zhang further teaches wherein one or more Observable variables (Var1, Var2, Var3, ..., VarN) of a said Probabilistic Graphical Model are directly dependent on the said class variable, and one or more Latent variables (i.e. An example of this is an auto encoder, as illustrated by way of example in FIGS. 4A-D. In an auto encoder, an encoder network is arranged to encode an observed input vector Xo into a latent vector Z, and a decoder network is arranged to decode the latent vector back into the real-world feature space of the input vector; para. [0010, 0012, 0091, 0105]).
Claim 3: Zhang teaches the method of claim 1. Zhang further teaches wherein said Probabilistic Graphical Model is extended with one or more Extension variables (VarVarX1, VarX2 ..., VarXN), whereby said Extension variables are directly dependent on the said class variable, one or more Latent variables and possibly one or more said Observable variables (i.e. we apply a CHN to a partial variational autoencoder (P-VAE) as an exemplar model. This is a flexible autoencoder model that is able to accurately work with and impute missing values in data points, allowing us to model sparsely-observed data such as that found in recommender systems. For each new feature n, we augment the P-VAE's decoder with a new decoder head consisting of an additional column of decoder weights wn and an additional decoder bias term bn which extend the model's output to the new feature, so that θn={wn, bn}. See FIG. 13 for an illustration. Where multiple baselines are considered at meta-test time, these are all applied to the same trained PVAE model to ensure a fair comparison between methods; para. [0085-0089, 0105]).
Claim 4: Zhang teaches the method of claim 1. Zhang further teaches wherein said step of obtaining a Probabilistic Graphical Model comprises training said Probabilistic Graphical Model with said first training data (D1.1) from said specified context (C1), said first training data comprising data corresponding to a first set of one or more Observable variables (Var1, Var2, Var3, ..., VarN) (i.e. We assume a base model pθ0(xU|xO) is trained on the data observed before the adaptation stages. The base model is then frozen during CHN training. To implement the training strategy, in the experiments we divide the dataset into three disjoint sets of features (see FIG. 12): a ‘training’ set for base model training in the first stage, a ‘meta-training’ set for CHN meta-learning in the second stage, and a meta-test set for CHN evaluation in the third stage; para. [0092]), whereby embedding training using said embedding training set modifies only the parameters corresponding to the dependencies between the said Extension variables and other variables in said extended Probabilistic Graphical Model (i.e. in deep neural networks, the number of model parameters θ may be extremely large, so that maximizing this log-likelihood is very expensive, particularly if new features are being introduced on a regular basis. Furthermore, optimizing θ for one particular feature may lead to poor performance for another, as is the case in catastrophic forgetting in continual learning tasks. In order to address both of these concerns, we divide the model parameters into parameters θ0 inherited from the old model, and feature-specific parameters θn associated solely with the new feature. In other words, we use pθ0(xU|xO) as a base model and pose a factorization assumption on the augmented model as pθ(xU, xn|xO) = pθ0(xU|xO)·p(xn|xO; θn), which together yield a predictive model for the new feature. We then hold θ0 fixed and only seek MLEs for θn; para. [0064, 0081, 0093]).
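A one-dimensional toy illustration of the parameter split described in Zhang’s paragraph [0093] (values and setup are hypothetical, not Zhang’s implementation), in which θ0 is held fixed and only θn is fitted:

    import numpy as np

    rng = np.random.default_rng(2)
    theta_0 = 1.5                  # frozen base-model parameter
    theta_n = 0.0                  # new feature-specific parameter (trainable)
    x = rng.normal(size=100)
    y = theta_0 * x + 0.7 + 0.01 * rng.normal(size=100)  # new feature's data

    lr = 0.1
    for _ in range(200):
        pred = theta_0 * x + theta_n        # theta_0 is never updated
        grad = 2 * np.mean(pred - y)        # d/d(theta_n) of squared error
        theta_n -= lr * grad                # only theta_n is optimized

    print(round(theta_n, 2))  # ~0.7: the learned feature-specific parameter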
Claim 8: Zhang teaches the method of claim 1. Zhang further teaches whereby said Extension variables are directly dependent on the said class variable (i.e. Once trained, the auto encoder can be used to impute missing values from a subsequently observed feature vector Xo. Alternatively or additionally, a third network can be trained to predict a classification Y from the latent vector, and then once trained, used to predict the classification of a subsequent, unlabelled observation; para. [0012, 0105]).
Claim 10: Zhang teaches the method of claim 1. Zhang further teaches wherein said step of training said machine learning model comprises incorporating said machine learning model as a Latent representation of an autoencoder (i.e. An example of this is an auto encoder, as illustrated by way of example in FIGS. 4A-D. In an auto encoder, an encoder network is arranged to encode an observed input vector Xo into a latent vector Z, and a decoder network is arranged to decode the latent vector back into the real-world feature space of the input vector; para. [0010, 0012]).
Claim 11: Zhang teaches the method of claim 1. Zhang further teaches wherein said machine learning model is trained in an unsupervised mode (i.e. another example is an unsupervised approach where input data points are not labelled at all and the learning algorithm is instead left to infer its own structure in the experience data; para. [0009, 0010]).
Claim 15: Zhang teaches a method of classifying data comprising presenting said data to a classifier (i.e. the auto encoder can be used to impute missing values from a subsequently observed feature vector Xo. Alternatively or additionally, a third network can be trained to predict a classification Y from the latent vector, and then once trained, used to predict the classification of a subsequent, unlabelled observation; para. [0012, 0064]) in accordance with claim 1 (see rejection of claim 1 above).
Claim 18: Zhang teaches a data processing system comprising means for carrying out the method of claim 1 (see rejection of claim 1 above).
Claim 19: Zhang teaches a computer program comprising instructions which, when the program is executed by a computer (i.e. there is provided a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model, wherein the primary model is configured to transform from an observed subset of a set of real-world features to a predicted version of the set of real-world features; para. [0016]), cause the computer to carry out the method of claim 1 (see rejection of claim 1 above).
Claim 20: Zhang teaches a computer-readable medium (i.e. The storage on which the code is stored may comprise one or more memory devices employing one or more memory media; para. [0042]) comprising instructions which, when executed by a computer (i.e. there is provided a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model, wherein the primary model is configured to transform from an observed subset of a set of real-world features to a predicted version of the set of real-world features; para. [0016]), cause the computer to carry out the method of claim 1 (see rejection of claim 1 above).
Claim Rejections – 35 USC § 103
7. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
8. Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Lillo (U.S. Patent Application Pub. No. US 20220284261 A1).
Claim 5: Zhang teaches the method of claim 1. Zhang does not explicitly teach that embedding training using said embedding training set modifies all parameters.
However, Lillo teaches wherein embedding training using said embedding training set modifies all parameters corresponding to the dependencies between all variables in said extended Probabilistic Graphical Model (i.e. the modification of parameter values may be performed through a process referred to as “back propagation.” Back propagation includes determining the difference between the expected model output (e.g., the reference data output vectors 122) and the obtained model output (e.g., output vectors 118), and then determining how to modify the values of some or all parameters of the model to reduce the difference between the expected model output and the obtained model output; para. [0022, 0043]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Lillo. One would have been motivated to make this modification because it ensures every parameter is adjusted to optimize prediction accuracy.
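By contrast with the θ0/θn split illustrated above for claim 4, the following hypothetical sketch illustrates Lillo’s back propagation (para. [0022]), in which every parameter is modified to reduce the difference between expected and obtained outputs (toy linear model; not Lillo’s code):

    import numpy as np

    rng = np.random.default_rng(3)
    params = rng.normal(size=5)                # all model parameters
    x = rng.normal(size=(50, 5))
    y = x @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])

    lr = 0.05
    for _ in range(500):
        err = x @ params - y
        grad = 2 * x.T @ err / len(y)          # gradient w.r.t. ALL parameters
        params -= lr * grad                    # every parameter is modified

    print(np.round(params, 1))                 # approaches the true coefficients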
9. Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Kakui (U.S. Patent Application Pub. No. US 20150254125 A1).
Claim 6: Zhang teaches the method of claim 1. Zhang further teaches wherein there are provided one or more further machine learning models, each said further machine learning model comprising said second set of Observable variables (VarA, VarB, ..., VarZ), and each said further machine learning model output O1.2 comprising probabilities corresponding to values of said Extension variables (VarVarX1, VarX2, ..., VarXN) (i.e. To address these problems, we introduce a Contextual HyperNetwork (CHN) Hψ [notation images omitted], an auxiliary neural network that amortizes the process of estimating θn. The goal is that after training the CHN and when a new feature xn* is added at test time, the CHN will directly generate “good” parameters such that the new predictive model pθ0(xn*|xO; θn* = [notation image omitted]) can predict the values of the new feature accurately; para. [0085]) of said extended Probabilistic Graphical Model (i.e. FIG. 7 illustrates an example auxiliary model 700. In general the auxiliary model 700 comprises a first neural network 701 and a second neural network 702. The auxiliary model 700 may also comprise a third neural network 801 as shown in FIG. 8. Note that one or more of the neural networks may themselves comprise more than one neural network and/or other functions. For instance, in some examples the first neural network may comprise two sub-networks; para. [0059]), and wherein said step of performing an embedding training of said extended Probabilistic Graphical Model is performed, such that probability tables of said Extension variables (VarVarX1, VarX2 ..., VarXN) are obtained (i.e. Each weight could simply be a scalar value. Alternatively, as shown in FIG. 1B, at some or all of the nodes 104 in the network 101, the respective weight may be modelled as a probabilistic distribution such as a Gaussian. In such cases the neural network 101 is sometimes referred to as a Bayesian neural network. Optionally, the value input/output on each of some or all of the edges 106 may each also be modelled as a respective probabilistic distribution. For any given weight or edge, the distribution may be modelled in terms of a set of samples of the distribution, or a set of parameters parameterizing the respective distribution, e.g. a pair of parameters specifying its centre point and width (e.g. in terms of its mean μ and standard deviation σ or variance σ2). The value of the edge or weight may be a random sample from the distribution. The learning of the weights may comprise tuning one or more of the parameters of each distribution; para. [0049]).
Zhang does not explicitly teach conditional probability tables.
However, Kakui teaches conditional probability tables (i.e. The Bayesian network is a probabilistic model that is configured from a directed acyclic graph that uses a plurality of random variables as the nodes, and conditional probability tables or conditional probability density functions of the respective variables based on the dependency between the nodes represented by the graph, and can be created based on statistical learning. In particular, to use the observed data of a variable and determining the structure of a directed acyclic graph is referred to as structural learning, and the generation of parameters of the conditional probability table or the conditional probability density function of the respective nodes of the graph is referred to as parameter learning; para. [0113]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Kakui. One would have been motivated to make this modification because it improves model interpretability.
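For illustration of the conditional probability tables taught by Kakui (para. [0113]), the following toy CPT (all values invented for illustration) shows the defining property that each row, i.e., each setting of the parent variable, sums to 1:

    # Hypothetical CPT: P(rain | season). Each parent setting indexes a
    # proper conditional distribution over the child variable.
    cpt_rain_given_season = {
        "dry": {"rain": 0.1, "no_rain": 0.9},
        "wet": {"rain": 0.7, "no_rain": 0.3},
    }
    for parent, row in cpt_rain_given_season.items():
        assert abs(sum(row.values()) - 1.0) < 1e-9  # rows sum to 1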
10. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Vaske et al. (U.S. Patent Application Pub. No. US 20150262082 A1).
Claim 7: Zhang teaches the method of claim 1. Zhang further teaches wherein there are provided one or more further machine learning models, each said further machine learning model comprising said second set of Observable variables (VarA, VarB, ..., VarZ), and each said further machine learning model output O1.2 comprising values that are not probabilities, said values corresponding to states of Observed Extension Variables, a subset of said Extension variables (VarVarX1, VarX2 ..., VarXN), whereas the rest of said Extension variables are Latent Extension Variables (i.e. FIG. 7 illustrates an example auxiliary model 700. In general the auxiliary model 700 comprises a first neural network 701 and a second neural network 702. The auxiliary model 700 may also comprise a third neural network 801 as shown in FIG. 8. Note that one or more of the neural networks may themselves comprise more than one neural network and/or other functions. For instance, in some examples the first neural network may comprise two sub-networks; para. [0059]), wherein said Observed Extension Variables are conditioned on said Latent Extension Variables, and said step of performing an embedding training of a Probabilistic Graphical Model (i.e. To address these problems, we introduce a Contextual HyperNetwork (CHN) Hψ [notation images omitted], an auxiliary neural network that amortizes the process of estimating θn. The goal is that after training the CHN and when a new feature xn* is added at test time, the CHN will directly generate “good” parameters such that the new predictive model pθ0(xn*|xO; θn* = [notation image omitted]); para. [0085]) is performed such that for each said Observed Extension Variable and each said Latent Extension Variable a specific probability table is obtained.
Zhang does not explicitly teach that for each said Latent Extension Variable a specific probability table is obtained.
However, Vaske teaches that for each Latent Extension Variable a specific probability table is obtained (i.e. Causal interactions that change the state of molecules (e.g. gene transcriptional regulation, protein phosphorylation, complex formation) are represented as directed edges from the regulating variable to the regulated variable. Therefore, for each variable Y in the probabilistic graph of the model, a factor is introduced into a joint probability model that relates the state of the variable to the state of all its regulators: F(Y|X1, X2, . . . , XN), where X1 through XN are the variables that regulate Y. This factor is a conditional probability table: for each setting of Parents(Y), Σy F(Y=y|Parents(Y)) = 1. Observations of individual variables, such as the genome copy number or gene expression, are modeled as separate variables, connected to the latent variable by a factor F(Y|X), also a conditional probability table; para. [0050]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Vaske. One would have been motivated to make this modification because it makes the inference process more explainable.
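A hypothetical sketch of Vaske’s factorization (para. [0050]), with a latent variable Y regulated by X1 and X2 and a separate observation factor; all numbers below are invented for illustration:

    # Factor F(Y | X1, X2): one conditional distribution per parent setting.
    f_y_given_parents = {
        ("x1_on", "x2_on"):   {"active": 0.90, "inactive": 0.10},
        ("x1_on", "x2_off"):  {"active": 0.50, "inactive": 0.50},
        ("x1_off", "x2_on"):  {"active": 0.40, "inactive": 0.60},
        ("x1_off", "x2_off"): {"active": 0.05, "inactive": 0.95},
    }
    # Separate observation factor F(obs | Y) linking the latent variable
    # to an observed measurement.
    f_obs_given_y = {
        "active":   {"high": 0.8, "low": 0.2},
        "inactive": {"high": 0.1, "low": 0.9},
    }
    # For each setting of Parents(Y), the sum over y of F(Y=y | Parents(Y)) is 1:
    for parents, row in f_y_given_parents.items():
        assert abs(sum(row.values()) - 1.0) < 1e-9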
11. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Kushner et al. (U.S. Patent Pub. No. US 10963960 B1).
Claim 9: Zhang teaches the method of claim 1. Zhang does not explicitly teach wherein said second training data (D2) belongs to said specified context (C1).
However, Kushner teaches wherein said second training data (D2) belongs to said specified context (C1) (i.e. reallocation model 44 may be periodically retrained by training unit 34. For example, reallocation model 44 may be initially created by training unit 34 based on training data 35 including a first training data set (e.g., a first set of user data 38, credit usage history 40, context data 42, and/or corporation data 46), and training unit 34 may retrain reallocation model 44 when appropriate using a second training data set (e.g., a second set of user data 38, credit usage history 40, context data 42 and/or corporation data 46). In some examples, the second training data set may include some or all of the training data included in the first training data set with at least some data not included in the first training data set. In other examples, the second training data set may not include any training data of the first training data set (e.g., all new training data). In any case, training unit 34 may retrain reallocation model 44 using the second training data set. After reallocation model 44 has been retrained, reallocation model 44 may be different that the previous reallocation model trained using the first training data set; col.12, lines 40-59).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Kushner. One would have been motivated to make this modification because it enhances relevance in inference tasks.
12. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Dao (U.S. Patent Application Pub. No. US 20250056324 A1).
Claim 12: Zhang teaches the method of claim 1. Zhang does not explicitly teach wherein said context comprises conditions corresponding to geographic locations, time and a type of moving entities in a physical space under which the data is sampled.
However, Dao teaches wherein said context comprises conditions corresponding to geographic locations, time and a type of moving entities in a physical space under which the data is sampled (i.e. the present disclosure provides a method by a data processing function (DPF) in a core network (CN) of a mobile network. The method includes receiving, from a data consumer function (DCF) in the CN of the mobile network, a request for data and selecting, based on the request for data, a data source from one or more of: an access network (AN), a network function (NF) in the AN, and an electronic device (ED). The method also includes transmitting the request for data to the selected data source, receiving data based on the transmitted request for data, and sending the received data to the DCF. Receiving a request for data may include receiving a request for data including one or more of: an identifier (ID) of the DCF, a service request ID associated with the request, a geographic location, a type of object to be detected, a type of data, an observation time window, a sampling time duration, and a moving direction of object. The method may allow a function in the CN, such as a data consumer function, to request for or subscribe to one or more types of data in one or more of: another network, an AN, NF and ED; para. [0023]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Dao. One would have been motivated to make this modification because it improves context sensitivity.
13. Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Levine (U.S. Patent Application Pub. No. US 20220350365 A1).
Claim 13: Zhang teaches the method of claim 1. Zhang does not explicitly teach wherein said training data comprise kinematic data for moving entities in a physical space.
However, Levine teaches wherein said training data comprise kinematic data for moving entities in a physical space (i.e. The offboard device processor 72 of the offboard computing device 70 may be further configured to receive a plurality of first ground-truth velocity measurements 150 from the training wearable computing device 110. The plurality of first ground-truth velocity measurements 150 may be paired with respective first training kinematic measurements 142 of the plurality of first training kinematic measurements 142, thereby forming a first training data set 152. The plurality of first ground-truth velocity measurements 150 may be collected at least in part via GPS using the training GPS receiver 130. Additionally or alternatively, the plurality of first ground-truth velocity measurements 150 may be collected at least in part by performing visual simultaneous localization and mapping (SLAM) on imaging data collected at the one or more training outward-facing optical sensors 122. In some examples, sensor fusion may be performed on GPS data and imaging data to determine the plurality of first ground-truth velocity measurements 150; para. [0045-0051]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Levine. One would have been motivated to make this modification because it improves predictive accuracy for movement-based scenarios.
Claim 14: Zhang and Levine teach the method of claim 13. Zhang does not explicitly teach wherein said training data further comprise images, video streams, sound or electromagnetic signatures.
However, Levine further teaches wherein said training data further comprise images, video streams, sound or electromagnetic signatures (i.e. The offboard device processor 72 of the offboard computing device 70 may be further configured to receive a plurality of first ground-truth velocity measurements 150 from the training wearable computing device 110. The plurality of first ground-truth velocity measurements 150 may be paired with respective first training kinematic measurements 142 of the plurality of first training kinematic measurements 142, thereby forming a first training data set 152. The plurality of first ground-truth velocity measurements 150 may be collected at least in part via GPS using the training GPS receiver 130. Additionally or alternatively, the plurality of first ground-truth velocity measurements 150 may be collected at least in part by performing visual simultaneous localization and mapping (SLAM) on imaging data collected at the one or more training outward-facing optical sensors 122. In some examples, sensor fusion may be performed on GPS data and imaging data to determine the plurality of first ground-truth velocity measurements 150; para. [0045-0051]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Levine. One would have been motivated to make this modification because it improves predictive accuracy for movement-based scenarios.
14. Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Goulding (U.S. Patent Application Pub. No. US 20110231016 A1).
Claim 16: Zhang teaches the method of claim 1. Zhang does not explicitly teach applied to classification of targets in combat management systems, or in processing of sensor observations or detections of moving targets, wherein the first set of Observable variables (Var1, Var2, Var3, ..., VarN) and the second set of Observable variables (VarA, VarB, VarC, ..., VarZ) correspond to (i) the outputs of various sensors perceiving the targets and (ii) outputs of sources describing environmental conditions and wherein the dependencies between said Observable variables, said class variable and each said Extension variable describe correlations between the context, the observations and a target class, enabling classification of a target, prediction of its states or detection of anomalous target states.
However, Goulding teaches applied to classification of targets in combat management systems, or in processing of sensor observations or detections of moving targets, wherein the first set of Observable variables (Var1, Var2, Var3, ..., VarN) and the second set of Observable variables (VarA, VarB, VarC, ..., VarZ) correspond to (i) the outputs of various sensors perceiving the targets and (ii) outputs of sources describing environmental conditions and wherein the dependencies between said Observable variables, said class variable and each said Extension variable describe correlations between the context, the observations and a target class, enabling classification of a target, prediction of its states or detection of anomalous target states (i.e. Behavior-based motion models would enable complex, emergent patterns, such as probabilistic prediction of serpentine target motion. Once serpentine motion by the object is likely, a behavior-based hypothesis would introduce a second likely path for MHT analysis. Behavior based models would also enable probabilistic prediction of goal-oriented target motion (e.g., a terrorist heading for a barrier) and aid in the classification of objects. Given a priori goal map data, a behavior-based hypothesis would compute a goal-based trajectory for analysis. Behavior/Goal oriented motion can be used to determine a likely path that the object will take. Thus, there is less uncertainty of the person moving sideways, for example, and this has a positive benefit on deleting paths with low scores, following paths with higher scores, and in correlating the simulated robot-to-object and object-to-object tracks with sensed data to aid in the classification of objects; para. [0100]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Goulding. One would have been motivated to make this modification because it refines classification accuracy.
15. Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Abbaszadeh et al. (U.S. Patent Application Pub. No. US 20200125978 A1).
Claim 17: Zhang teaches the method of claim 1. Zhang does not explicitly teach applied to detection of anomalies in IT systems, cyber physical systems and detection of cyber attacks, wherein the first set of Observable variables (Var1, Var2, Var3, ..., VarN) and the second set of Observable variables (VarA, VarB, VarC, ..., VarZ) correspond to readings from various IDS probes at different system levels and wherein the dependencies between said Observable variables, said class variable and each said Extension variable describe correlations between different components of an overall system, such that states of unobservable components can be predicted or anomalous states of components can be detected.
However, Abbaszadeh teaches applied to detection of anomalies in IT systems, cyber physical systems and detection of cyber attacks, wherein the first set of Observable variables (Var1, Var2, Var3, ..., VarN) and the second set of Observable variables (VarA, VarB, VarC, ..., VarZ) correspond to readings from various IDS probes at different system levels and wherein the dependencies between said Observable variables, said class variable and each said Extension variable describe correlations between different components of an overall system, such that states of unobservable components can be predicted or anomalous states of components can be detected (i.e. some embodiments may provide an advanced hybrid anomaly detection algorithm to detect cyber-attacks on, for example, key cyber-physical system control sensors. The algorithm may identify which signals(s) are being attacked using control signal-specific decision boundaries and may inform a cyber-physical system to take accommodative actions. In particular, a detection and localization algorithm might detect whether a sensor, auxiliary equipment input signal, control intermediary parameter, or control logical are in a normal or anomalous state. Some examples of cyber-physical system monitoring nodes that might be analyzed include: critical control sensors; control system intermediary parameters; auxiliary equipment input signals; and/or logical commands to controller; para. [0076]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Zhang to include the feature of Abbaszadeh. One would have been motivated to make this modification because it improves cybersecurity threat detection.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Baker et al. (Pub. No. US 20210103807 A1) describes methods for performing inference on a generative model. In one aspect, a method includes receiving a generative model in a probabilistic program form defining variables and probabilistic relationships between variables, and producing a neural network to model the behavior of the generative model.
Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAN TRAN whose telephone number is (303) 297-4266. The examiner can normally be reached Monday through Thursday, 8:00 am to 5:00 pm MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Ell can be reached on 571-270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TAN H TRAN/Primary Examiner, Art Unit 2141