DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the amendment filed on 10/29/2021. Claims 1-5, 7-12, 21, 23 and 24 have been amended. Claims 1-24 are pending in this case. Claims 1, 21, 23 and 24 are independent claims.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-14, 17-18 and 20-24 are rejected under 35 U.S.C. 102 as being anticipated by ZHANG et al. (Pub. No.: US 2022/0147818 A1), hereinafter referred to as ZHANG.
With respect to claim 1, ZHANG discloses:
A computer-implemented method for training a neural network model for sequentially learning a plurality of domains associated with a task, the computer-implemented method comprising (In paragraph [0059], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model. This model contains multiple neural networks that contribute to the overall functionality. The auxiliary model can be complex, with one or more neural networks possibly containing additional subnetworks.)
determining at least one set of auxiliary model parameters by simulating, using a processor, at least one first optimization step based on a set of current model parameters of the neural network model and at least one auxiliary domain, wherein the at least one auxiliary domain is generated from a primary domain comprising one or more data points for training the neural network model (In FIG. 7 and paragraphs [0060-0061], ZHANG discloses that the first neural network is set up to take a set of "first input vectors." These vectors serve as input, which can be a single vector or multiple vectors. Each input vector corresponds to a "data point," which can represent various entities like users, devices, or machines. The primary model utilizes these data points to make predictive analyses. Each input vector also contains a value for a "new feature" associated with the respective data point. This new feature can pertain to various scenarios, such as new tests or devices, and may or may not have an observed value. In paragraph [0064], ZHANG discloses that the auxiliary model 700 is trained to predict new parameters that give accurate values for new features. These new parameters are then given to the primary model, and the existing parameters of the primary model stay the same.)
determining a set of primary model parameters by performing, using the processor, a second optimization step for the neural network model based on: the set of current model parameters and the primary domain, the at least one set of auxiliary model parameters, and at least one of the primary domain and the at least one auxiliary domain; (In paragraphs [0062-0063], ZHANG discloses that a network takes a set of input vectors and transforms them into a context vector. This vector serves as an abstract representation that captures necessary information about the new features that have been integrated into the context of the primary model. The first neural network outputs the context vector, which is then inputted into the second neural network. The second neural network takes the context vector and is designed to generate a set of new parameters. In paragraph [0211], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model, wherein the primary model is configured to transform from an observed subset of a set of real-world features to a predicted version of the set of real-world features.)
updating, using the processor, the neural network model with the set of primary model parameters, (In paragraph [0049], ZHANG discloses that the neural network is called a Bayesian neural network. For any weight or connection, the distribution can be shown using a set of samples or a set of numbers that define the distribution, like a pair of numbers that indicate its center point and width (such as the average μ and standard deviation σ). The learning of the weights may comprise fine-tuning one or more of the parameters of each distribution.)
wherein the updated neural network model is used to perform one or more of a recognition task, a classification task, an autonomous movement task, or a natural language processing task (In paragraph [0052], ZHANG discloses identifying or recognizing patterns or objects, such as image recognition. Each element of the feature vector X may represent a respective pixel value.)
wherein the data points of the primary domain comprise one or more of text, voice, image, or sensory data (In paragraph [0073], ZHANG discloses that the primary model 901 may be used to control an apparatus or vehicle. For instance, the features may relate to sensor data of a vehicle such as an autonomous (i.e. driverless) car. The sensor data may provide values of, e.g. speed, direction, acceleration, braking force, etc. The primary model 901 may be used to predict potential collisions and thus take action to prevent such collisions.)
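For orientation only, the parameter-prediction mechanism mapped above (an auxiliary model encoding data points into a context vector and emitting new parameters for a frozen primary model, per ZHANG [0059]-[0064]) can be sketched as follows. This is an illustrative approximation, not ZHANG's disclosed implementation; all class names, dimensions, and the pooling choice are hypothetical.

```python
# Minimal sketch (assumptions: PyTorch; dimensions and pooling are hypothetical).
import torch
import torch.nn as nn

class AuxiliaryModel(nn.Module):
    def __init__(self, feat_dim: int, ctx_dim: int, n_new_params: int):
        super().__init__()
        # First network: encode each input vector before pooling into one context vector.
        self.encoder = nn.Sequential(nn.Linear(feat_dim, ctx_dim), nn.ReLU())
        # Second network: map the context vector to a predicted set of new parameters.
        self.param_head = nn.Linear(ctx_dim, n_new_params)

    def forward(self, input_vectors: torch.Tensor) -> torch.Tensor:
        # input_vectors: (num_data_points, feat_dim), one row per data point.
        context = self.encoder(input_vectors).mean(dim=0)  # permutation-invariant pooling
        return self.param_head(context)  # parameters to extend the frozen primary model

aux = AuxiliaryModel(feat_dim=16, ctx_dim=32, n_new_params=8)
new_params = aux(torch.randn(5, 16))  # existing primary-model parameters stay fixed
```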
Regarding claim 2, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1 further comprising: generating, using the processor, the at least one auxiliary domain from the primary domain (In FIG. 9 and paragraph [0070], ZHANG discloses training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model, wherein the primary model is configured to transform from an observed subset of a set of real-world features to a predicted version of the set of real-world features.)
wherein the generating the at least one auxiliary domain from the primary domain comprises modifying, using the processor, the one or more data points of the primary domain via data manipulation (In paragraph [0070], ZHANG discloses that the auxiliary model 700 then generates the predicted set of model parameters. The new model parameters are then supplied to the primary model 901. The new model parameters may be supplied to the primary model 901 directly from the auxiliary model 700. The primary model 901 then uses the new model parameters to predict values of the new features.)
and wherein the at least one auxiliary domain comprises the one or more modified data points (In paragraph [0070], ZHANG discloses that the primary model 901 is extended using the new model parameters. The existing parameters of the primary model 901 are not changed. That is, they remain fixed during training of the auxiliary model 700.)
Regarding claim 3, ZHANG discloses the elements of claim 2. In addition, ZHANG discloses:
The computer-implemented method of claim 2, wherein the data manipulation is performed automatically using the processor (In paragraph [0156], ZHANG discloses automated sequential decision-making when only a group of features is available for observation. This machine learning (ML) model 208′ can be used in place of a standard VAE in the apparatus 200 of FIG. 2, for example, in order to make predictions, perform imputations, and make decisions. The model 208′ will be referred to below as a “sequential model” 208′.)
Regarding claim 4, ZHANG discloses the elements of claim 2. In addition, ZHANG discloses:
The computer-implemented method of claim 2, wherein the generating the at least one auxiliary domain from the primary domain comprises selecting, using the processor, the one or more data points from the primary domain (In paragraph [0071], ZHANG discloses the primary model using the observed features to predict the condition of the patient. New data, such as results from a new medical test, can also be added to help with predictions.)
Regarding claim 5, ZHANG discloses the elements of claim 2. In addition, ZHANG discloses:
The computer-implemented method of claim 2, wherein the modifying the one or more data points of the primary domain via data manipulation comprises automatically and/or randomly selecting, using the processor, one or more transformations from a set of transformations (In paragraph [0219], ZHANG discloses that training the auxiliary model may comprise: training the auxiliary model using training data comprising only a subset of the set of real-world features; and randomly sampling data points having respective observed values for the subset of real-world features.)
and wherein each auxiliary domain of the at least one auxiliary domain is defined by one or more respective transformations of the set of transformations (In paragraph [0219], ZHANG discloses that the respective observed values for the remaining data points are hidden from the auxiliary model, and that the auxiliary model is trained to use the predicted set of new parameters to predict the respective observed values for the remaining data points.)
Regarding claim 6, ZHANG discloses the elements of claim 2. In addition, ZHANG discloses:
The computer-implemented method of claim 2, wherein the data manipulation comprises at least one image transformation (In paragraph [0054], ZHANG discloses a soft value representing a probability or confidence that the image comprises an image of an elephant.)
and wherein the at least one image transformation comprises at least one of a photometric and a geometric transformation (In paragraph [0054], ZHANG discloses an image representing a single binary value.)
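For orientation only, claims 5-6 as recited contemplate building an auxiliary domain by randomly selecting photometric and/or geometric transformations and applying them to the primary-domain data points. A minimal illustrative sketch follows; it is not drawn from ZHANG, and the specific transformations and parameters are hypothetical.

```python
# Minimal sketch (assumptions: torchvision; transformation choices are hypothetical).
import random
from torchvision import transforms

photometric = [transforms.ColorJitter(brightness=0.4), transforms.ColorJitter(contrast=0.4)]
geometric = [transforms.RandomRotation(degrees=15), transforms.RandomHorizontalFlip(p=1.0)]

def make_auxiliary_domain(images, k: int = 2):
    # Randomly pick k transformations; each pick defines one auxiliary domain.
    chosen = random.sample(photometric + geometric, k)
    pipeline = transforms.Compose(chosen)
    return [pipeline(img) for img in images]  # the modified data points
```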
Regarding claim 7, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the second optimization step employs a regularizer having a first objective of avoiding catastrophic forgetting and a second objective of encouraging domain adaptation (In paragraph [0102], ZHANG discloses preventing catastrophic forgetting. The hypernetwork aims to learn the task embedding for each task as a form of data compression, by training on all the data for the new task—by contrast, CHNs predict parameters associated with a new feature conditioned on the data associated with this feature, with no training required.)
Regarding claim 8, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the second optimization step employs a loss function having terms associated with task learning, avoiding catastrophic forgetting, and encouraging domain adaptation (In paragraph [0220], ZHANG discloses that the new predicted parameters are then used to predict the hidden values. The model is trained, e.g. using a loss function, to update the predicted parameters so that the predicted values match the actual values.)
Regarding claim 9, ZHANG discloses the elements of claim 8. In addition, ZHANG discloses:
The computer-implemented method of claim 8, wherein the loss function is used for optimization of the neural network model via gradient descent (In paragraph [0148], ZHANG discloses that the minimization may be performed using an optimization function such as an ELBO (evidence lower bound) function, which uses cost function minimization based on gradient descent.)
Regarding claim 10, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein a loss function associated with the second optimization step comprises: (i) a first loss function associated with the set of current model parameters and the primary domain, and one or more of: (ii) a second loss function associated with the at least one set of auxiliary model parameters and the primary domain, or (iii) a third loss function associated with the at least one set of auxiliary model parameters and the at least one auxiliary domain (Examiner selects (i). In paragraph [0220], ZHANG discloses that the model is trained, e.g. using a loss function, to update the predicted parameters so that the predicted values match the actual values.)
Regarding claim 11, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, further comprising: initializing, using the processor, the neural network model, wherein initializing the neural network model comprises setting model parameters of a pre-trained neural network model as initial model parameters for the neural network model to fine-tune the pre-trained neural network model (In paragraph [0140], ZHANG discloses fine-tuning by performing the same experiment but instead of initializing the heads with the CHN parameters.)
wherein said fine-tuning the pre-trained neural network model comprises performing the second optimization (In paragraph [0141], ZHANG discloses that training by gradient descent leads to a decrease in performance due to over-fitting, suggesting that the CHN has an implicit regularizing effect on the parameter initialization. We note also that in all cases, once training has converged, the parameters trained from the CHN initialization outperform those trained from the random initialization for all values of k.)
Regarding claim 12, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, further comprising: selecting, using the processor, a first sample or a first batch of samples from the auxiliary domain for the determining at least one set of auxiliary model parameters and selecting, using the processor, a second sample or a second batch of samples from the primary domain and at least one of: selecting, using the processor, a third sample or a third batch of samples from the primary domain and selecting, using the processor, a fourth sample or a fourth batch of samples from the at least one auxiliary domain for the determining a set of primary model parameters (Examiner selects: a first sample or a first batch of samples from the auxiliary domain for the determining at least one set of auxiliary model parameters. In paragraph [0049], ZHANG discloses that the distribution may be modeled in terms of a set of samples of the distribution, or a set of parameters parameterizing the respective distribution, e.g. a pair of parameters specifying its center point and width (e.g. in terms of its mean μ and standard deviation σ or variance σ²). The value of the edge or weight may be a random sample from the distribution.)
Regarding claim 13, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein a set of auxiliary model parameters of the at least one set of auxiliary model parameters minimizes a respective loss associated with a respective auxiliary domain of the at least one auxiliary domain with respect to the set of current model parameters (In paragraph [0147], ZHANG discloses minimizing a measure of divergence between q_ø(Z_i|X_i) and p_θ(X_i|Z_i), where q_ø(Z_i|X_i) is a function parameterized by ø representing a vector of the probabilistic distributions of the elements of Z_i output by the encoder 208q given the input values of X_i, whilst p_θ(X_i|Z_i) is a function parameterized by θ representing a vector of the probabilistic distributions of the elements of X_i output by the decoder given Z_i.)
Regarding claim 14, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the set of primary model parameters minimizes a loss associated with the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain with respect to the current model parameters (In paragraph [0148], ZHANG discloses that the minimization may be performed using an optimization function such as an ELBO (evidence lower bound) function, which uses cost function minimization based on gradient descent. An ELBO function may be referred to herein by way of example, but this is not limiting and other metrics and functions are also known in the art for tuning the encoder and decoder networks of a VAE.)
Regarding claim 17, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the one or more data points of the primary domain include or are divided into a first set of data points for training the neural network model, a second set of data points for validating the neural network model and a third set of data points for testing the neural network model (In paragraph [0081], ZHANG discloses dividing the model parameters into parameters θ_0 inherited from the old model, and feature-specific parameters θ_n associated solely with the new feature.)
Regarding claim 18, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the neural network model is trained on the one or more data points of the primary domain being a first primary domain in a first step, and wherein the trained neural network model is subsequently trained on data points of a second primary domain in a second step without accessing data points of the first primary domain in the second step (In paragraphs [0062-0063], ZHANG discloses the first neural network. The first neural network 701 is configured to transform from the set of first input vectors to a single context vector. The context vector encodes a representation of the values of the new features in the context of the primary model. The second neural network 702 is configured to transform from the context vector to a predicted set of new parameters for predicting values of the new feature. The new parameters may, in general, be parameters for use in any of the layers of the primary model.)
Regarding claim 20, ZHANG discloses the elements of claim 1. In addition, ZHANG discloses:
A neural network trained in accordance with the method of claim 18 to perform the task in the first primary domain and the second primary domain (In paragraphs [0062-0063], ZHANG discloses the first neural network. The first neural network 701 is configured to transform from the set of first input vectors to a single context vector. The context vector encodes a representation of the values of the new features in the context of the primary model. The second neural network 702 is configured to transform from the context vector to a predicted set of new parameters for predicting values of the new feature. The new parameters may, in general, be parameters for use in any of the layers of the primary model.)
With respect to claim 21, ZHANG discloses:
A method for performing a task in at least a first primary domain, the method comprising: performing, by a neural network model implemented by a processor and trained on the first primary domain, the task in the first primary domain (In paragraph [0059], ZHANG discloses an example auxiliary model. In general, the auxiliary model comprises a first neural network and a second neural network. The auxiliary model may also comprise a third neural network as shown in FIG. 8. Note that one or more of the neural networks may themselves comprise more than one neural network and/or other functions. For instance, in some examples the first neural network may comprise two sub-networks.)
and performing, by the trained neural network model trained by the processor on the first primary domain and fine-tuned by the processor on a second primary domain, the task in the first primary domain or the second primary domain; wherein the neural network model is fine-tuned by (In paragraph [0059], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model. This model contains multiple neural networks that contribute to the overall functionality. The auxiliary model can be complex, with one or more neural networks possibly containing additional subnetworks.)
determining at least one set of auxiliary model parameters by simulating, using the processor, at least one first optimization based on a set of current model parameters of the neural network model and at least one auxiliary domain, wherein the at least one auxiliary domain is generated from the second primary domain, wherein the second primary domain comprises one or more data points for training the neural network model (In FIG. 7 and paragraphs [0062-0063], ZHANG discloses that the second neural network is set up to take a set of "second input vectors." These vectors serve as input, which can be a single vector or multiple vectors. Each input vector corresponds to a "data point," which can represent various entities like users, devices, or machines. The primary model utilizes these data points to make predictive analyses. Each input vector also contains a value for a "new feature" associated with the respective data point. This new feature can pertain to various scenarios, such as new tests or devices, and may or may not have an observed value. In paragraph [0064], ZHANG discloses that the auxiliary model 700 is trained to predict new parameters that give accurate values for new features. These new parameters are then given to the primary model, and the existing parameters of the primary model stay the same.)
determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the second primary domain and based on the at least one set of auxiliary model parameters and at least one of the second primary domain and the at least one auxiliary domain (In paragraphs [0062-0063], ZHANG discloses that a network takes a set of input vectors and transforms them into a context vector. This vector serves as an abstract representation that captures necessary information about the new features that have been integrated into the context of the primary model. The first neural network outputs the context vector, which is then inputted into the second neural network. The second neural network takes the context vector and is designed to generate a set of new parameters. In paragraph [0211], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model, wherein the primary model is configured to transform from an observed subset of a set of real-world features to a predicted version of the set of real-world features.)
updating the neural network model with the set of primary model parameters (In paragraph [0049], ZHANG discloses that the neural network is called a Bayesian neural network. For any weight or connection, the distribution can be shown using a set of samples or a set of numbers that define the distribution, like a pair of numbers that indicate its center point and width (such as the average μ and standard deviation σ). The learning of the weights may comprise fine-tuning one or more of the parameters of each distribution.)
wherein the updated neural network model is used to perform one or more of a recognition task, a classification task, an autonomous movement task, or a natural language processing task (In paragraph [0052], ZHANG discloses identifying or recognizing patterns or objects, such as image recognition. Each element of the feature vector X may represent a respective pixel value.)
wherein the data points of the primary domain comprise one or more of text, voice, image, or sensory data (In paragraph [0073], ZHANG discloses that the primary model 901 may be used to control an apparatus or vehicle. For instance, the features may relate to sensor data of a vehicle such as an autonomous (i.e. driverless) car. The sensor data may provide values of, e.g. speed, direction, acceleration, braking force, etc. The primary model 901 may be used to predict potential collisions and thus take action to prevent such collisions.)
Regarding claim 22, ZHANG discloses the elements of claim 21. In addition, ZHANG discloses:
The method of claim 21, wherein the neural network model is fine-tuned to perform the task in the second primary domain without accessing data points of the first primary domain (In paragraphs [0062-0063], ZHANG discloses the first neural network. The first neural network 701 is configured to transform from the set of first input vectors to a single context vector. The context vector encodes a representation of the values of the new features in the context of the primary model. The second neural network 702 is configured to transform from the context vector to a predicted set of new parameters for predicting values of the new feature. The new parameters may, in general, be parameters for use in any of the layers of the primary model.)
With respect to claim 23, ZHANG discloses:
An apparatus for training a neural network model comprising: a non-transitory computer-readable medium having executable instructions stored thereon for causing a processor and a memory to perform a method comprising (In paragraph [0059], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model. This model contains multiple neural networks that contribute to the overall functionality. The auxiliary model can be complex, with one or more neural networks possibly containing additional subnetworks.)
determining at least one set of auxiliary model parameters by simulating, using a processor, at least one first optimization step based on a set of current model parameters of the neural network model and at least one auxiliary domain, wherein the at least one auxiliary domain is generated from a primary domain comprising one or more data points for training the neural network model (In FIG. 7 and paragraphs [0060-0061], ZHANG discloses that the first neural network is set up to take a set of "first input vectors." These vectors serve as input, which can be a single vector or multiple vectors. Each input vector corresponds to a "data point," which can represent various entities like users, devices, or machines. The primary model utilizes these data points to make predictive analyses. Each input vector also contains a value for a "new feature" associated with the respective data point. This new feature can pertain to various scenarios, such as new tests or devices, and may or may not have an observed value. In paragraph [0064], ZHANG discloses that the auxiliary model 700 is trained to predict new parameters that give accurate values for new features. These new parameters are then given to the primary model, and the existing parameters of the primary model stay the same.)
determining a set of primary model parameters by performing, using the processor, a second optimization step for the neural network model based on: the set of current model parameters and the primary domain, the at least one set of auxiliary model parameters, and at least one of the primary domain and the at least one auxiliary domain; (In paragraphs [0062-0063], ZHANG discloses that a network takes a set of input vectors and transforms them into a context vector. This vector serves as an abstract representation that captures necessary information about the new features that have been integrated into the context of the primary model. The first neural network outputs the context vector, which is then inputted into the second neural network. The second neural network takes the context vector and is designed to generate a set of new parameters. In paragraph [0211], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model, wherein the primary model is configured to transform from an observed subset of a set of real-world features to a predicted version of the set of real-world features.)
updating, using the processor, the neural network model with the set of primary model parameters, (In paragraph [0049], ZHANG discloses that the neural network is called a Bayesian neural network. For any weight or connection, the distribution can be shown using a set of samples or a set of numbers that define the distribution, like a pair of numbers that indicate its center point and width (such as the average μ and standard deviation σ). The learning of the weights may comprise fine-tuning one or more of the parameters of each distribution.)
wherein the updated neural network model is used to perform one or more of a recognition task, a classification task, an autonomous movement task, or a natural language processing task (In paragraph [0052], ZHANG discloses identifying or recognizing patterns or objects, such as image recognition. Each element of the feature vector X may represent a respective pixel value.)
wherein the data points of the primary domain comprise one or more of text, voice, image, or sensory data (In paragraph [0073], ZHANG discloses that the primary model 901 may be used to control an apparatus or vehicle. For instance, the features may relate to sensor data of a vehicle such as an autonomous (i.e. driverless) car. The sensor data may provide values of, e.g. speed, direction, acceleration, braking force, etc. The primary model 901 may be used to predict potential collisions and thus take action to prevent such collisions.)
With respect to claim 24, ZHANG discloses:
A system for training a neural network model comprising: a processor; a memory; and computer-executable instructions stored on a non-transitory computer-readable medium for causing the processor to perform a method comprising (In paragraph [0042], ZHANG discloses code stored on computer-readable storage and run on a processing apparatus comprising one or more processors such as CPUs. The storage on which the code is stored may comprise one or more memory devices employing one or more memory media. In paragraph [0059], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model. This model contains multiple neural networks that contribute to the overall functionality. The auxiliary model can be complex, with one or more neural networks possibly containing additional subnetworks.)
determining at least one set of auxiliary model parameters by simulating, using a processor, at least one first optimization step based on a set of current model parameters of the neural network model and at least one auxiliary domain, wherein the at least one auxiliary domain is generated from a primary domain comprising one or more data points for training the neural network model (In FIG. 7 and paragraphs [0060-0061], ZHANG discloses that the first neural network is set up to take a set of "first input vectors." These vectors serve as input, which can be a single vector or multiple vectors. Each input vector corresponds to a "data point," which can represent various entities like users, devices, or machines. The primary model utilizes these data points to make predictive analyses. Each input vector also contains a value for a "new feature" associated with the respective data point. This new feature can pertain to various scenarios, such as new tests or devices, and may or may not have an observed value. In paragraph [0064], ZHANG discloses that the auxiliary model 700 is trained to predict new parameters that give accurate values for new features. These new parameters are then given to the primary model, and the existing parameters of the primary model stay the same.)
determining a set of primary model parameters by performing, using the processor, a second optimization step for the neural network model based on: the set of current model parameters and the primary domain, the at least one set of auxiliary model parameters, and at least one of the primary domain and the at least one auxiliary domain; (In paragraphs [0062-0063], ZHANG discloses that a network takes a set of input vectors and transforms them into a context vector. This vector serves as an abstract representation that captures necessary information about the new features that have been integrated into the context of the primary model. The first neural network outputs the context vector, which is then inputted into the second neural network. The second neural network takes the context vector and is designed to generate a set of new parameters. In paragraph [0211], ZHANG discloses a computer-implemented method of training an auxiliary machine learning model to predict a set of new parameters of a primary machine learning model, wherein the primary model is configured to transform from an observed subset of a set of real-world features to a predicted version of the set of real-world features.)
updating, using the processor, the neural network model with the set of primary model parameters, (In paragraph [0049], ZHANG discloses that the neural network is called a Bayesian neural network. For any weight or connection, the distribution can be shown using a set of samples or a set of numbers that define the distribution, like a pair of numbers that indicate its center point and width (such as the average μ and standard deviation σ). The learning of the weights may comprise fine-tuning one or more of the parameters of each distribution.)
wherein the updated neural network model is used to perform one or more of a recognition task, a classification task, an autonomous movement task, or a natural language processing task (In paragraph [0052], ZHANG discloses identifying or recognizing patterns or objects, such as image recognition. Each element of the feature vector X may represent a respective pixel value.)
wherein the data points of the primary domain comprise one or more of text, voice, image, or sensory data (In paragraph [0073], ZHANG discloses that the primary model 901 may be used to control an apparatus or vehicle. For instance, the features may relate to sensor data of a vehicle such as an autonomous (i.e. driverless) car. The sensor data may provide values of, e.g. speed, direction, acceleration, braking force, etc. The primary model 901 may be used to predict potential collisions and thus take action to prevent such collisions.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of Olabiyi et al. (Pub. No.: US 2020/0265296 A1), hereinafter referred to as Olabiyi.
Regarding claim 15, ZHANG discloses the elements of claim 1. ZHANG does not appear to explicitly disclose:
The computer-implemented method of claim 1, wherein the steps of determining at least one set of auxiliary model parameters, determining a set of primary model parameters, and updating the neural network model are repeated until at least one of a gradient descent step size for the second optimization is below a threshold and a maximum number of gradient descent steps is reached
However, Olabiyi discloses the limitation (In paragraph [0055], Olabiyi discloses that if the mini-batch accuracy of the current model is less than the accuracy threshold and the number of boosting iterations is less than the burn-in iteration threshold, the system may proceed to train the model using a first loss function such as the maximum likelihood objective of equation (3) (step 320, branch Y). If either the accuracy threshold or the burn-in iteration threshold is exceeded, the system may proceed to train the model using a second, weighted loss function such as the Deep Boost objective of equation (4) (step 320, branch N).)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ZHANG to include Olabiyi's model training process, which may be configured with a tunable accuracy threshold and/or iteration threshold. The motivation for doing so would have been to improve performance of the machine, e.g. by predicting which settings of the machine to alter (See [0072] of ZHANG.)
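For orientation only, the control flow Olabiyi's paragraph [0055] describes (iterate with a maximum-likelihood loss until an accuracy or burn-in threshold is exceeded, then switch to a weighted loss) can be sketched as follows. The loss functions are stand-ins, not Olabiyi's equations (3) and (4), and the threshold values are hypothetical.

```python
# Minimal sketch (assumptions: PyTorch; stand-in losses, hypothetical thresholds).
import torch.nn.functional as F

def max_likelihood_loss(logits, targets):
    # Stand-in for the first (maximum-likelihood) objective, equation (3).
    return F.cross_entropy(logits, targets)

def weighted_boost_loss(logits, targets, weights):
    # Stand-in for the second, weighted ("Deep Boost"-style) objective, equation (4).
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_example).mean()

def select_loss(minibatch_accuracy, iteration, acc_threshold=0.9, burn_in=1000):
    # Branch Y: both thresholds unmet, keep training with the first loss.
    if minibatch_accuracy < acc_threshold and iteration < burn_in:
        return max_likelihood_loss
    # Branch N: either threshold exceeded, switch to the weighted loss.
    return weighted_boost_loss
```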
Regarding claim 16, ZHANG discloses the elements of claim 1. ZHANG does not appear to explicitly disclose:
The computer-implemented method of claim 1, wherein at least one of the at least one first optimization step comprises at least one gradient descent step and the second optimization step comprises a gradient descent step
However, Olabiyi discloses the limitation (In paragraph [0062], Olabiyi discloses that computing the average gradient of the mini-batch loss may allow the system to minimize the weighted negative log-likelihood loss of the input-output of the mini-batch examples through stochastic gradient descent methods.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ZHANG to include Olabiyi's model training process, which may be configured with a tunable accuracy threshold and/or iteration threshold. The motivation for doing so would have been to improve performance of the machine, e.g. by predicting which settings of the machine to alter (See [0072] of ZHANG.)
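For orientation only, a single stochastic-gradient step on a weighted negative log-likelihood, averaged over a mini-batch as Olabiyi's paragraph [0062] describes, looks like the sketch below. The model, data, weights, and learning rate are hypothetical.

```python
# Minimal sketch (assumptions: PyTorch; hypothetical model, data, and weights).
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
w = torch.ones(32)  # per-example weights (all 1.0 here for simplicity)

# Weighted negative log-likelihood, averaged over the mini-batch, so backward()
# propagates the average gradient of the mini-batch loss.
loss = (w * F.cross_entropy(model(x), y, reduction="none")).mean()
opt.zero_grad()
loss.backward()
opt.step()
```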
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of CAO et al. (Pub. No.: 20200265296 A1), hereinafter referred to as CAO.
Regarding claim 19, ZHANG discloses the elements of claim 18. ZHANG does not appear to explicitly disclose:
The computer-implemented method of claim 18, wherein the neural network model is trained by empirical risk minimization (ERM)
However, CAO discloses the limitation (In paragraph [0079], CAO discloses that the neural network model is trained by empirical risk minimization (ERM).)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ZHANG to include CAO's process model of a patterning process configured to predict a pattern on a substrate. The motivation for doing so would have been to improve the prediction of the coupled models (See [0100] of CAO.)
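For orientation only, empirical risk minimization, as claim 19 recites it, is conventionally the objective below (generic form; the loss ℓ and predictor f_θ are unspecified placeholders, not taken from CAO):

```latex
\hat{\theta} \;=\; \arg\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} \ell\!\left(f_{\theta}(x_i),\, y_i\right)
```

That is, the parameters are chosen to minimize the average loss over the N training examples, as a proxy for the true (population) risk.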
Response to Arguments
The applicant's arguments filed 09/22/2025 have been fully considered but are, in part, not persuasive.
Pertaining to Rejection under 101
The rejections of claims 1-24 under 35 U.S.C. § 101 are withdrawn.
Pertaining to Rejection under 103
Applicant’s arguments with regard to the examiner’s rejections under 35 U.S.C. § 103 are moot in view of the new grounds of rejection.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVEL HONORE whose telephone number is (703) 756-1179. The examiner can normally be reached Monday-Friday, 8 a.m.-5:30 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
EVEL HONORE
Examiner
Art Unit 2142
/HAIMEI JIANG/Primary Examiner, Art Unit 2142