Last updated: May 29, 2026
Application No. 17/649,472
PARAMETER AND STATE INITIALIZATION FOR MODEL TRAINING

Non-Final OA §103
Filed
Jan 31, 2022
Examiner
BASOM, BLAINE T
Art Unit
2141
Tech Center
2100 — Computer Architecture & Software
Assignee
X Development LLC
OA Round
3 (Non-Final)
Interview Optional

Interview lift: +22.7%. An interview has already been conducted in this application's prosecution history. This examiner has a 43% grant rate with a +22.7% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 326 resolved cases, 2023–2026
Examiner Intelligence

Examiner: BASOM, BLAINE T
Career Allowance Rate: 43% of resolved cases (140 granted / 326 resolved; -12.1% vs TC avg)
Interview Lift: +22.7% (resolved cases with interview vs. without)
Avg Prosecution: 4y 6m (typical timeline); 23 currently pending
Total Applications: 364 across all art units (career history)
Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 85.8% (+45.8% vs TC avg)
§102: 1.0% (-39.0% vs TC avg)
§112: 2.6% (-37.4% vs TC avg)
Based on career data from 326 resolved cases
Office Action

§103
DETAILED ACTION
This Office action is responsive to the Request for Continued Examination (RCE) filed under 37 CFR §1.53(d) for the instant application on February 9, 2026.  The Applicants have properly set forth the RCE, which has been entered into the application, and an examination on the merits follows herewith.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on November 11, 2025 has been considered by the Examiner.

Examiner’s Note
	The Examiner respectfully notes that the Applicant’s amendments fail to comply with 37 CFR 1.121.  Claim 21 in particular indicates a status of “Currently Amended” and comprises demarcations (i.e. underlining of “experimental”) indicating amendments to the claim.  However, claim 21 has not actually been amended relative to its prior version; claim 21 is the same as previously filed on July 9, 2025.  For purposes of compact prosecution, the claims have nevertheless been examined.  The Applicant however is respectfully reminded to comply with 37 CFR 1.121 in future submissions in order to avoid a notice of non-compliance.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3, 4, 6-9, 11, 12, 14-16, 18, 19 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over the article entitled “K for the price of 1: Parameter-efficient multi-task and transfer learning” by Mudrarkarta et al. (“Mudrarkarta”), over the article entitled “Drug susceptibility prediction against a panel of drugs using kernelized Bayesian multitask learning” by Gönen et al. (“Gönen”), and also over U.S. Patent Application Publication No. 2020/0027564 to Lefebvre et al. (“Lefebvre”).
Regarding claims 1, 9 and 16, Mudrarkarta generally describes a “method that enables parameter-efficient transfer and multi-task learning with deep neural networks.”  (Abstract).  Particularly, similar to the claimed invention, Mudrarkarta teaches:
	defining a set of tasks to be performed via execution of a machine-learning model (As noted above, Mudrarkarta generally provides a method that enables parameter-efficient multi-task learning with deep neural networks.  Such multi-task learning would understandably necessitate defining multiple tasks to be performed via execution of a machine-learning model.  For example, Mudrarkarta teaches that multi-task learning can be applied to simultaneously train two models using different datasets, understandably to perform different image-recognition tasks:

Multi-task learning We explore a multi-task learning paradigm wherein multiple models that share most of the parameters are trained simultaneously (see Figure 1, right side). Each model has a task-specific model patch. Training is done in a distributed manner; each task is assigned a subset of available workers that send independent gradient updates to both shared and task-specific parameters using standard optimization algorithms. Our results show that simultaneously training two such MobilenetV2 (Sandler et al., 2018) models on ImageNet (Deng et al., 2009) and Places365 (Zhou et al., 2017) reach accuracies comparable to, and sometimes higher than individually trained models.
(Page 2; emphasis added).

We evaluate the performance of our method in both transfer and multi-task learning using the image recognition networks MobilenetV2 (Sandler et al., 2018) and InceptionV3 (Szegedy et al., 2016) and a variety of datasets: ImageNet (Deng et al., 2009), CIFAR-10/100 (Krizhevsky, 2009), Cars (Krause et al., 2013), Aircraft (Maji et al., 2013), Flowers-102 (Nilsback & Zisserman, 2008) and Places-365 (Zhou et al., 2017). An overview of these datasets can be found in Table 1. We also show preliminary results on transfer learning across completely different types of tasks using MobilenetV2 and Single-Shot Multibox Detector (SSD) (Liu et al., 2016) networks.
(Pages 5-6; emphasis added).

[Image omitted: media_image1.png]

(Page 6; emphasis added).

In this section we show that, when using model-specific patches during multi-task training, it leads to performance comparable to that of independently trained models, while essentially using a single model.

We simultaneously train MobilenetV2 (Sandler et al., 2018) on two large datasets: ImageNet and Places365. Although the network architecture is the same for both datasets, each model has its own private patch that, along with the rest of the model weights constitutes the model for that dataset. We choose a combination of the scale-and-bias patch, and the last layer as the private model patch in this experiment. The rest of the weights are shared and receive gradient updates from all tasks.
(Page 7; emphasis added).

The plurality of simultaneously-trained models can be considered a machine-learning model, or alternatively the model architecture, e.g. “MobilenetV2” above, trained for each task can be considered a machine-learning model.  Accordingly, Mudrarkarta teaches defining a set of tasks, e.g. image-recognition tasks, to be performed via execution of a machine-learning model.);
identifying, for each task of the set of tasks, a set of learnable task-specific parameters to configure a model architecture used for the task (Mudrarkarta discloses that the model for each task comprises the task’s “model patch,” i.e. a small set of parameters, along with a shared set of parameters:

Our contribution is a novel learning paradigm in which each task carries its own model patch – a small set of parameters – that, along with a shared set of parameters constitutes the model for that task (for a visual description of the idea, see Figure 1, left side). We put this idea to use in two scenarios: a) in transfer learning, by fine-tuning only the model patch for new tasks, and b) in multi-task learning, where each task performs gradient updates to both its own model patch, and the shared parameters. In our experiments (Section 5), the largest patch that we used is smaller than 10% of the size of the entire model. We now describe our contribution in detail.

(Pages 1-2; emphasis added).

[Image omitted: media_image2.png]

(From Figure 1 on page 2).

The central concept in our method is that of a model patch. It is essentially a small set of per-channel transformations that are dispersed throughout the network resulting in only a tiny increase in the number of model parameters.

Suppose a deep network $M$ is a sequence of layers represented by their parameters (weights, biases), $W_1, \ldots, W_n$.  We ignore non-trainable layers (e.g., some kinds of activations) in this formulation. A model patch P is a set of parameters $W'_{i_1}, \ldots, W'_{i_k}$, $1 \le i_1, \ldots, i_k \le n$ that, when applied to $M$, adds layers at positions $i_1, \ldots, i_n$.  Thus, a patched model $M' = W_1, \ldots, W_{i_1}, W'_{i_1}, \ldots, W_{i_n}, W'_{i_n}, \ldots, W_n$
(Page 2; emphasis added).

Multitask learning We aim to simultaneously, but independently, train multiple neural networks that share most weights. Unlike in transfer learning, where a large fraction of the weights are kept frozen, here we learn all the weights. However, each task carries its own model patch, and trains a patched model. By training all the parameters, this setting offers more adaptability to tasks while not compromising on the total number of parameters. 

To implement multi-task learning, we use the distributed TensorFlow paradigm: a central parameter server receives gradient updates from each of the workers and updates the weights. Each worker reads the input, computes the loss and sends gradients to the parameter server. We allow subsets of workers to train different tasks; workers thus may have different computational graphs, and task-specific input pipelines and loss functions. A visual depiction of this setting is shown in Figure 1.

(Page 3; emphasis added).

In this section we show that, when using model-specific patches during multi-task training, it leads to performance comparable to that of independently trained models, while essentially using a single model.

We simultaneously train MobilenetV2 (Sandler et al., 2018) on two large datasets: ImageNet and Places365. Although the network architecture is the same for both datasets, each model has its own private patch that, along with the rest of the model weights constitutes the model for that dataset. We choose a combination of the scale-and-bias patch, and the last layer as the private model patch in this experiment. The rest of the weights are shared and receive gradient updates from all tasks.

(Page 7; emphasis added).

The combination of the shared parameters and the patch parameters for each task is considered a set of learnable task-specific parameters to configure a model architecture used for the task.);
	stipulating that a first learnable task-specific parameter associated with a first task of the set of tasks is a shared or global parameter that is to have a same value as at least one other learnable task-specific parameter, wherein each of the at least one other learnable task-specific parameter is associated with a corresponding other task of the set of tasks (As noted above, Mudrarkarta discloses that the model for each task comprises the task’s “model patch,” i.e. a small set of parameters, along with a shared set of parameters.  Each of the parameters for each task, including the shared parameters, is learnable through multi-task training:

Multi-task learning We explore a multi-task learning paradigm wherein multiple models that share most of the parameters are trained simultaneously (see Figure 1, right side). Each model has a task-specific model patch. Training is done in a distributed manner; each task is assigned a subset of available workers that send independent gradient updates to both shared and task-specific parameters using standard optimization algorithms. Our results show that simultaneously training two such MobilenetV2 (Sandler et al., 2018) models on ImageNet (Deng et al., 2009) and Places365 (Zhou et al., 2017) reach accuracies comparable to, and sometimes higher than individually trained models.

(Page 2; emphasis added).

[Image omitted: media_image3.png]

(From Figure 1 on page 2).

Multitask learning We aim to simultaneously, but independently, train multiple neural networks that share most weights. Unlike in transfer learning, where a large fraction of the weights are kept frozen, here we learn all the weights. However, each task carries its own model patch, and trains a patched model. By training all the parameters, this setting offers more adaptability to tasks while not compromising on the total number of parameters. 

To implement multi-task learning, we use the distributed TensorFlow paradigm: a central parameter server receives gradient updates from each of the workers and updates the weights. Each worker reads the input, computes the loss and sends gradients to the parameter server. We allow subsets of workers to train different tasks; workers thus may have different computational graphs, and task-specific input pipelines and loss functions. A visual depiction of this setting is shown in Figure 1.

(Page 3; emphasis added).

Such a shared parameter for a first task is considered a “first learnable task-specific parameter” like claimed, which is stipulated as a shared or global parameter that is to have a same value as at least one other learnable task-specific parameter, i.e. the same value as the same shared parameter for another task, wherein each of the at least one other learnable task-specific parameter is associated with a corresponding other task of the set of tasks.);
	configuring one or more parameter data structures with parameter values for the sets of task-specific parameters, wherein the configuration imposes a constraint that a value for the first task-specific parameter and the at least one value for the at least one other task-specific parameter are the same as each other (As noted above, Mudrarkarta discloses that the model for each task comprises the task’s “model patch,” i.e. a small set of parameters, along with a shared set of parameters, wherein each of the parameters for each task is learnable through multi-task training.  Mudrarkarta further teaches that such multi-task training entails assigning each task to a subset of workers, which read input, compute a loss, and send gradients to a parameter server; the parameter server receives gradient updates from each of the workers and updates the parameters (i.e. weights):

Multi-task learning We explore a multi-task learning paradigm wherein multiple models that share most of the parameters are trained simultaneously (see Figure 1, right side). Each model has a task-specific model patch. Training is done in a distributed manner; each task is assigned a subset of available workers that send independent gradient updates to both shared and task-specific parameters using standard optimization algorithms. Our results show that simultaneously training two such MobilenetV2 (Sandler et al., 2018) models on ImageNet (Deng et al., 2009) and Places365 (Zhou et al., 2017) reach accuracies comparable to, and sometimes higher than individually trained models.

(Page 2; emphasis added).

[Image omitted: media_image3.png]

(From Figure 1 on page 2).

Multitask learning We aim to simultaneously, but independently, train multiple neural networks that share most weights. Unlike in transfer learning, where a large fraction of the weights are kept frozen, here we learn all the weights. However, each task carries its own model patch, and trains a patched model. By training all the parameters, this setting offers more adaptability to tasks while not compromising on the total number of parameters. 

To implement multi-task learning, we use the distributed TensorFlow paradigm: a central parameter server receives gradient updates from each of the workers and updates the weights. Each worker reads the input, computes the loss and sends gradients to the parameter server. We allow subsets of workers to train different tasks; workers thus may have different computational graphs, and task-specific input pipelines and loss functions. A visual depiction of this setting is shown in Figure 1.

(Page 3; emphasis added).

Such training would necessitate maintaining, i.e. in one or more data structures at the server and/or workers, parameter values for the parameters for the different tasks, wherein maintaining the parameter values would necessitate imposing a constraint that values of shared parameters for the different tasks are the same.  Accordingly, Mudrarkarta is further considered to teach configuring one or more parameter data structures with parameter values for the sets of task-specific parameters for the sets of tasks, wherein the configuration imposes a constraint that a value for the first task-specific parameter and the at least one value for the at least one other task-specific parameter, i.e. a value of the same shared parameter for each task, are the same as each other.);
training the machine-learning model using the configured one or more parameter data structures (As noted above, Mudrarkarta teaches that the multi-task training entails assigning each task to a subset of workers, which read input, compute a loss, and send gradients to a parameter server; the parameter server receives gradient updates from each of the workers and updates the parameters.  As further noted above, such training would necessitate maintaining the parameters via one or more configured parameter data structures like claimed.  Accordingly, it is apparent that the machine-learning model, e.g. the machine-learning architecture for performing the tasks, is trained using the configured one or more parameter data structures.); and
executing the trained machine-learning model by processing a non-training data set (As noted above, Mudrarkarta teaches that the multi-task training entails assigning each task to a subset of workers, which read input, compute a loss, and send gradients to a parameter server; the parameter server receives gradient updates from each of the workers and updates the parameters.  It is apparent that, once trained, the machine-learning model can be executed to process a non-training data set.).
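For illustration only, the shared-parameter arrangement discussed in the limitations above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code from Mudrarkarta or from the application: the ParameterStore class, its method names, the task names, and the scalar parameters "w" and "bias" are all hypothetical. The store keeps a single copy of each shared parameter, so every task necessarily reads the same value for it, while each task keeps a private patch.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterStore:
    """Hypothetical store: one copy of each shared parameter, one patch per task."""
    shared: dict = field(default_factory=dict)
    patches: dict = field(default_factory=dict)

    def params_for(self, task):
        # A task's model parameters are the shared set plus that task's own patch.
        # Assumes shared and patch parameter names do not collide.
        return {**self.shared, **self.patches[task]}

    def apply_gradients(self, task, grads, lr=0.1):
        # Plain gradient-descent update: shared names update the single shared
        # copy; all other names update only the sending task's patch.
        for name, grad in grads.items():
            if name in self.shared:
                self.shared[name] -= lr * grad
            else:
                self.patches[task][name] -= lr * grad


store = ParameterStore(
    shared={"w": 1.0},
    patches={"task_a": {"bias": 0.0}, "task_b": {"bias": 0.0}},
)
# A worker for task_a sends gradients for the shared weight and for its own patch.
store.apply_gradients("task_a", {"w": 0.5, "bias": 1.0})

view_a = store.params_for("task_a")
view_b = store.params_for("task_b")
print(view_a["w"] == view_b["w"])  # shared value is identical for both tasks
print(view_b["bias"])              # task_b's patch is untouched by task_a's update
```

In this sketch the equality of the shared value across tasks holds by construction, because both task views read the same stored entry; other designs that keep one slot per task and copy values between them would need to enforce the constraint explicitly.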
Accordingly, Mudrarkarta teaches a method similar to that of claim 1.  Mudrarkarta discloses that such teachings can be implemented via a system comprising one or more data processors executing computer-readable instructions (see e.g. page 6, which recites “[w]e use TensorFlow (Abadi et al., 2015), and NVIDIA P100 and V100 GPUs for our experiments.”).  Such a system comprising one or more data processors and a non-transitory computer readable medium necessary to store the computer-readable instructions to implement the above-described teachings of Mudrarkarta is considered a system similar to that of claim 9.  The non-transitory computer readable medium comprising computer-readable instructions to implement the above-described teachings of Mudrarkarta is considered a computer-program product similar to that of claim 16.  Mudrarkarta, however, does not explicitly disclose that the set of tasks to be performed via execution of the machine-learning model is particularly a set of experimental conditions for a biological cell to be simulated via execution of the machine-learning model, wherein each experimental condition for the biological cell characterizes a state of the biological cell or a state of an environment of the biological cell, and wherein the task-specific parameters are condition-specific parameters, as is required by claims 1, 9 and 16.   
Mudrarkarta also does not disclose that the non-training data set is associated with a simulated experimental condition for the biological cell to simulate the biological cell under the simulated experimental condition over a sequence of time steps, wherein each time step after a first time step comprises: (i) receiving cell state data for a preceding time step that characterizes a state of the biological cell at the preceding time step; (ii) processing the cell state data for the preceding time step using the trained machine-learning model and in accordance with the trained parameters of the trained machine-learning model to generate cell state data for the current time step; and (iii) providing the cell state data for the current time step for processing at a next time step, as is further required by claims 1, 9 and 16.
Gönen nevertheless teaches applying multitask learning to generate a machine-learning model to simulate a biological cell (i.e. a cancer cell) under multiple experimental conditions (e.g. multiple anticancer drugs), wherein each experimental condition for the biological cell characterizes a state of the biological cell or a state of an environment of the biological cell (see e.g. section 2.2 “Genomics of drug sensitivity in cancer” on page i557 and section 3 “METHODS” on page i558).
It would have been obvious to one of ordinary skill in the art, having the teachings of Mudrarkarta and Gönen before the effective filing date of the claimed invention, to modify the method, system and computer-program product taught by Mudrarkarta such that the set of tasks to be performed via execution of the machine-learning model is particularly a set of experimental conditions for a biological cell to be simulated via execution of the machine-learning model, wherein each experimental condition for the biological cell characterizes a state of the biological cell or a state of an environment of the biological cell like taught by Gönen.  The task-specific parameters for each of the particular conditions would thus be considered condition-specific parameters like claimed, and the non-training data set would likewise be associated with a simulated experimental condition for the biological cell to simulate the biological cell under the simulated experimental condition.  It would have been advantageous to one of ordinary skill to utilize such a combination because it can obtain significantly better predictive performance on most drugs, as is taught by Gönen (see e.g. the Abstract).
Lefebvre generally teaches executing a model to simulate the evolution of a tumor by processing a dataset associated with a simulated experimental condition (e.g. a treatment type) for a biological cell (i.e. a tumor cell) to simulate the biological cell under the simulated experimental condition over a sequence of time steps, comprising, at each time step after a first time step: (i) receiving cell state data for a preceding time step that characterizes a state of the biological cell at the preceding time step; (ii) processing the cell state data for the preceding time step using the model and in accordance with parameters of the model to generate cell state data for the current time step; and (iii) providing the cell state data for the current time step for processing at a next time step (see e.g. paragraphs 0006, 0014-0016, 0044-0045, 0049, 0051, 0065, 0072-0076 and 0190-0191).
It would have been obvious to one of ordinary skill in the art, having the teachings of Mudrarkarta, Gönen and Lefebvre before the effective filing date of the claimed invention, to modify the method, system and computer-program product taught by Mudrarkarta and Gönen such that the trained machine-learning model can be utilized to simulate the evolution of the biological cell under the different experimental conditions and over a sequence of time steps like with the model taught by Lefebvre, which would entail: (i) receiving cell state data for a preceding time step that characterizes a state of the biological cell at the preceding time step; (ii) processing the cell state data for the preceding time step using the model (i.e. using the trained machine learning model) and in accordance with parameters of the model to generate cell state data for the current time step; and (iii) providing the cell state data for the current time step for processing at a next time step.  It would have been advantageous to one of ordinary skill to utilize such a combination because it would enable the prediction of the growth of a tumor, and thereby allow a medical expert to adjust treatment priority of patients, as is taught by Lefebvre (see e.g. paragraph 0083). Accordingly, Mudrarkarta, Gönen and Lefebvre are considered to teach, to one of ordinary skill in the art, a method like that of claim 1, a system like that of claim 9 and a computer-program product like that of claim 16.
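For illustration only, the three per-time-step operations recited above can be sketched as a simple rollout loop in Python. This is a minimal sketch under stated assumptions: the simulate function, the toy_step stand-in, and the "size" and "growth_factor" names are hypothetical and are not taken from Lefebvre, Gönen, Mudrarkarta, or the application. A trained model would take the place of toy_step.

```python
def simulate(step_fn, params, initial_state, num_steps):
    """Roll a state forward; returns the list of states, one per time step."""
    states = [initial_state]
    for _ in range(1, num_steps):
        previous = states[-1]                # (i) receive state for the preceding step
        current = step_fn(previous, params)  # (ii) generate state for the current step
        states.append(current)               # (iii) make it available to the next step
    return states


def toy_step(state, params):
    # Hypothetical stand-in for a trained model: size grows by a fixed factor.
    return {"size": state["size"] * params["growth_factor"]}


trajectory = simulate(toy_step, {"growth_factor": 2.0}, {"size": 1.0}, num_steps=4)
print([s["size"] for s in trajectory])  # [1.0, 2.0, 4.0, 8.0]
```

The loop makes the data dependency explicit: the output of each step is the only input state carried into the next step.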
As per claims 3, 11 and 18, Mudrarkarta suggests that training the machine-learning model is performed using a loss function that relates loss to values of a set of learnable parameters:
Multitask learning We aim to simultaneously, but independently, train multiple neural networks that share most weights. Unlike in transfer learning, where a large fraction of the weights are kept frozen, here we learn all the weights. However, each task carries its own model patch, and trains a patched model. By training all the parameters, this setting offers more adaptability to tasks while not compromising on the total number of parameters. 

To implement multi-task learning, we use the distributed TensorFlow paradigm: a central parameter server receives gradient updates from each of the workers and updates the weights. Each worker reads the input, computes the loss and sends gradients to the parameter server. We allow subsets of workers to train different tasks; workers thus may have different computational graphs, and task-specific input pipelines and loss functions. A visual depiction of this setting is shown in Figure 1.

(Page 3; emphasis added).
As noted above, Mudrarkarta further teaches that the task models share parameters; Mudrarkarta discloses that the model for each task comprises the task’s “model patch,” i.e. a small set of parameters, along with a shared set of parameters (see e.g. “Multi-task learning” on page 2).  Accordingly, it follows that the quantity of unique learnable parameters (i.e. each task’s “model patch” and the shared set of parameters) is less than a total quantity of parameters (i.e. the total number of parameters for all tasks) represented in the one or more parameter data structures.  The above-described combination of Mudrarkarta, Gönen and Lefebvre is thus further considered to teach a method like that of claim 3, a system like that of claim 11, and a computer-program product like that of claim 18.
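The counting argument above can be made concrete with a short worked inequality. The symbols are assumptions introduced here for illustration only: $T$ tasks, $S$ shared parameters, and, for simplicity, a patch of the same size $P$ for every task.

```latex
\underbrace{S + T P}_{\text{unique learnable parameters}}
\;<\;
\underbrace{T\,(S + P)}_{\text{parameters represented across all tasks}}
\quad\Longleftrightarrow\quad
(T - 1)\,S > 0
```

The inequality holds whenever there are at least two tasks ($T \ge 2$) and at least one shared parameter ($S \ge 1$). For example, with $T = 2$, $S = 90$ and $P = 10$, there are 110 unique parameters against 200 represented across the two tasks.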
As per claims 4, 12 and 19, Mudrarkarta suggests that training the machine-learning model includes: (i) calculating a loss function, wherein the loss function associates a particular loss with the values of the learnable parameters (e.g. weights) of the machine-learning model; (ii) identifying a new set of values for the learnable parameters using the loss function; and (iii) updating the one or more learnable parameters (i.e. in parameter data structures) using the new set of values for the set of learnable parameters:
Multitask learning We aim to simultaneously, but independently, train multiple neural networks that share most weights. Unlike in transfer learning, where a large fraction of the weights are kept frozen, here we learn all the weights. However, each task carries its own model patch, and trains a patched model. By training all the parameters, this setting offers more adaptability to tasks while not compromising on the total number of parameters. 

To implement multi-task learning, we use the distributed TensorFlow paradigm: a central parameter server receives gradient updates from each of the workers and updates the weights. Each worker reads the input, computes the loss and sends gradients to the parameter server. We allow subsets of workers to train different tasks; workers thus may have different computational graphs, and task-specific input pipelines and loss functions. A visual depiction of this setting is shown in Figure 1.

(Page 3; emphasis added).
As noted above, Mudrarkarta further teaches that the learnable parameters (e.g. weights) of the machine-learning model include parameters that are shared by multiple tasks, i.e. the learnable parameters include a first task-specific parameter that is to have a same value as at least one other learnable task-specific parameter (i.e. for another task).  As further noted above, it would have been obvious to modify the method, system and computer-program product taught by Mudrarkarta such that the set of tasks to be performed via execution of the machine-learning model is particularly a set of experimental conditions like taught by Gönen, the task-specific parameters for each of the particular conditions thus being considered condition-specific parameters like claimed.  Accordingly, it follows that calculating the loss function, identifying a new set of values of the learnable parameters, and updating the one or more parameter data structures using the new set of values like taught by Mudrarkarta and Gönen would particularly include associating the loss with values of parameters of the machine-learning model, including with a value of the particular learnable parameter (i.e. shared parameter) corresponding to the first learnable condition-specific parameter and the at least one other learnable condition-specific parameter, wherein the new set of values would include a new value for the particular learnable parameter, and wherein the updating would include setting each of the at least one value for the at least one other condition-specific parameter and the value for the first condition-specific parameter to the same, new value.  The above-described combination of Mudrarkarta, Gönen and Lefebvre is thus further considered to teach a method like that of claim 4, a system like that of claim 12, and a computer-program product like that of claim 19.
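The update described in the preceding paragraph, in which a single new value is written to every condition-specific entry tied to a shared parameter, can be sketched as follows. This is a minimal illustration under stated assumptions: the toy quadratic loss, the targets, the learning rate, and the names `k_shared` and `k_own` are all hypothetical and are not drawn from Mudrarkarta, Gönen or the claims.

```python
# Hypothetical parameter data structure: one set of values per condition.
# "k_shared" is constrained to hold the same value in every condition.
params = {
    "cond_1": {"k_shared": 0.5, "k_own": 1.0},
    "cond_2": {"k_shared": 0.5, "k_own": 2.0},
}
TARGETS = {"cond_1": 2.0, "cond_2": 3.0}   # toy training targets (assumed)
LEARNING_RATE = 0.1                        # assumed step size


def loss(p):
    """Toy loss that associates one scalar with the current parameter values."""
    return sum((p[c]["k_shared"] + p[c]["k_own"] - TARGETS[c]) ** 2 for c in p)


def training_step(p):
    """One gradient step; the shared parameter receives a single new value."""
    residual = {c: p[c]["k_shared"] + p[c]["k_own"] - TARGETS[c] for c in p}
    # The shared parameter appears in every condition, so its gradient is summed.
    shared_grad = sum(2.0 * r for r in residual.values())
    old_shared = next(iter(p.values()))["k_shared"]
    new_shared = old_shared - LEARNING_RATE * shared_grad
    return {
        c: {
            "k_shared": new_shared,  # same new value written to every condition
            "k_own": p[c]["k_own"] - LEARNING_RATE * 2.0 * residual[c],
        }
        for c in p
    }


before = params
after = training_step(params)
print(loss(before), loss(after))
```

After the step, both conditions hold the identical new value for `k_shared`, while each `k_own` entry moves independently.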
	As per claims 6, 14 and 21, it would have been obvious, as is described above, to modify the method and system taught by Mudrarkarta such that the set of tasks to be performed via execution of the machine-learning model is particularly a set of experimental conditions for a biological cell to be simulated via execution of the machine-learning model taught by Gönen.  Gönen suggests that the machine-learning model can be a model to simulate a biological cell, and wherein at least one of the experimental conditions corresponds to a simulation where a particular gene is missing or inactive (see e.g. section 2.2 “Genomics of drug sensitivity in cancer”; Gönen teaches that the model employs gene expression as a data source, and such gene expression is understandably indicative of whether a particular gene is missing or inactive).  Accordingly, the above-described combination of Mudrarkarta, Gönen and Lefebvre is further considered to teach a method like that of claim 6, a system like that of claim 14, and a computer-program product like that of claim 21.
As per claims 7 and 15, it would have been obvious, as is described above, to modify the method and system taught by Mudrarkarta such that the set of tasks to be performed via execution of the machine-learning model is particularly a set of experimental conditions for a biological cell to be simulated via execution of the machine-learning model taught by Gönen.  Gönen suggests that the machine-learning model can be a model to simulate a biological cell, and wherein at least one of the set of experimental conditions can correspond to a simulation where a particular reagent (e.g. drug) is added to a medium external to the biological cell (see e.g. section 2.2 “Genomics of drug sensitivity in cancer” on page i557 and section 3 “METHODS” on page i558).  Accordingly, the above-described combination of Mudrarkarta, Gönen and Lefebvre is further considered to teach a method like that of claim 7 and a system like that of claim 15.
As per claim 8, it would have been obvious, as is described above, to modify the method and system taught by Mudrarkarta such that the set of tasks to be performed via execution of the machine-learning model is particularly a set of experimental conditions for a biological cell to be simulated via execution of the machine-learning model taught by Gönen.  Gönen teaches that the set of conditions can entail predicting a susceptibility to particular drugs (see e.g. section 3 “METHODS” on page i558).  Consequently, it would have been apparent, based on the results of execution of such a trained machine-learning model, to determine a drug (i.e. a reagent) to use and to administer and/or further analyze the drug’s effects (i.e. implement a real-world action in a laboratory environment that includes using the reagent).  The above-described combination of Mudrarkarta, Gönen and Lefebvre is thus further considered to teach a method like that of claim 8.

Claims 2, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the above-described combination of Mudrarkarta, Gönen and Lefebvre, and also over U.S. Patent Application Publication No. 2022/0374776 to Liu et al. (“Liu”).
Regarding claims 2, 10 and 17, Mudrarkarta, Gönen and Lefebvre teach a method like that of claim 1, a system like that of claim 9, and a computer-program product like that of claim 16, as are described above, and which entail configuring one or more parameter data structures with parameter values for sets of condition-specific parameters for sets of experimental conditions, wherein the configuration imposes a constraint that a value for a first condition-specific parameter and at least one value for at least one other condition-specific parameter are the same as each other.  Mudrarkarta particularly teaches that the values for the parameters are updated through multi-task training, in which each of a plurality of tasks is assigned to one or more workers, which send gradient updates to a server:

Multi-task learning We explore a multi-task learning paradigm wherein multiple models that share most of the parameters are trained simultaneously (see Figure 1, right side). Each model has a task-specific model patch. Training is done in a distributed manner; each task is assigned a subset of available workers that send independent gradient updates to both shared and task-specific parameters using standard optimization algorithms. Our results show that simultaneously training two such MobilenetV2 (Sandler et al., 2018) models on ImageNet (Deng et al., 2009) and Places365 (Zhou et al., 2017) reach accuracies comparable to, and sometimes higher than individually trained models.

(Page 2; emphasis added).

[Image: media_image3.png]

(From Figure 1 on page 2).

Multitask learning We aim to simultaneously, but independently, train multiple neural networks that share most weights. Unlike in transfer learning, where a large fraction of the weights are kept frozen, here we learn all the weights. However, each task carries its own model patch, and trains a patched model. By training all the parameters, this setting offers more adaptability to tasks while not compromising on the total number of parameters. 

To implement multi-task learning, we use the distributed TensorFlow paradigm: a central parameter server receives gradient updates from each of the workers and updates the weights. Each worker reads the input, computes the loss and sends gradients to the parameter server. We allow subsets of workers to train different tasks; workers thus may have different computational graphs, and task-specific input pipelines and loss functions. A visual depiction of this setting is shown in Figure 1.

(Page 3; emphasis added).
Mudrarkarta, Gönen and Lefebvre, however, do not explicitly teach: (i) generating an initial version of a parameter data structure of the one or more parameter data structures to include a value for each of the sets of learnable condition-specific parameters of the set of experimental conditions; (ii) identifying an initial value to initially define the shared or global parameter; and (iii) generating a modified version of the parameter data structure to replace an initial version of the at least one other learnable condition-specific parameter with the initial value, as is required by claims 2, 10 and 17.
	Similar to Mudrarkarta, Liu teaches updating parameters of a machine-learning model through multi-task training, in which a task is assigned to one or more workers (i.e. terminals), which send updates to a server (see e.g. paragraphs 0047-0058).  Liu particularly teaches that the server generates an initial model for the task that comprises initial values for the parameters of the model (see e.g. paragraph 0063).  The training is then done in iterations, each comprising the server distributing the model to the one or more workers, whereby the workers perform local training to update the model parameters and provide the updated model parameters to the server; the server then aggregates the model parameters received from the workers to generate a new model, which may be provided to the terminals for additional training (see e.g. paragraphs 0049-0058 and 0063). 
It would have been obvious to one of ordinary skill in the art, having the teachings of Mudrarkarta, Gönen, Lefebvre and Liu before the effective filing date of the claimed invention, to modify the method, system and computer-program product taught by Mudrarkarta, Gönen and Lefebvre such that the server generates an initial model comprising initial values for the parameters for each task, and whereby the training is performed in iterations like taught by Liu, in which the server distributes the model to the one or more workers, which perform local training to update the model parameters and provide the updated model parameters to the server, which then aggregates the model parameters received from the workers to generate a new model to be provided to the terminals for additional training.  Accordingly, like claimed, the server would: (i) generate an initial version of a parameter data structure of the one or more parameter data structures to include a value for each of the sets of learnable condition-specific parameters of the set of experimental conditions (i.e. the server generates an initial model comprising initial values for the parameters for each task); (ii) identify an initial value to initially define a shared or global parameter (i.e. the server receives updated model parameters from the workers and aggregates the parameters to generate a new model; the values of the aggregated shared parameters of the new model can each be considered an initial value to define a shared or global parameter); and (iii) generate a modified version of the parameter data structure to replace an initial version of the at least one other learnable condition-specific parameter with the initial value (i.e. receiving updated model parameters from the workers and aggregating the parameters to generate a new model would entail generating a modified version of the parameter data structure associated with the model so as to replace the initial version of the model and its parameters with the parameter values of the new model).  It would have been advantageous to one of ordinary skill to utilize such a combination because it would enable the multi-task model to be efficiently trained, as is suggested by Liu (see e.g. paragraphs 0047-0058).  Accordingly, Mudrarkarta, Gönen, Lefebvre and Liu are considered to teach, to one of ordinary skill in the art, a method like that of claim 2, a system like that of claim 10, and a computer-program product like that of claim 17.
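The iterative scheme attributed to the combination above (initial model, distribution to workers, local training, aggregation into a new model) can be sketched as follows. This is a minimal illustration only. The Office action says the server "aggregates" worker parameters without naming a rule, so the simple averaging used here is an assumption, as are the toy loss, the targets, the learning rate, and every function and variable name.

```python
def make_initial_model(tasks, shared_init=0.0, patch_init=0.0):
    """Server side: initial parameter data structure with a value for every task."""
    return {"shared": shared_init, "patch": {t: patch_init for t in tasks}}


def local_update(model, task, target, lr=0.25):
    """Worker side: one gradient step on a toy loss (shared + patch - target)^2."""
    residual = model["shared"] + model["patch"][task] - target
    grad = 2.0 * residual
    return {
        "task": task,
        "shared": model["shared"] - lr * grad,
        "patch": model["patch"][task] - lr * grad,
    }


def aggregate(model, updates):
    """Server side: build a modified model that replaces the previous values.

    Averaging the shared value across workers is an assumed aggregation rule.
    """
    new_shared = sum(u["shared"] for u in updates) / len(updates)
    new_patch = dict(model["patch"])
    for u in updates:
        new_patch[u["task"]] = u["patch"]
    return {"shared": new_shared, "patch": new_patch}


TARGETS = {"task_a": 1.0, "task_b": 3.0}   # hypothetical per-task targets
initial_model = make_initial_model(TARGETS)
updates = [local_update(initial_model, t, y) for t, y in TARGETS.items()]
new_model = aggregate(initial_model, updates)
print(initial_model, new_model)
```

In this sketch the aggregated `shared` value in `new_model` replaces the value held in the initial model for every task, which corresponds to the mapping of elements (ii) and (iii) given in the paragraph above.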

Claims 5, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the above-described combination of Mudrarkarta, Gönen and Lefebvre, and also over U.S. Patent Application Publication No. 2018/0157972 to Hu et al. (“Hu”).
Regarding claims 5, 13 and 20, Mudrarkarta, Gönen and Lefebvre teach a method like that of claim 1, a system like that of claim 9, and a computer-program product like that of claim 16, as are described above, and which entail stipulating that a first learnable condition-specific parameter associated with a first experimental condition of a set of experimental conditions is a shared or global parameter that is to have a same value as at least one other learnable condition-specific parameter.  Mudrarkarta particularly teaches that a parameter for a multi-task machine learning model can be a global parameter (i.e. a shared parameter) that is to have a same value across all tasks in a set of tasks, wherein configuring one or more parameter data structures (e.g. to store task models for each of a plurality of workers for training) would understandably entail imposing a constraint that values for parameters corresponding to the global parameter are to be the same across tasks:
Multi-task learning We explore a multi-task learning paradigm wherein multiple models that share most of the parameters are trained simultaneously (see Figure 1, right side). Each model has a task-specific model patch. Training is done in a distributed manner; each task is assigned a subset of available workers that send independent gradient updates to both shared and task-specific parameters using standard optimization algorithms. Our results show that simultaneously training two such MobilenetV2 (Sandler et al., 2018) models on ImageNet (Deng et al., 2009) and Places365 (Zhou et al., 2017) reach accuracies comparable to, and sometimes higher than individually trained models.
(Page 2; emphasis added).

As further described above, it would have been obvious to modify the method, system and computer-program product taught by Mudrarkarta such that the set of tasks to be performed via execution of the machine-learning model is particularly a set of experimental conditions like taught by Gönen, the task-specific parameters for each of the particular experimental conditions thus being considered condition-specific parameters like claimed.  Accordingly, like in claims 5, 13 and 20, Mudrarkarta, Gönen and Lefebvre are considered to teach stipulating that a learnable condition-specific parameter (i.e. a task-specific parameter) is a global parameter that is to have a same value across all experimental conditions (i.e. tasks) in the set of experimental conditions, wherein configuring the one or more parameter structures would impose a constraint that values for parameters corresponding to the global parameter are to be the same across conditions.  Mudrarkarta, Gönen and Lefebvre, however, do not teach that a different learnable condition-specific parameter, i.e. the first learnable condition-specific parameter, is a shared parameter, wherein the combination of the first experimental condition and each corresponding other experimental condition associated with the at least one other learnable condition-specific parameter are an incomplete subset of the set of experimental conditions, as is further required by claims 5, 13 and 20.
	Similar to Mudrarkarta, Hu describes methods and systems for building and using a multitask neural network that may be used to perform multiple inference tasks based on input data (see e.g. paragraph 0004).  Hu particularly teaches that building such a multitask neural network can entail (i) stipulating that a first learnable task-specific parameter (e.g. a parameter within a “branch layer”) is a shared parameter that is to be used with respect to a first task and at least one other task of the multiple tasks performed by the multitask neural network, wherein the combination of the first task and each other task are an incomplete subset of the set of tasks performed by the neural network, and (ii) stipulating that a different learnable task-specific parameter (e.g. a parameter within a “common layer”) is a global parameter which is used by all tasks in the set of tasks performed by the neural network (see e.g. paragraphs 0004, 0020-0021 and 0024, and FIG. 1).
It would have been obvious to one of ordinary skill in the art, having the teachings of Mudrarkarta, Gönen, Lefebvre and Hu before the effective filing date of the claimed invention, to modify the method, system and computer-program product taught by Mudrarkarta, Gönen and Lefebvre so as to also enable a parameter, i.e. the first learnable condition-specific parameter, to be a shared parameter like taught by Hu, wherein the combination of the first task and each other task using the shared parameter (i.e. the combination of the first experimental condition and each corresponding other experimental condition associated with the at least one other learnable condition-specific parameter) are an incomplete subset of the set of tasks (i.e. experimental conditions) performed by the machine-learning model.  It would have been advantageous to one of ordinary skill to utilize such a combination because it would enable parameters to be used by particular subsets of related tasks, without affecting other lesser related tasks, as is evident from Hu (see e.g. paragraphs 0040-0041 and FIG. 2).  Accordingly, Mudrarkarta, Gönen, Lefebvre and Hu are considered to teach, to one of ordinary skill in the art, a method like that of claim 5, a system like that of claim 13, and a computer-program product like that of claim 20.
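The distinction drawn above between a global parameter (used by all tasks) and a shared parameter (used by an incomplete subset of tasks) can be sketched as follows. This is a minimal illustration: the task names, parameter names, and the `classify` helper are hypothetical and are not taken from Hu; only the "common layer" and "branch layer" labels in the comments echo the terms cited from that reference.

```python
# Hypothetical set of tasks (or experimental conditions).
ALL_TASKS = frozenset({"t1", "t2", "t3"})

# Hypothetical mapping from each parameter to the tasks that use it.
PARAM_SCOPE = {
    "common_w": frozenset({"t1", "t2", "t3"}),  # "common layer": every task
    "branch_w": frozenset({"t1", "t2"}),        # "branch layer": a proper subset
    "head_t3_w": frozenset({"t3"}),             # used by a single task only
}


def classify(scope, all_tasks):
    """Label a parameter by how many of the tasks use it."""
    if scope == all_tasks:
        return "global"          # same value across all tasks
    if len(scope) > 1:
        return "shared"          # same value across an incomplete subset
    return "task-specific"       # not tied to any other task


labels = {name: classify(scope, ALL_TASKS) for name, scope in PARAM_SCOPE.items()}
print(labels)
```

Here `branch_w` is tied across `t1` and `t2` but not `t3`, so the tasks that share it form an incomplete subset of the full task set.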

Response to Arguments
The Examiner acknowledges the Applicant’s amendments to claims 1, 9 and 16.  In response to these amendments, the objections presented in the previous Office Action to claims 1-21 are respectfully withdrawn.
The Applicant’s arguments concerning the 35 U.S.C. § 103 rejections presented in the previous Office Action have been considered, but are moot in view of the new grounds of rejection presented above.

Conclusion
The prior art made of record on form PTO-892 and not relied upon is considered pertinent to applicant’s disclosure.  The applicant is required under 37 C.F.R. §1.111(c) to consider these references fully when responding to this action.  In particular, the U.S. Patent Application Publication to Ejtehadi et al. describes a virtual cell simulator for generating virtual cell models that can simulate a biological cell over a sequence of time steps.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BLAINE T BASOM whose telephone number is (571)272-4044. The examiner can normally be reached Monday-Friday, 9:00 am - 5:30 pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Ell can be reached at (571)270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BTB/
5/2/2026

/MATTHEW ELL/Supervisory Patent Examiner, Art Unit 2141
Prosecution Timeline

Jul 09, 2025: Response Filed
Nov 13, 2025: Final Rejection mailed (§103)
Jan 14, 2026: Interview Requested
Jan 27, 2026: Examiner Interview Summary
Jan 27, 2026: Applicant Interview (Telephonic)
Feb 09, 2026: Request for Continued Examination
Feb 22, 2026: Response after Non-Final Action
May 08, 2026: Non-Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/644,425 (Patent 12632794): METHOD AND SYSTEM FOR CROSS-CHAIN CONSENSUS ORIENTED TO FEDERATED LEARNING. 4y 5m to grant; granted May 19, 2026.
17/806,556 (Patent 12608647): MULTIMODAL DATA INFERENCE. 3y 10m to grant; granted Apr 21, 2026.
17/334,697 (Patent 12566981): METHOD AND SYSTEM FOR EVENT PREDICTION BASED ON TIME-DOMAIN BOOTSTRAPPED MODELS. 4y 9m to grant; granted Mar 03, 2026.
16/817,836 (Patent 12487727): Sensory Adjustment Mechanism. 5y 8m to grant; granted Dec 02, 2025.
17/649,045 (Patent 12443420): Automatic Image Conversion. 3y 8m to grant; granted Oct 14, 2025.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 43%
With Interview (+22.7%): 66%
Median Time to Grant: 4y 6m (~2m remaining)
PTA Risk: High
Based on 326 resolved cases by this examiner. Grant probability derived from career allowance rate.