Prosecution Insights
Last updated: April 19, 2026
Application No. 18/327,707

GENERALIZED EVOLUTIONARY TRAINING FRAMEWORKS FOR DEEP NEURAL NETWORKS

Status: Non-Final OA (§101, §103)
Filed: Jun 01, 2023
Examiner: CHUANG, SU-TING
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Apollo Autonomous Driving USA LLC
OA Round: 1 (Non-Final)

Grant Probability: 52% (Moderate)
Expected OA Rounds: 1-2
Estimated Time to Grant: 4y 5m
With Interview: 91%

Examiner Intelligence

Career Allow Rate: 52% (52 granted / 101 resolved; -3.5% vs TC avg)
Interview Lift: +39.7% (resolved cases with vs. without an interview; a strong lift)
Typical Timeline: 4y 5m average prosecution; 28 applications currently pending
Career History: 129 total applications across all art units
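The headline figures above fit together arithmetically, which is easy to check. The rounding convention and the definition of "interview lift" below are assumptions on my part, not the tool's documented formulas:

```python
# Sanity-check the dashboard figures (inputs copied from the page).
granted, resolved = 52, 101

career_allow_rate = granted / resolved     # ~0.515; displayed rounded as 52%
with_interview = 0.91                      # displayed "With Interview: 91%"

# Assumption: "Interview Lift" is the simple difference between the
# with-interview allowance rate and a no-interview baseline.
implied_baseline = with_interview - 0.397  # ~0.513, close to the career rate

print(f"career allow rate: {career_allow_rate:.1%}")
print(f"implied no-interview baseline: {implied_baseline:.1%}")
```

Under that assumption, the implied no-interview baseline (about 51.3%) lands very near the 51.5% career allowance rate, so the displayed numbers are at least mutually consistent.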

Statute-Specific Performance

§101: 27.4% (-12.6% vs TC avg)
§103: 46.3% (+6.3% vs TC avg)
§102: 10.8% (-29.2% vs TC avg)
§112: 11.7% (-28.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 101 resolved cases.
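Each delta lets you back out the Tech Center average the chart compares against, using nothing beyond the identity average = rate − delta:

```python
# Back the implied Tech Center average out of each (rate, delta) pair
# shown above. Pure arithmetic on the displayed figures.
rates = {"101": (27.4, -12.6), "103": (46.3, +6.3),
         "102": (10.8, -29.2), "112": (11.7, -28.3)}

for statute, (rate, delta) in rates.items():
    print(f"\u00a7{statute}: implied TC average = {rate - delta:.1f}%")
```

All four statutes imply the same 40.0% Tech Center average, which suggests the chart uses a single pooled baseline rather than per-statute averages.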

Office Action

Grounds of rejection: §101, §103
DETAILED ACTION

Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

    Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Claims 1-12 recite a method. Claims 13-19 recite a method. Claim 20 recites a method. Therefore, claims 1-20 are directed to a process.

With respect to claim 1:

2A Prong 1: The claim recites a judicial exception.

- generating model snapshot evaluation results by evaluating performance of each model snapshot of the set of model snapshots (mental process – evaluation or judgment; a human can manually generate evaluation results by evaluating performance of a model snapshot/state; in light of [0028] "The model snapshot for each model may capture the model state at a particular point in time…")
- based upon the model snapshot evaluation results, selecting one or more parent models from the set of model snapshots (mental process – evaluation or judgment; a human can, based on the evaluation results, manually select parent models from the set of snapshots/states)
- generating one or more child models, in which a child model is obtained by perturbing at least one or more model components of a parent model from the one or more parent models (mental process – evaluation or judgment; a human can manually generate child models by perturbing components of a parent model)
- setting the one or more child models as the set of input models for use in a subsequent training iteration (mental process – evaluation or judgment; a human can manually set the child models as the set of input models)

2A Prong 2: The judicial exception is not integrated into a practical application.

- obtaining a set of model snapshots… in which each model snapshot comprises values of model components of its respective input model when the at least one snapshot condition was satisfied (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
- by training a set of input models until at least one snapshot condition is satisfied for each input model from the set of input models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; high-level recitation of training a set of models)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
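Read together, the limitations mapped above recite an iterative train, snapshot, evaluate, select, and perturb loop, with the convergence and output steps appearing in claims 11 and 20. A minimal sketch of that loop, offered only as an illustration; every function and name below is hypothetical, not from the application:

```python
import copy
import random

def perturb(model, scale=0.1):
    """Hypothetical perturbation: jitter one 'model component'."""
    i = random.randrange(len(model["weights"]))
    model["weights"][i] += random.gauss(0.0, scale)
    return model

def evolutionary_training(base_models, train_step, evaluate,
                          snapshot_condition, converged, n_parents=2):
    input_models = base_models     # claim 2: first iteration uses base models
    history = []                   # for the claim 12 convergence measure
    while True:
        # "training a set of input models until at least one snapshot
        # condition is satisfied for each input model"
        snapshots = []
        for model in input_models:
            while not snapshot_condition(model):
                train_step(model)
            snapshots.append(copy.deepcopy(model))   # capture model state

        # "generating model snapshot evaluation results"
        scores = [evaluate(s) for s in snapshots]
        history.append(max(scores))

        # claims 11/20: on convergence, output one or more final models
        if converged(history):
            return max(zip(scores, snapshots), key=lambda p: p[0])[1]

        # "selecting one or more parent models" (greedy selection,
        # one of the claim 9 options)
        ranked = sorted(zip(scores, snapshots), key=lambda p: p[0],
                        reverse=True)
        parents = [s for _, s in ranked[:n_parents]]

        # "generating one or more child models ... by perturbing at least
        # one or more model components of a parent model"
        input_models = [perturb(copy.deepcopy(p)) for p in parents]
```

Whether such a loop can be performed mentally, as the rejection asserts, is exactly the dispute; the sketch only makes the claimed control flow concrete.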
- obtaining a set of model snapshots… in which each model snapshot comprises values of model components of its respective input model when the at least one snapshot condition was satisfied (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i))
- by training a set of input models until at least one snapshot condition is satisfied for each input model from the set of input models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; high-level recitation of training a set of models)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 2:

2A Prong 2: The judicial exception is not integrated into a practical application.

- wherein for a first iteration, the set of input models comprises a set of base models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; claim 1 recites "by training a set of input models," which is mere instructions to apply an exception. Specifying the details of the set of input models does not cause the limitation to integrate the exception into a practical application.)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- wherein for a first iteration, the set of input models comprises a set of base models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; claim 1 recites "by training a set of input models," which is mere instructions to apply an exception. Specifying the details of the set of input models does not cause the limitation to be significantly more than the judicial exception.)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 3:

2A Prong 2: The judicial exception is not integrated into a practical application.

- wherein each model snapshot is associated with metadata related to its parent model or models (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting. Claim 1 recites "obtaining a set of model snapshots," which is insignificant extra-solution activity. Specifying the details of snapshots does not cause the limitation to integrate the exception into a practical application.)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- wherein each model snapshot is associated with metadata related to its parent model or models (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i); claim 1 recites "obtaining a set of model snapshots," which is insignificant extra-solution activity. Specifying the details of snapshots does not cause the limitation to be significantly more than the judicial exception.)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 4:

2A Prong 2: The judicial exception is not integrated into a practical application.

- wherein the metadata comprises hyperparameter trajectory information (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting. Claim 1 recites "obtaining a set of model snapshots," and claim 3 recites "each model snapshot is associated with metadata," which is insignificant extra-solution activity. Specifying the details of metadata does not cause the limitation to integrate the exception into a practical application.)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- wherein the metadata comprises hyperparameter trajectory information (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i); claim 1 recites "obtaining a set of model snapshots," and claim 3 recites "each model snapshot is associated with metadata," which is insignificant extra-solution activity. Specifying the details of metadata does not cause the limitation to be significantly more than the judicial exception.)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 5:

2A Prong 1: The claim recites a judicial exception.

- wherein the one or more parent models are selected based upon one or more selection conditions (mental process – evaluation or judgment; a human can manually select parent models based on selection conditions)

With respect to claim 6:

2A Prong 1: The claim recites a judicial exception.

- wherein the one or more selection conditions are determined based upon a hyperparameter trajectory related to the model snapshot and its parent model or models (mental process – evaluation or judgment; a human can manually determine selection conditions based on a hyperparameter trajectory)

With respect to claim 7:

2A Prong 1: The claim recites a judicial exception.

- wherein generating the one or more child models is performed based upon one or more perturbation configurations (mental process – evaluation or judgment; a human can manually generate child models based on perturbation configurations)

With respect to claim 8:

2A Prong 1: The claim recites a judicial exception.
- wherein the one or more perturbation configurations are determined based upon a hyperparameter trajectory related to the model snapshot and its parent model or models (mental process – evaluation or judgment; a human can manually determine perturbation configurations based on a hyperparameter trajectory)

With respect to claim 9:

2A Prong 1: The claim recites a judicial exception.

- wherein selecting one or more parent models from the set of model snapshots utilizes greedy selection, binary tournament selection, roulette wheel selection, rank selection, or steady state selection (mental process – evaluation or judgment; a human can manually select parent models by utilizing one of these selection methods)

With respect to claim 10:

2A Prong 2: The judicial exception is not integrated into a practical application.

- wherein the at least one snapshot condition comprises a user-defined snapshot condition (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; claim 1 recites "by training a set of input models until at least one snapshot condition…," which is mere instructions to apply an exception. Specifying the details of the snapshot condition does not cause the limitation to integrate the exception into a practical application.)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- wherein the at least one snapshot condition comprises a user-defined snapshot condition (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; claim 1 recites "by training a set of input models until at least one snapshot condition…," which is mere instructions to apply an exception. Specifying the details of the snapshot condition does not cause the limitation to be significantly more than the judicial exception.)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 11:

2A Prong 1: The claim recites a judicial exception.

- further comprising: in response to a convergence condition being satisfied (mental process – evaluation or judgment; a human can manually check if a condition is satisfied, e.g., by checking the performance; in light of spec [0025] "e.g., when model performance" and [0053] "based upon a measure of change in performance")

2A Prong 2: The judicial exception is not integrated into a practical application.

- outputting one or more final models with model components selected from the set of model snapshots (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- outputting one or more final models with model components selected from the set of model snapshots (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i))

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 12:

2A Prong 1: The claim recites a judicial exception.

- wherein the convergence condition is based upon a measure of change in performance associated with sequentially obtained sets of model snapshots (mental process – evaluation or judgment; a human can manually check if a condition (a measure of change) is satisfied)

With respect to claim 13:

2A Prong 1: The claim recites a judicial exception.

- selecting one or more parent models from a set of model snapshots, wherein each model snapshot of the set of model snapshots comprises values of model components of a respective input model trained until one or more snapshot conditions were satisfied (mental process – evaluation or judgment; a human can manually select parent models from a set of model snapshots/states)
- generating one or more child models by perturbing at least one or more model components from the one or more parent models (mental process – evaluation or judgment; a human can manually generate child models by perturbing components from the parent models)
- setting the set of child model snapshots as the set of model snapshots for use in a subsequent training iteration (mental process – evaluation or judgment; a human can manually set the set of child model snapshots/states for training)

2A Prong 2: The judicial exception is not integrated into a practical application.

- obtaining a set of child model snapshots… wherein each child model snapshot comprises values of model components of its respective child model when the at least one snapshot condition was satisfied (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
- by training the one or more child models until at least one snapshot condition is satisfied for each of the one or more child models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; high-level recitation of training a set of models until a condition is satisfied)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
- obtaining a set of child model snapshots… wherein each child model snapshot comprises values of model components of its respective child model when the at least one snapshot condition was satisfied (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i))
- by training the one or more child models until at least one snapshot condition is satisfied for each of the one or more child models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; high-level recitation of training a set of models until a condition is satisfied)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 14:

2A Prong 2: The judicial exception is not integrated into a practical application.

- wherein for a first iteration, the set of model snapshots is obtained (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
- by training one or more base models until the one or more snapshot conditions are satisfied for each of the one or more base models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; high-level recitation of training models until conditions are satisfied for each model)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- wherein for a first iteration, the set of model snapshots is obtained (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i))
- by training one or more base models until the one or more snapshot conditions are satisfied for each of the one or more base models (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; high-level recitation of training models until conditions are satisfied for each model)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 15:

2A Prong 1: The claim recites a judicial exception.

- generating model snapshot evaluation results by evaluating performance of each model snapshot of the set of model snapshots (mental process – evaluation or judgment; a human can manually generate evaluation results by evaluating performance of each model snapshot)
- wherein selecting the one or more parent models from the set of model snapshots comprises: … based upon the model snapshot evaluation results, selecting the one or more parent models from the set of model snapshots (mental process – evaluation or judgment; a human can manually select parent models from the snapshots based on the evaluation results)

With respect to claim 16:

2A Prong 1: The claim recites a judicial exception.

- wherein selecting one or more parent models from the set of model snapshots utilizes greedy selection, binary tournament selection, roulette wheel selection, rank selection, or steady state selection (mental process – evaluation or judgment; a human can manually select parent models by utilizing one of these selection methods)

With respect to claim 17:

2A Prong 2: The judicial exception is not integrated into a practical application.

- wherein the at least one snapshot condition comprises a user-defined snapshot condition (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; claim 13 recites "by training the one or more child models until at least one snapshot condition is satisfied…," which is mere instructions to apply an exception. Specifying the details of the snapshot condition does not cause the limitation to integrate the exception into a practical application.)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- wherein the at least one snapshot condition comprises a user-defined snapshot condition (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; claim 13 recites "by training the one or more child models until at least one snapshot condition is satisfied…," which is mere instructions to apply an exception. Specifying the details of the snapshot condition does not cause the limitation to be significantly more than the judicial exception.)

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 18:

2A Prong 1: The claim recites a judicial exception.

- further comprising: in response to a convergence condition being satisfied (mental process – evaluation or judgment; a human can manually check if a condition is satisfied, e.g., by checking the performance; in light of spec [0025] "e.g., when model performance" and [0053] "based upon a measure of change in performance")

2A Prong 2: The judicial exception is not integrated into a practical application.

- outputting one or more final models with model components selected from the set of model snapshots (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

- outputting one or more final models with model components selected from the set of model snapshots (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 – MPEP 2106.05(d)(II)(i))

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

With respect to claim 19:

2A Prong 1: The claim recites a judicial exception.
- wherein the convergence condition is based upon a measure of change in performance associated with sequentially obtained sets of child model snapshots (mental process – evaluation or judgment; a human can manually check if a condition (a measure of change) is satisfied)

With respect to claim 20:

2A Prong 1: The claim recites a judicial exception.

- defining a set of model snapshots using the set of trained models, wherein each model snapshot of the set of model snapshots comprises values of model components of its respective trained model (mental process – evaluation or judgment; a human can define a set of model snapshots/states; in light of [0028] "The model snapshot for each model may capture the model state at a particular point in time…")
- generating model evaluation results by evaluating performance of each model snapshot of the set of model snapshots (mental process – evaluation or judgment; a human can manually generate evaluation results by evaluating performance of each model snapshot)
- based upon the model evaluation results, selecting one or more parent models from the set of model snapshots (mental process – evaluation or judgment; a human can manually select parent models from the snapshots based on the evaluation results)
- generating one or more child models by perturbing at least one or more model components of the one or more parent models (mental process – evaluation or judgment; a human can manually generate child models by perturbing components of a parent model)
- defining the one or more child models as the set of input models (mental process – evaluation or judgment; a human can manually define the child models as the set of input models)
- until a convergence condition is satisfied… in response to the convergence condition being satisfied (mental process – evaluation or judgment; a human can manually check if a condition is satisfied, e.g., by checking the performance; in light of spec [0025] "e.g., when model performance" and [0053] "based upon a measure of change in performance")

2A Prong 2: The judicial exception is not integrated into a practical application.

- obtaining a set of trained models (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)
- by training a set of input models until at least one snapshot condition is satisfied (mere instructions to apply an exception – MPEP 2106.05(f), (3) the particularity or generality of the application of the judicial exception; high-level recitation of training a set of models until a condition is satisfied)
- outputting one or more final models with model components selected from the set of model snapshots (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting)

Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
obtaining a set of trained models (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 - MPEP 2106.05(d)(II)(i)) by training a set of input models until at least one snapshot condition is satisfied (mere instructions to apply an exception – MPEP 2106.05(f), (3) The particularity or generality of the application of the judicial exception; high level recitation of training a set of models until a condition is satisfied) outputting one or more final models with model components selected from the set of model snapshots (insignificant extra-solution activity – MPEP 2106.05(g), (3) data gathering and outputting, and WURC: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 - MPEP 2106.05(d)(II)(i)) Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Claims 1-20 rejected under 35 U.S.C. 103 as being unpatentable over Liang ("Regularized Evolutionary Population-Based Training" 20210721) in view of Blanchard ("Language models for the prediction of SARS-CoV-2 inhibitors" 20221007) in further view of Dasgupta (US 20230195845 A1, filed on 20230116) In regard to claim 1, Liang teaches: A computer-implemented method for neural network training, comprising: (Liang, p. 1, Abstract "This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN’s weights with the metalearning of loss functions."; p. 6, 5.1 Performance "... multiple models are simultaneously trained with EPBT...") … based upon the model snapshot evaluation results, selecting one or more parent models from the set of model snapshots; (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [the set of model snapshots] consists of individuals Mgi. 
Each Mgi = {Dgi, hgi, fgi} [model snapshot] where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... fgi is used to select promising individuals M^gi to form a parent set M^g [selecting parent models]... The validation performance of D^gi is used to determine a new fitness value f^gi... [based on evaluation results]"; also see p. 4, Algorithm 1 EPBT; in light of spec [0028] 'The model snapshot... may include model components/information such as model parameters, hyperparameters, etc.') generating one or more child models, in which a child model is obtained by perturbing at least one or more model components of a parent model from the one or more parent models; and (Liang, p. 3, 3.1 Overview "M^g is used to create a set Ng, which contains new individuals Ngi. [child models] Each of these new individuals inherits Dgi unchanged from its parent M^gi, but with modified hyperparameters h^gi [by perturbing model components (hyperparameters) of a parent model]... each Ngi is evaluated by training Dgi on a task or data set, thereby creating a model with updated weights D^gi [by perturbing model components (weights) of a parent model]") setting the one or more child models as the set of input models for use in a subsequent training iteration. (Liang, p. 3, 3.1 Overview "Thus, by the end of generation g, the population pool contains the evaluated individuals N^gi... This process is repeated for multiple generations [setting child models as input models for a next iteration]...") Liang does not teach, but Blanchard teaches: obtaining a set of model snapshots by training a set of input models until at least one snapshot condition is satisfied for each input model from the set of input models, (Blanchard, p. 589, 4. 
Current state of the art "we here utilize a strategy from natural language processing, where a model is initially trained in an unsupervised manner before being fine-tuned to make specific predictions."; p. 591, 5.1. Pre-training molecule language models "Our strategy for dataset augmentation is motivated by the pre-training stage for masked language models [(pre-)training a set of input models]"; p. 592, 5.1.3. Pre-training with large batch sizes "Each pre-training run consisted of seven epochs, with model checkpoints saved [a set of model snapshots] and validation accuracy determined after each epoch [e.g. until at least one snapshot condition is satisfied]"; see Fig. 1, Pre-training Molecule Language Models) It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Liang to incorporate the teachings of Blanchard by including pre-trained models. Doing so would allow subsequent tasks to use data from previous results and therefore reduce the time for subsequent tasks. (Blanchard, p. 590, 4.4. Genetic algorithms "During pre-training the language model learns to predict missing sequences based on context. The predictions provide a ranked list of all possible substitutions for a given sub-sequence."; p. 598, 8. Implications "By combining a pre-trained model for molecule and protein sequences, the fine-tuning task can leverage data from many previous experimental investigations... 
Developing a generalizable model is key to reducing the time for discovering and screening new targets...") Liang and Blanchard do not teach, but Dasgupta teaches: in which each model snapshot comprises values of model components of its respective input model when the at least one snapshot condition was satisfied; (Dasgupta, [0183] "At operation 1220, a checkpoint [each model snapshot] of the ML media model is generated. In some embodiments, the checkpoint may be a snapshot in time of the model's various parameters that are being updated during the training. [e.g. when the snapshot condition was satisfied] Thus, each checkpoint may represent a state of the model [values of model components, i.e. model's parameters and hyperparameters] during the training process. In some embodiments, these checkpoints may be generated periodically, for example, once every fixed number of training steps. [e.g. when the snapshot condition was satisfied] In some embodiments, checkpoints may be generated based on performance goals or milestones reached by the model during training, or based on other conditions determined during training. [e.g. when the snapshot condition was satisfied]"; [0101] "these checkpoints may be saved based on the model reaching certain performance threshold or at certain designated points during the training process. For example, a checkpoint may be taken every time that model hyperparameters are tuned during training. [e.g. 
when the snapshot condition was satisfied]"; the state of a model is a snapshot of its parameters and hyperparameters; snapshot condition can be every step or every fixed number of steps when parameters or hyperparameters are updated; in light of spec [0028]) generating model snapshot evaluation results by evaluating performance of each model snapshot of the set of model snapshots; (Dasgupta, [0102] "For each checkpoint 320, the checkpoint is evaluated using a checkpoint evaluator 340, against a validation data set 330… evaluation results of the checkpoint evaluator 340 may be saved to an evaluation results repository 350.") It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Liang and Blanchard to incorporate the teachings of Dasgupta by including user-defined checkpoints and having checkpoints and performance results data saved. Doing so would allow those data to be examined or reused later, or used to restart a portion of the training process. (Dasgupta, [0105] "some or all of the generated checkpoints and performance results data may be saved, so that they can be examined or reused later, or used to restart a portion of the training process.") In regard to claim 2, Liang does not teach, but Blanchard teaches: wherein for a first iteration, the set of input models comprises a set of base models. (Blanchard, p. 589, 4. Current state of the art "we here utilize a strategy from natural language processing, where a model is initially trained [for a first iteration, initially] in an unsupervised manner before being fine-tuned to make specific predictions."; p. 591, 5.1. Pre-training molecule language models "Our strategy for dataset augmentation is motivated by the pre-training stage for masked language models [a set of base models]"; see Fig. 
1, Pre-training Molecule Language Models [base models]) The rationale for combining the teachings of Liang and Blanchard is the same as set forth in the rejection of claim 1. In regard to claim 3, Liang teaches: wherein each model snapshot is associated with metadata related to its parent model or models. (Liang, p. 3, 3.1 Overview "Each Mgi = {Dgi, hgi, fgi} [model snapshot] where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... M^g is used to create a set Ng, which contains new individuals Ngi. Each of these new individuals inherits Dgi unchanged from its parent M^gi, but with modified hyperparameters h^gi [metadata (weights and hyperparameters) related to its parent model or models]") In regard to claim 4, Liang teaches: wherein the metadata comprises hyperparameter trajectory information. (Liang, p. 3, 3.1 Overview "Each of these new individuals inherits Dgi unchanged from its parent M^gi, but with modified hyperparameters h^gi"; within loops in Algorithm 1 EPBT on p. 4; hgi is created from its parent h(g-1)i, h(g-2)i... [hyperparameter trajectory information]) In regard to claim 5, Liang teaches: wherein the one or more parent models are selected based upon one or more selection conditions. (Liang, p. 4, 3.2 Genetic Operators "Step 1 – Tournament Selection: Using the tournament selection operator tau, t individuals are repeatedly chosen at random from Mg.") In regard to claim 6, Liang teaches: wherein the one or more selection conditions are determined based upon a hyperparameter trajectory related to the model snapshot and its parent model or models. (Liang, p. 4, 3.2 Genetic Operators "Step 1 – Tournament Selection: Using the tournament selection operator tau, t individuals are repeatedly chosen at random from Mg. 
Each time, the individuals are compared and the one with the highest fitness is added to M^g."; Mg is related to Mgi = {Dgi, hgi, fgi} [model snapshot], which includes hgi; within loops in Algorithm 1 EPBT on p. 4, hgi is created from its parent h(g-1)i, h(g-2)i... [a hyperparameter trajectory] with added noise) In regard to claim 7, Liang teaches: wherein generating the one or more child models is performed based upon one or more perturbation configurations. (Liang, p. 3, 3.1 Overview "M^g is used to create a set Ng, which contains new individuals Ngi. [child models] Each of these new individuals inherits Dgi unchanged from its parent M^gi, but with modified hyperparameters h^gi [by perturbing model components (hyperparameters) of a parent model]... each Ngi is evaluated by training Dgi on a task or data set, thereby creating a model with updated weights D^gi [by perturbing model components (weights) of a parent model]") In regard to claim 8, Liang teaches: wherein the one or more perturbation configurations are determined based upon a hyperparameter trajectory related to the model snapshot and its parent model or models. (Liang, p. 3, 3.1 Overview "Each Mgi = {Dgi, hgi, fgi} [model snapshot] where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... Each of these new individuals inherits Dgi unchanged from its parent M^gi, but with modified hyperparameters h^gi [perturbation based on a hyperparameter trajectory]"; within loops in Algorithm 1 EPBT on p. 4; hgi is created from its parent h(g-1)i, h(g-2)i... [hyperparameter trajectory] with added noise) In regard to claim 9, Liang teaches: wherein selecting one or more parent models from the set of model snapshots utilizes greedy selection, binary tournament selection, roulette wheel selection, rank selection, or steady state selection. (Liang, p. 
4, 3.2 Genetic Operators "Step 1 – Tournament Selection: Using the tournament selection operator tau, t individuals are repeatedly chosen at random from Mg. Each time, the individuals are compared and the one with the highest fitness is added to M^g... The value t=2 [binary tournament selection] is commonly used in EA literature and in the experiments in this paper also.") In regard to claim 10, Liang and Blanchard do not teach, but Dasgupta teaches: wherein the at least one snapshot condition comprises a user-defined snapshot condition. (Dasgupta, [0105] "In some embodiments, these checkpoints may be generated periodically, for example, once every fixed number of training steps. [e.g. the snapshot condition] In some embodiments, checkpoints may be generated based on performance goals or milestones reached [e.g. the snapshot condition] by the model during training, or based on other conditions determined during training."; [0136] "As shown, the user interface 700 [a user-defined snapshot condition] also includes in this example a training configuration section 730... a setting for how often model checkpoints should be generated."; [0137]; snapshot condition can be every fixed number of steps when parameters or hyperparameters are updated) The rationale for combining the teachings of Liang, Blanchard and Dasgupta is the same as set forth in the rejection of claim 1. In regard to claim 11, Liang teaches: further comprising: in response to a convergence condition being satisfied, outputting one or more final models with model components selected from the set of model snapshots. (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [selected from the set of model states/snapshots] consists of individuals Mgi. Each Mgi = {Dgi, hgi, fgi} where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... [models with model components {Dgi, hgi, fgi}]... 
This process is repeated for multiple generations until the fitness of the best individual in the population converges. [after the fitness has converged (convergence condition being satisfied), retrieving the elite individual, i.e. outputting final models]") In regard to claim 12, Liang teaches: wherein the convergence condition is based upon a measure of change in performance associated with sequentially obtained sets of model snapshots. (Liang, p. 3, Figure 1 "In Step 3, these individuals are evaluated on a task and have their model weights and fitness (i.e., performance in the task) updated."; p. 3, 3.1 Overview "This process is repeated for multiple generations until the fitness of the best individual in the population converges. [convergence on the fitness, i.e. convergence condition, a measure of change in performance]"; convergence occurs when a model no longer improves on training, i.e. [a measure of change in performance]; new fitness (performance) f^gi is associated with sets N^gi, M^gi or Mg+1 [e.g. sequentially obtained sets of model states/snapshots], which is generated within loops in Algorithm 1 EPBT on p. 4) In regard to claim 13, Liang teaches: A computer-implemented method for neural network training, comprising: (Liang, p. 1, Abstract "This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN’s weights with the metalearning of loss functions."; p. 6, 5.1 Performance "... multiple models are simultaneously trained with EPBT...") selecting one or more parent models from a set of model snapshots, (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [the set of model snapshots] consists of individuals Mgi. Each Mgi = {Dgi, hgi, fgi} [model snapshot] where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... 
fgi is used to select promising individuals M^gi to form a parent set M^g [selecting parent models]... The validation performance of D^gi is used to determine a new fitness value f^gi... [based on evaluation results]"; also see p. 4, Algorithm 1 EPBT; in light of spec [0028] 'The model snapshot... may include model components/information such as model parameters, hyperparameters, etc.') … generating one or more child models by perturbing at least one or more model components from the one or more parent models; (Liang, p. 3, 3.1 Overview "M^g is used to create a set Ng, which contains new individuals Ngi. [child models] Each of these new individuals inherits Dgi unchanged from its parent M^gi, but with modified hyperparameters h^gi [by perturbing model components (hyperparameters) of a parent model]... each Ngi is evaluated by training Dgi on a task or data set, thereby creating a model with updated weights D^gi [by perturbing model components (weights) of a parent model]") obtaining a set of child model snapshots by training the one or more child models (Liang, p. 3, 3.1 Overview "each Ngi is evaluated by training Dgi on a task or data set, [training the child models] thereby creating a model with updated weights D^gi… Thus, by the end of generation g, the population pool contains the evaluated individuals N^gi... where N^gi = {D^gi, h^gi, f^gi}. [obtaining a set of child model snapshots]") … setting the set of child model snapshots as the set of model snapshots for use in a subsequent training iteration. (Liang, p. 3, 3.1 Overview "Thus, by the end of generation g, the population pool contains the evaluated individuals N^gi... This process is repeated for multiple generations [setting child models as input models for a next iteration]...") Liang does not teach, but Blanchard teaches: input model trained until one or more snapshot conditions were satisfied; (Blanchard, p. 589, 4. 
Current state of the art "we here utilize a strategy from natural language processing, where a model is initially trained in an unsupervised manner before being fine-tuned to make specific predictions."; p. 591, 5.1. Pre-training molecule language models "Our strategy for dataset augmentation is motivated by the pre-training stage for masked language models [(pre-)training a set of input models]"; p. 592, 5.1.3. Pre-training with large batch sizes "Each pre-training run consisted of seven epochs, with model checkpoints saved [a set of model snapshots] and validation accuracy determined after each epoch [e.g. until at least one snapshot condition is satisfied]"; see Fig. 1, Pre-training Molecule Language Models) The rationale for combining the teachings of Liang and Blanchard is the same as set forth in the rejection of claim 1. Liang and Blanchard do not teach, but Dasgupta teaches: wherein each model snapshot of the set of model snapshots comprises values of model components of a respective input model… (Dasgupta, [0183] "At operation 1220, a checkpoint [each model snapshot] of the ML media model is generated. In some embodiments, the checkpoint may be a snapshot in time of the model's various parameters that are being updated during the training. Thus, each checkpoint may represent a state of the model [values of model components, i.e. model's parameters and hyperparameters] during the training process."; [0101] "these checkpoints may be saved based on the model reaching certain performance threshold or at certain designated points during the training process. 
For example, a checkpoint may be taken every time that model hyperparameters are tuned during training."; the state of a model is a snapshot of its parameters and hyperparameters) … until at least one snapshot condition is satisfied for each of the one or more child models, wherein each child model snapshot comprises values of model components of its respective child model when the at least one snapshot condition was satisfied; and (Dasgupta, [0183] "At operation 1220, a checkpoint [each model snapshot] of the ML media model is generated. In some embodiments, the checkpoint may be a snapshot in time of the model's various parameters that are being updated during the training. [e.g. until/when the snapshot condition was satisfied] Thus, each checkpoint may represent a state of the model [values of model components, i.e. model's parameters and hyperparameters] during the training process. In some embodiments, these checkpoints may be generated periodically, for example, once every fixed number of training steps. [e.g. until/when the snapshot condition was satisfied] In some embodiments, checkpoints may be generated based on performance goals or milestones reached by the model during training, or based on other conditions determined during training. [e.g. until/when the snapshot condition was satisfied]"; [0101] "these checkpoints may be saved based on the model reaching certain performance threshold or at certain designated points during the training process. For example, a checkpoint may be taken every time that model hyperparameters are tuned during training. [e.g. until/when the snapshot condition was satisfied]"; the state of a model is a snapshot of its parameters and hyperparameters; snapshot condition can be every step or every fixed number of steps when parameters or hyperparameters are updated; in light of spec [0028]) The rationale for combining the teachings of Liang, Blanchard and Dasgupta is the same as set forth in the rejection of claim 1. 
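For context, the evolutionary loop that the cited Liang reference describes (evaluate a population of snapshots, select parents by binary tournament with t=2, create children by perturbing a parent's hyperparameters, repeat until fitness converges) can be sketched in a few lines of Python. This is an illustrative sketch only: the fitness function, the single "lr" hyperparameter, the perturbation range, and the fixed generation count are hypothetical stand-ins, not taken from Liang, Blanchard, or Dasgupta.

```python
import random

def evaluate(snapshot):
    # Hypothetical stand-in fitness: learning rates nearer 0.1 score higher.
    # In the claimed framework this would be validation performance.
    return -abs(snapshot["lr"] - 0.1)

def binary_tournament(snapshots, k):
    # t=2 tournament selection: draw two snapshots at random, keep the fitter.
    parents = []
    for _ in range(k):
        a, b = random.sample(snapshots, 2)
        parents.append(a if evaluate(a) >= evaluate(b) else b)
    return parents

def perturb(parent):
    # A child inherits its parent's components with one hyperparameter
    # multiplicatively perturbed (the "modified hyperparameters h^gi").
    child = dict(parent)
    child["lr"] = parent["lr"] * random.uniform(0.8, 1.25)
    return child

def one_generation(population):
    # Select parents from the current snapshots, then derive the children
    # that become the input models for the next iteration.
    return [perturb(p) for p in binary_tournament(population, len(population))]

random.seed(0)
population = [{"lr": random.uniform(0.001, 1.0)} for _ in range(8)]
for _ in range(20):  # stand-in for "until the fitness ... converges"
    population = one_generation(population)
best = max(population, key=evaluate)
```

In the references as mapped above, the evaluation step would run each saved checkpoint against a validation data set (as in Dasgupta's checkpoint evaluator) rather than the toy fitness used here, and the snapshot condition (every N steps, or a performance milestone) would govern when snapshots enter the population.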
In regard to claim 14, Liang does not teach, but Blanchard teaches: wherein for a first iteration, the set of model snapshots is obtained by training one or more base models until the one or more snapshot conditions are satisfied for each of the one or more base models. (Blanchard, p. 589, 4. Current state of the art "we here utilize a strategy from natural language processing, where a model is initially trained [for a first iteration, initially] in an unsupervised manner before being fine-tuned to make specific predictions."; p. 591, 5.1. Pre-training molecule language models "Our strategy for dataset augmentation is motivated by the pre-training stage for masked language models [(pre-)training a set of input models]"; p. 592, 5.1.3. Pre-training with large batch sizes "Each pre-training run consisted of seven epochs, with model checkpoints saved [a set of model snapshots] and validation accuracy determined after each epoch [e.g. until at least one snapshot condition is satisfied]"; see Fig. 1, Pre-training Molecule Language Models) The rationale for combining the teachings of Liang and Blanchard is the same as set forth in the rejection of claim 1. In regard to claim 15, Liang teaches: wherein selecting the one or more parent models from the set of model snapshots comprises: (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [the set of model snapshots] consists of individuals Mgi. Each Mgi = {Dgi, hgi, fgi} [model snapshot] where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... fgi is used to select promising individuals M^gi to form a parent set M^g [selecting parent models]") … based upon the model snapshot evaluation results, selecting the one or more parent models from the set of model snapshots. (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [the set of model snapshots] consists of individuals Mgi. 
Each Mgi = {Dgi, hgi, fgi} [model snapshot] where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... fgi is used to select promising individuals M^gi to form a parent set M^g [selecting parent models]... The validation performance of D^gi is used to determine a new fitness value f^gi... [based on evaluation results]"; also see p. 4, Algorithm 1 EPBT; in light of spec [0028] 'The model snapshot... may include model components/information such as model parameters, hyperparameters, etc.') Liang and Blanchard do not teach, but Dasgupta teaches: generating model snapshot evaluation results by evaluating performance of each model snapshot of the set of model snapshots; and (Dasgupta, [0102] "For each checkpoint 320, the checkpoint is evaluated using a checkpoint evaluator 340, against a validation data set 330… evaluation results of the checkpoint evaluator 340 may be saved to an evaluation results repository 350.") The rationale for combining the teachings of Liang, Blanchard and Dasgupta is the same as set forth in the rejection of claim 1. In regard to claim 16, Liang teaches: wherein selecting one or more parent models from the set of model snapshots utilizes greedy selection, binary tournament selection, roulette wheel selection, rank selection, or steady state selection. (Liang, p. 4, 3.2 Genetic Operators "Step 1 – Tournament Selection: Using the tournament selection operator tau, t individuals are repeatedly chosen at random from Mg. Each time, the individuals are compared and the one with the highest fitness is added to M^g... The value t=2 [binary tournament selection] is commonly used in EA literature and in the experiments in this paper also.") In regard to claim 17, Liang and Blanchard do not teach, but Dasgupta teaches: wherein the at least one snapshot condition comprises a user-defined snapshot condition. 
(Dasgupta, [0105] "In some embodiments, these checkpoints may be generated periodically, for example, once every fixed number of training steps. [e.g. the snapshot condition] In some embodiments, checkpoints may be generated based on performance goals or milestones reached [e.g. the snapshot condition] by the model during training, or based on other conditions determined during training."; [0136] "As shown, the user interface 700 [a user-defined snapshot condition] also includes in this example a training configuration section 730... a setting for how often model checkpoints should be generated."; [0137]; snapshot condition can be every fixed number of steps when parameters or hyperparameters are updated) In regard to claim 18, Liang teaches: in response to a convergence condition being satisfied, outputting one or more final models with model components selected from the set of model snapshots. (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [selected from the set of model states/snapshots] consists of individuals Mgi. Each Mgi = {Dgi, hgi, fgi} where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... [models with model components {Dgi, hgi, fgi}]... This process is repeated for multiple generations until the fitness of the best individual in the population converges. [after the fitness has converged (convergence condition being satisfied), retrieving the elite individual, i.e. outputting final models]") In regard to claim 19, Liang teaches: wherein the convergence condition is based upon a measure of change in performance associated with sequentially obtained sets of child model snapshots. (Liang, p. 3, Figure 1 "In Step 3, these individuals are evaluated on a task and have their model weights and fitness (i.e., performance in the task) updated."; p. 
3, 3.1 Overview "This process is repeated for multiple generations until the fitness of the best individual in the population converges. [convergence on the fitness, i.e. convergence condition, a measure of change in performance]"; convergence occurs when a model no longer improves on training, i.e. [a measure of change in performance]; new fitness (performance) f^gi is associated with sets N^gi, M^gi or Mg+1 [sequentially obtained sets of model states/snapshots], which is generated within loops in Algorithm 1 EPBT on p. 4) In regard to claim 20, Liang teaches: A computer-implemented method for neural network training, comprising: (Liang, p. 1, Abstract "This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN’s weights with the metalearning of loss functions."; p. 6, 5.1 Performance "... multiple models are simultaneously trained with EPBT...") until a convergence condition is satisfied: (Liang, p. 3, 3.1 Overview "This process is repeated for multiple generations until the fitness of the best individual in the population converges. [until the fitness has converged (until the convergence condition is satisfied)]") … based upon the model evaluation results, selecting one or more parent models from the set of model snapshots; (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [the set of model snapshots] consists of individuals Mgi. Each Mgi = {Dgi, hgi, fgi} [model snapshot] where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... fgi is used to select promising individuals M^gi to form a parent set M^g [selecting parent models]... The validation performance of D^gi is used to determine a new fitness value f^gi... [based on evaluation results]"; also see p. 4, Algorithm 1 EPBT; in light of spec [0028] 'The model snapshot... 
may include model components/information such as model parameters, hyperparameters, etc.') generating one or more child models by perturbing at least one or more model components of the one or more parent models; and (Liang, p. 3, 3.1 Overview "M^g is used to create a set Ng, which contains new individuals Ngi. [child models] Each of these new individuals inherits Dgi unchanged from its parent M^gi, but with modified hyperparameters h^gi [by perturbing model components (hyperparameters) of a parent model]... each Ngi is evaluated by training Dgi on a task or data set, thereby creating a model with updated weights D^gi [by perturbing model components (weights) of a parent model]") defining the one or more child models as the set of input models; and (Liang, p. 3, 3.1 Overview "Thus, by the end of generation g, the population pool contains the evaluated individuals N^gi... This process is repeated for multiple generations [setting child models as input models]...") in response to the convergence condition being satisfied, outputting one or more final models with model components selected from the set of model snapshots. (Liang, p. 3, 3.1 Overview "At the beginning of generation g, the population Mg [selected from the set of model states/snapshots] consists of individuals Mgi. Each Mgi = {Dgi, hgi, fgi} where Dgi is a DNN model (defined as the weights...), hgi is a set of hyperparameters, and fgi is a real-valued scalar fitness... [models with model components {Dgi, hgi, fgi}]... This process is repeated for multiple generations until the fitness of the best individual in the population converges. [after the fitness has converged (convergence condition being satisfied), retrieving the elite individual, i.e. 
outputting final models]") Liang does not teach, but Blanchard teaches: obtaining a set of trained models by training a set of input models until at least one snapshot condition is satisfied; defining a set of model snapshots using the set of trained models, (Blanchard, p. 589, 4. Current state of the art "we here utilize a strategy from natural language processing, where a model is initially trained in an unsupervised manner before being fine-tuned to make specific predictions."; p. 591, 5.1. Pre-training molecule language models "Our strategy for dataset augmentation is motivated by the pre-training stage for masked language models [obtaining a set of trained models by training a set of input models]"; p. 592, 5.1.3. Pre-training with large batch sizes "Each pre-training run consisted of seven epochs, with model checkpoints saved [defining a set of model snapshots] and validation accuracy determined after each epoch [e.g. until at least one snapshot condition is satisfied]"; see Fig. 1, Pre-training Molecule Language Models, the saved snapshots are the defined snapshots using the (pre-)trained models) The rationale for combining the teachings of Liang and Blanchard is the same as set forth in the rejection of claim 1. Liang and Blanchard do not teach, but Dasgupta teaches: wherein each model snapshot of the set of model snapshots comprises values of model components of its respective trained model; (Dasgupta, [0183] "At operation 1220, a checkpoint [each model snapshot] of the ML media model is generated. In some embodiments, the checkpoint may be a snapshot in time of the model's various parameters that are being updated during the training. Thus, each checkpoint may represent a state of the model [values of model components, i.e. model's parameters and hyperparameters] during the training process."; [0101] "these checkpoints may be saved based on the model reaching certain performance threshold or at certain designated points during the training process. 
For example, a checkpoint may be taken every time that model hyperparameters are tuned during training."; the state of a model is a snapshot of its parameters and hyperparameters) generating model evaluation results by evaluating performance of each model snapshot of the set of model snapshots; (Dasgupta, [0102] "For each checkpoint 320, the checkpoint is evaluated using a checkpoint evaluator 340, against a validation data set 330… evaluation results of the checkpoint evaluator 340 may be saved to an evaluation results repository 350.") The rationale for combining the teachings of Liang, Blanchard and Dasgupta is the same as set forth in the rejection of claim 1. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Both Eisenman and Rothe teach more information regarding checkpoints. Eisenman ("Check-N-Run: a Checkpointing System for Training Deep Learning Recommendation Models" 20220404) teaches (Eisenman, p. 929, Abstract "Checkpoints take a snapshot of an ML model and store it in a non-volatile memory so that they can be used to recover from failures to ensure rapid training progress."; p. 934, 4.2 Decoupled Checkpointing "Checkpointing requires the model parameters to be atomically copied for further processing and storage.") Rothe ("Leveraging Pre-trained Checkpoints for Sequence Generation Tasks" 20200416) teaches (Rothe, p. 1, Abstract "By warm-starting from the publicly released checkpoints… We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model...") Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519. The examiner can normally be reached Monday - Thursday 8-5 PT. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /SU-TING CHUANG/Examiner, Art Unit 2146
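The limitations the rejection maps describe a concrete loop: train the input models until a snapshot condition is satisfied, define snapshots, evaluate each snapshot, select parents from the results, perturb parents into child models, and feed the children back as the next iteration's inputs. Below is a minimal toy sketch of that loop; the "model" representation and every function name are illustrative assumptions for clarity, not anything taken from the application or the cited references.

```python
import random

random.seed(7)  # reproducible toy example

# Toy "model": a plain list of float parameters.

def train(model):
    """Stand-in training step: shrink each parameter toward zero."""
    return [p * 0.9 for p in model]

def evaluate(snapshot):
    """Stand-in fitness: smaller parameter magnitudes score higher."""
    return -sum(abs(p) for p in snapshot)

def perturb(parent, scale=0.05):
    """Derive a child model by perturbing the parent's components."""
    return [p + random.uniform(-scale, scale) for p in parent]

def evolutionary_iteration(input_models, num_parents=2, children_per_parent=2):
    # Train each input model until a snapshot condition is satisfied
    # (here, simply one pass of train()), then define the snapshots.
    snapshots = [train(m) for m in input_models]
    # Generate evaluation results and select parent models by score.
    parents = sorted(snapshots, key=evaluate, reverse=True)[:num_parents]
    # Perturb parents into children; the children are set as the
    # input models for the subsequent training iteration.
    return [perturb(p) for p in parents for _ in range(children_per_parent)]

population = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
next_population = evolutionary_iteration(population)
```

Each step maps onto a limitation quoted in the rejection: snapshot definition, snapshot evaluation results, parent selection, component perturbation, and re-seeding the next iteration's inputs.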

Prosecution Timeline

Jun 01, 2023
Application Filed
Mar 21, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561600
LINEAR TIME ALGORITHMS FOR PRIVACY PRESERVING CONVEX OPTIMIZATION
2y 5m to grant Granted Feb 24, 2026
Patent 12518154
TRAINING MULTIMODAL REPRESENTATION LEARNING MODEL ON UNANNOTATED MULTIMODAL DATA
2y 5m to grant Granted Jan 06, 2026
Patent 12481725
SYSTEMS AND METHODS FOR DOMAIN-SPECIFIC ENHANCEMENT OF REAL-TIME MODELS THROUGH EDGE-BASED LEARNING
2y 5m to grant Granted Nov 25, 2025
Patent 12468951
Unsupervised outlier detection in time-series data
2y 5m to grant Granted Nov 11, 2025
Patent 12412095
COOPERATIVE LEARNING NEURAL NETWORKS AND SYSTEMS
2y 5m to grant Granted Sep 09, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
52%
Grant Probability
91%
With Interview (+39.7%)
4y 5m
Median Time to Grant
Low
PTA Risk
Based on 101 resolved cases by this examiner. Grant probability derived from career allow rate.
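The "With Interview" figure appears to follow from the other numbers on this page: the career allow rate (52 granted of 101 resolved) plus the +39.7 percentage-point interview lift. This derivation is an inference from the displayed values, not documented behavior of the tool:

```python
granted, resolved = 52, 101     # career totals shown above
interview_lift = 0.397          # +39.7 percentage points with interview

base_rate = granted / resolved  # ~0.515, displayed as 52%
with_interview = base_rate + interview_lift

print(f"{with_interview:.1%}")  # ~91.2%, displayed as 91%
```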
