Prosecution Insights
Last updated: April 19, 2026
Application No. 17/471,118

RANDOMIZED PARAMETER SETTING FOR MODEL TRAINING

Non-Final OA (§101, §103, §112)
Filed: Sep 09, 2021
Examiner: BOSTWICK, SIDNEY VINCENT
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Actapio Inc.
OA Round: 3 (Non-Final)
Grant Probability: 52% (Moderate)
OA Rounds: 3-4
To Grant: 4y 7m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 52% (71 granted / 136 resolved; -2.8% vs TC avg)
Interview Lift: +38.2% (resolved cases with interview)
Avg Prosecution: 4y 7m (typical timeline; 68 currently pending)
Total Applications: 204 (career history, across all art units)

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 21.9% (-18.1% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 136 resolved cases

Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/3/2025 has been entered.

Remarks

This Office Action is responsive to Applicant's Amendment filed on December 3, 2025, in which claims 1 and 16-18 are currently amended. Claims 1-18 are currently pending.

Response to Arguments

Applicant's arguments with respect to the rejection of claims 1-17 under 35 U.S.C. 101 based on amendment have been considered and are persuasive. Examiner notes that there is nothing in claim 18 that would make it unreasonable to interpret claim 18 as being directed entirely towards software (see instant specification [¶0426]: "individual components of each of the illustrated devices are given as a functional concept, and do not necessarily have to be physically configured as illustrated in the figures. That is, the specific form of distribution/integration of each of devices is not limited to the one illustrated in the figure. All or part of the device is functionally or physically distributed/integrated in arbitrary units depending on various loads and usage conditions."), such that claim 18 appears to be directed wholly towards a judicial exception.

Applicant's arguments with respect to the rejection of claims 1-18 under 35 U.S.C. 103 based on amendment have been considered and are persuasive. The argument is moot in view of a new ground of rejection set forth below.
Claim Objections

Claim 8 is objected to because of the following informalities: "wherein the select one" should read "wherein the selecting of one". Claim 18 is objected to because of the following informalities: "A learning apparatus comprising: generate" should read to the effect of "A learning apparatus to: generate". Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claims 1 and 16-18, "reduces processing time for the prediction processes compared to unoptimized allocation" is indefinite. "Unoptimized allocation" is a relative term with no basis for comparison. Since optimized/unoptimized is a broad relative term, a direct comparison to determine a reduction in processing time is indefinite; two different "unoptimized" baselines could lead to opposite, contradictory outcomes. Since the scope of the claim cannot be reasonably determined, the claim is indefinite. In the interest of further examination, the limitation is interpreted as "reduces processing time for the prediction processes compared to the processing time without using the trained selected model".
Regarding claims 1 and 16-18, "provides the trained selected model to control allocation of physical computing resources" is grammatically indefinite. It is syntactically unclear what is doing the "controlling" (the learning apparatus, the trained selected model, or something else altogether). Since the subject is unclear, the scope of the claim cannot be reasonably determined. In the interest of further examination, the claim limitation is interpreted as "the learning apparatus provides the trained selected model, the providing enabling the learning apparatus to control allocation of physical computing resources". The remaining claims are rejected with respect to their dependence on the rejected claims.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 18 is rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter. Regarding claim 18, claim 18 is directed towards an apparatus. While an apparatus is one of the statutory categories, Examiner notes that under the broadest reasonable interpretation there is nothing limiting the claimed apparatus from also being interpreted as software, which is a judicial exception. The instant specification explicitly supports this interpretation ([¶0426]: "individual components of each of the illustrated devices are given as a functional concept, and do not necessarily have to be physically configured as illustrated in the figures. That is, the specific form of distribution/integration of each of devices is not limited to the one illustrated in the figure. All or part of the device is functionally or physically distributed/integrated in arbitrary units depending on various loads and usage conditions.").
There is no indication that the apparatus cannot be software. Therefore, claim 18 is rejected as software per se. When considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claim 18 is rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-5, 7-11, and 15-18 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Koch (US20180240041A1) and Varadarajan (US20190095819A1).
Regarding claim 1, Koch teaches A learning apparatus comprising: a processor, ([¶0033] "User device 200 may include an input interface 202, an output interface 204, a communication interface 206, a computer-readable medium 208, a processor 210, and a model tuning application 222") the processor is configured to: generate a plurality of models each having different parameters, ([¶0005] "cause the computing device to automatically select hyperparameter values based on objective criteria for training a predictive model. A plurality of tuning evaluation parameters that include a model type, a search method type, and values to evaluate for each hyperparameter of a plurality of hyperparameters associated with the model type are accessed [...] A plurality of hyperparameter configurations is determined using a search method of the search method type. A hyperparameter configuration includes a value for each hyperparameter of the plurality of hyperparameters. Each hyperparameter configuration of the plurality of hyperparameter configurations is unique. For each session of the plurality of sessions, a hyperparameter configuration is assigned to the session of the plurality of sessions, training of a model of the model type by the session computing devices allocated to the session is requested" Koch explicitly generates and trains models with a neural architecture search) wherein the parameters comprise weights ([¶0002] "a neural network type predictive model generates predicted outputs by transforming a set of inputs through a series of hidden layers that are defined by activation functions linked with weights") train each of the plurality of models to learn features of a part of predetermined learning data;([¶0005] "cause the computing device to automatically select hyperparameter values based on objective criteria for training a predictive model. 
A plurality of tuning evaluation parameters that include a model type, a search method type, and values to evaluate for each hyperparameter of a plurality of hyperparameters associated with the model type are accessed. A number of session computing devices allocated to each session of a plurality of sessions is determined. Each session computing device of the number of session computing devices processes a subset of an input dataset. [...] The model is trained using the assigned hyperparameter configuration and a training dataset that is a first portion of the input dataset")

select one of the plurality of models based on model accuracy ([¶0100] "The objective function specifies a measure of model error (performance) to be used to identify a best configuration of the hyperparameters among those evaluated" [¶0101] "F1 uses an F1 coefficient as the objective function [...] MCE uses a misclassification rate as the objective function" [¶0239] "Referring to FIG. 16, a final tuned model error—as averaged across ten tuning runs that used different validation partitions—for each problem and each modeling algorithm is shown for a suite of ten common machine learning test problems" [¶0240] "Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown" Koch explicitly discloses that the model tuning algorithm selects a best model from a plurality of models to improve (tune) a base model's error/accuracy)

train the selected model to learn features of the predetermined learning data. ([¶0188] "a final hyperparameter configuration is selected based on the hyperparameter configuration that generated the best or lowest objective function value" [¶0191] "the selected session is requested to execute the final hyperparameter configuration based on the parameter values in the data structure.
In an illustrative embodiment, a train request is sent to session manager device 400 of the selected session to execute the “train” action based on the selected model type. [...] Characteristics that define the trained model using the final hyperparameter configuration are provided back to the main thread on which selection manager application 312 is instantiated" Koch explicitly performs a post-selection "final model" training using the best configuration and stores that trained selected model for later use.) wherein the learning apparatus provides the trained selected model to control allocation of physical computing resources for executing prediction processes using the trained selected model, ([¶0164] "The objective function value for each hyperparameter configuration may be extracted from an in-memory table created by an action called for validation of the trained model" [¶0253] "In an operation 1906, a model is instantiated with information read from the model description. For example, the type of model, its hyperparameter values, and other characterizing elements are read and used to instantiate the model." [¶0257] "In an operation 1914, an output of the instantiated model is received. The output may indicate a predicted characteristic of the observation vector." [¶0263] "Hyperparameter selection system 100 supports better decision making by providing a system that can identify and evaluate many more hyperparameter configurations in parallel by allocating the computing devices of worker system 106 in an effective data and model parallel manner." The learning apparatus controls allocation of physical computing resources by loading the trained selected model and related objective function value from physical hardware for executing prediction processes). 
However, Koch does not explicitly teach wherein the parameters comprise weights and biases; wherein the allocation of physical computing resources reduces processing time for the prediction processes compared to unoptimized allocation. Varadarajan, in the same field of endeavor, teaches wherein the parameters comprise weights and biases ([¶0133] "The artifact of a neural network may comprise matrices of weights and biases. Training a neural network may iteratively adjust the matrices of weights and biases.") wherein the allocation of physical computing resources reduces processing time for the prediction processes compared to unoptimized allocation ([¶0033] "cost-aware distributed scheduling is used to avoid an imbalanced workload across a cluster of computers, which improves horizontal scalability. When there are more tasks than computers, batching tasks are assigned to independent executors such that the average cost of each batch is close to the global average cost of all pending tasks" [¶0113] "If all the time-consuming tasks are scheduled on the same computer (as might happen with a cost-unaware scheduling algorithm), it may delay all these tasks further as there are only limited resources on each computer" Varadarajan explicitly performs distributed training in order to reduce processing time for the prediction processes). Koch as well as Varadarajan are directed towards machine learning model hyperparameter tuning. Therefore, Koch as well as Varadarajan are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Koch with the teachings of Varadarajan by performing the distributed training in Varadarajan as the training method for the final training in Koch.
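For context on the mapped Varadarajan paragraphs, the cited cost-aware batching amounts to keeping each worker's total task cost close to the global average. Below is a minimal Python sketch of that general idea; the task names, costs, and the greedy least-loaded strategy are illustrative assumptions of this note, not details taken from the Varadarajan reference.

```python
def cost_aware_batches(costs, num_workers):
    """Assign tasks to workers so each worker's total cost stays
    close to the global average: sort tasks by descending cost,
    then always hand the next task to the least-loaded worker.
    (Illustrative sketch only; not the reference's algorithm.)"""
    batches = [[] for _ in range(num_workers)]
    totals = [0.0] * num_workers
    for task, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        i = totals.index(min(totals))  # least-loaded worker so far
        batches[i].append(task)
        totals[i] += cost
    return batches, totals
```

With hypothetical costs such as {"a": 9.0, "b": 8.0, "c": 2.0, "d": 1.0} and two workers, the per-worker totals come out equal, illustrating the "average cost of each batch is close to the global average" behavior the examiner quotes.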
Varadarajan provides as additional motivation for combination ([¶0033] "cost-aware distributed scheduling is used to avoid an imbalanced workload across a cluster of computers, which improves horizontal scalability"). This motivation for combination also applies to the remaining claims which depend on this combination.

Regarding claim 2, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 1 further comprising: generate a plurality of input values to be input to a predetermined first function that calculates a random number value based on the input value, and generates, for each of the generated input values, (Koch [¶0131] "a random seed value may be specified for each search method that may be the same for all search methods or may be defined separately for each search method") a plurality of models having parameters corresponding to the random number values output from the predetermined first function when the input values have been input. (Koch [¶0134] "the Random search method randomly generates hyperparameter values across the range of each hyperparameter and combines them across hyperparameters. If the Random search method is selected, a sample size value may be specified for all or for each hyperparameter that defines the number of hyperparameter configurations to evaluate in a single iteration.").

Regarding claim 3, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 2, wherein the plurality of input values comprises input values with random number values that satisfy a predetermined condition. (Koch [¶0134] "the Random search method randomly generates hyperparameter values across the range of each hyperparameter and combines them across hyperparameters" [¶0135] "the LHS search method generates uniform hyperparameter values across the range of each hyperparameter and randomly combines them across hyperparameters.
If the hyperparameter is continuous or discrete with more levels than a requested sample size, a uniform set of samples is taken across the hyperparameter range including a lower and an upper bound" Satisfying upper and lower bounds interpreted as predetermined condition).

Regarding claim 4, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 3, wherein the plurality of input values comprises input values with random number values that fall within a predetermined range. (Koch [¶0134] "the Random search method randomly generates hyperparameter values across the range of each hyperparameter and combines them across hyperparameters").

Regarding claim 5, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 3, wherein the plurality of input values comprises input values with a distribution of associated random number values having a predetermined probability distribution. (Koch [¶0135] "the LHS search method generates uniform hyperparameter values across the range of each hyperparameter and randomly combines them across hyperparameters. If the hyperparameter is continuous or discrete with more levels than a requested sample size, a uniform set of samples is taken across the hyperparameter range including a lower and an upper bound" [¶0133] "the Grid search method generates uniform hyperparameter values across the range of each hyperparameter and combines them across hyperparameters" Koch explicitly states that the generated random number values have uniform probability distribution).
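As an editor's aside, the seed-to-configuration mapping the examiner reads onto claims 2-5 can be pictured with a short Python sketch. The search-space names and bounds are hypothetical, and a seeded PRNG stands in for the claimed "predetermined first function"; none of this is drawn from Koch.

```python
import random

# Hypothetical search space; names and bounds are illustrative.
SPACE = {"learning_rate": (1e-4, 1e-1), "momentum": (0.0, 0.9)}

def configs_from_seeds(seeds, space=SPACE):
    """Each seed is an 'input value' to a deterministic first
    function (a seeded PRNG) whose outputs are uniform random
    values constrained to each hyperparameter's [low, high]
    range (the 'predetermined condition')."""
    configs = []
    for seed in seeds:
        rng = random.Random(seed)  # deterministic per seed
        configs.append({name: rng.uniform(low, high)
                        for name, (low, high) in space.items()})
    return configs
```

Because the first function is deterministic in its input, the same seed always reproduces the same configuration, which is what lets distinct input values stand for distinct candidate models.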
Regarding claim 7, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 2, wherein a function in which the distribution of the random number values output when the input value has been input indicates a predetermined probability distribution is selected as the predetermined first function(Koch [¶0135] "the LHS search method generates uniform hyperparameter values across the range of each hyperparameter and randomly combines them across hyperparameters. If the hyperparameter is continuous or discrete with more levels than a requested sample size, a uniform set of samples is taken across the hyperparameter range including a lower and an upper bound" [¶0133] " the Grid search method generates uniform hyperparameter values across the range of each hyperparameter and combines them across hyperparameters" Koch explicitly states that the generated random number values have uniform probability distribution where uniform probability distribution is explicitly selected as the first function). Regarding claim 8, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 1, wherein the select the one of the plurality of models based on model accuracy comprises selecting a model that satisfies predetermined evaluation conditions from among the plurality of trained models(Koch [¶0137] "A tournament selection process may be used to randomly choose a group of members from the current population, compare their fitness, and select the fittest from the group to propagate to the next generation" [¶0138] "growth steps may be performed each iteration to permit selected hyperparameter configurations of the population (based on diversity and fitness) to benefit from local optimization over the continuous variables" [¶0240] "Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 
16 is shown" Examiner notes that both the training and tournament selection tuning process satisfy this claim).

Regarding claim 9, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 8, wherein the predetermined evaluation conditions comprise a change in the evaluation value satisfies a predetermined mode. (Varadarajan [¶0100] "The objective function specifies a measure of model error (performance) to be used to identify a best configuration of the hyperparameters among those evaluated" [¶0101] "F1 uses an F1 coefficient as the objective function [...] MCE uses a misclassification rate as the objective function" [¶0055] "hyperparameter values 122 is processed by costing task 162 to calculate cost 132, which entails configuring target model 110 with hyperparameter values 122, training the configured model with a training dataset, testing the trained model for accuracy, and calculating cost 132 based on training duration and/or tested accuracy. In an embodiment, cost 132 is based on at least one of: time or memory spent training the ML model, time or memory spent testing the ML model, and/or accuracy indicated by the testing. Minimizing consumptive costs such as time or memory directly improve the performance of computer 100 itself. If computer 100 supports multiprocessing or is part of a computer cluster that does, then minimizing consumptive costs also improve the performance of the multiprocessing system itself." The predetermined evaluation condition is "the best" model configuration from among the plurality of trained models. Alternatively, selecting a best model trained to minimize cost is also interpreted as selecting a model that satisfies predetermined evaluation conditions from among the plurality of trained models and based on model accuracy. First and final training modes are interpreted as predetermined modes.).
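The objective-function selection the examiner maps to claims 8-9 reduces to scoring each candidate configuration and keeping the extreme value. A minimal sketch follows, assuming a stand-in `evaluate` callable (lower is better, e.g. a misclassification rate); the callable and its semantics are assumptions of this note, not the cited references' code.

```python
def select_best(configs, evaluate):
    """Score every candidate configuration with an objective
    function and keep the extreme (here, minimum) value,
    mirroring best-configuration selection in the abstract.
    `evaluate` stands in for training plus validation scoring."""
    scored = [(evaluate(cfg), cfg) for cfg in configs]
    best_score, best_cfg = min(scored, key=lambda sc: sc[0])
    return best_cfg, best_score
```

For example, with candidates [{"x": 1}, {"x": 3}, {"x": 5}] and the toy objective abs(c["x"] - 3), the function returns the middle candidate with a score of zero.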
Regarding claim 10, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 9, wherein model selection is based on the change in the evaluation value during iterative learning of the features of a part of the predetermined learning data a predetermined number of times satisfies the predetermined mode.(Koch [¶0137] "a maximum number of iterations may be specified where the population size defines the number of hyperparameter configurations to evaluate each iteration"). Regarding claim 11, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 8, wherein the predetermined evaluation conditions comprise a plurality of conditions designated by the user(Koch [¶0148] "As another option, the user can select a hyperparameter configuration included in the “Tuner Results” output table that is less complex, but provides a similar objective function value in comparison to the hyperparameters included in the “Best Configuration” output table"). Regarding claim 15, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 1, wherein the select one of the plurality of models based on model accuracy comprises selecting one of the models based on model accuracy for each model having different parameters(Koch [Abstract] "scoring of the trained model using a validation dataset and the assigned hyperparameter configuration is requested to compute an objective function value, and the received objective function value and the assigned hyperparameter configuration are stored. A best hyperparameter configuration is identified based on an extreme value of the stored objective function values" [¶0101] "F1 uses an F1 coefficient as the objective function [...] MCE uses a misclassification rate as the objective function" [¶0239] "Referring to FIG. 
16, a final tuned model error—as averaged across ten tuning runs that used different validation partitions—for each problem and each modeling algorithm is shown for a suite of ten common machine learning test problems" [¶0240] "Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown" Koch explicitly and repeatedly evaluates many hyperparameter configurations (plurality of models having different parameters), selecting the best based on an extreme objective value (explicitly anticipated as being accuracy)). Regarding claim 16, claim 16 is directed towards the method performed by the apparatus of claim 1. Therefore, the rejection applied to claim 1 also applies to claim 16. Regarding claim 17, claim 17 is substantially similar to claim 1. Therefore, the rejection applied to claim 1 also applies to claim 17. Regarding claim 18, Koch teaches A learning apparatus comprising: generate a plurality of random number seeds and input the plurality of random number seeds into a predetermined function to generate a plurality of random number values that correspond to the plurality of random number seeds; ([¶0131] "a random seed value may be specified for each search method that may be the same for all search methods or may be defined separately for each search method" [¶0134] "the Random search method randomly generates hyperparameter values across the range of each hyperparameter and combines them across hyperparameters." 
Koch explicitly generates plural random seed values used for/as input for a search method that generates a plurality of random hyperparameter values) generate, for the plurality of random number seeds, a plurality of models with differing parameters based on the plurality of random number values;([¶0134] "the Random search method randomly generates hyperparameter values across the range of each hyperparameter and combines them across hyperparameters." Koch explicitly generates plural random seed values used for/as input for a search method that generates a plurality of random hyperparameter values, the hyperparameter values determining model configurations) train the plurality of models to learn features of a part of predetermined learning data; ([¶0005] "cause the computing device to automatically select hyperparameter values based on objective criteria for training a predictive model. A plurality of tuning evaluation parameters that include a model type, a search method type, and values to evaluate for each hyperparameter of a plurality of hyperparameters associated with the model type are accessed. A number of session computing devices allocated to each session of a plurality of sessions is determined. Each session computing device of the number of session computing devices processes a subset of an input dataset. [...] The model is trained using the assigned hyperparameter configuration and a training dataset that is a first portion of the input dataset") select one of the plurality of models based on model accuracy; ([¶0100] "The objective function specifies a measure of model error (performance) to be used to identify a best configuration of the hyperparameters among those evaluated" [¶0101] "F1 uses an F1 coefficient as the objective function [...] MCE uses a misclassification rate as the objective function" [¶0239] "Referring to FIG. 
16, a final tuned model error—as averaged across ten tuning runs that used different validation partitions—for each problem and each modeling algorithm is shown for a suite of ten common machine learning test problems" [¶0240] "Referring to FIG. 17, a model improvement (error reduction or accuracy increase where higher is better) for the suite of ten common machine learning test problems illustrated in FIG. 16 is shown" Koch explicitly discloses that the model tuning algorithm selects a best model from a plurality of models to improve (tune) a base models error/accuracy) and train the selected model to learn features of the predetermined learning data, ([¶0188] "a final hyperparameter configuration is selected based on the hyperparameter configuration that generated the best or lowest objective function value" [¶0191] "the selected session is requested to execute the final hyperparameter configuration based on the parameter values in the data structure. In an illustrative embodiment, a train request is sent to session manager device 400 of the selected session to execute the “train” action based on the selected model type. [...] Characteristics that define the trained model using the final hyperparameter configuration are provided back to the main thread on which selection manager application 312 is instantiated" Koch explicitly performs a post-selection "final model" training using the best configuration and stores that trained selected model for later use.) 
wherein the plurality of random number values satisfies a predetermined condition.([¶0134] " the Random search method randomly generates hyperparameter values across the range of each hyperparameter and combines them across hyperparameters" [¶0135] "the hyperparameter range including a lower and an upper bound" The range of hyperparameter values interpreted as satisfying a predetermined condition (upper and lower bound)) and wherein the learning apparatus provides the trained selected model to control allocation of physical computing resources for executing prediction processes using the trained selected model, ([¶0164] "The objective function value for each hyperparameter configuration may be extracted from an in-memory table created by an action called for validation of the trained model" [¶0253] "In an operation 1906, a model is instantiated with information read from the model description. For example, the type of model, its hyperparameter values, and other characterizing elements are read and used to instantiate the model." [¶0257] "In an operation 1914, an output of the instantiated model is received. The output may indicate a predicted characteristic of the observation vector." [¶0263] "Hyperparameter selection system 100 supports better decision making by providing a system that can identify and evaluate many more hyperparameter configurations in parallel by allocating the computing devices of worker system 106 in an effective data and model parallel manner." The learning apparatus controls allocation of physical computing resources by loading the trained selected model and related objective function value from physical hardware for executing prediction processes). However, Koch does not explicitly teach wherein the allocation of physical computing resources reduces processing time for the prediction processes compared to unoptimized allocation. 
Varadarajan, in the same field of endeavor, teaches wherein the allocation of physical computing resources reduces processing time for the prediction processes compared to unoptimized allocation ([¶0033] "cost-aware distributed scheduling is used to avoid an imbalanced workload across a cluster of computers, which improves horizontal scalability. When there are more tasks than computers, batching tasks are assigned to independent executors such that the average cost of each batch is close to the global average cost of all pending tasks" [¶0113] "If all the time-consuming tasks are scheduled on the same computer (as might happen with a cost-unaware scheduling algorithm), it may delay all these tasks further as there are only limited resources on each computer" Varadarajan explicitly performs distributed training in order to reduce processing time for the prediction processes). Koch and Varadarajan are both directed towards machine learning model hyperparameter tuning. Therefore, Koch and Varadarajan are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Koch with the teachings of Varadarajan by performing the distributed training in Varadarajan as the training method for the final training in Koch. Varadarajan provides additional motivation for the combination ([¶0033] “cost-aware distributed scheduling is used to avoid an imbalanced workload across a cluster of computers, which improves horizontal scalability”). This motivation for combination also applies to the remaining claims which depend on this combination. Claim 6 is rejected under 35 U.S.C. §103 as being unpatentable over the combination of Koch and Varadarajan and in further view of Detwiler (US20170329875A1). Regarding claim 6, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 3.
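The cost-aware batching idea quoted from Varadarajan [¶0033] (keep each batch's average cost near the global average) can be illustrated with a simple greedy heuristic. This is an assumption for illustration; Varadarajan's actual scheduler is not disclosed at this level of detail here, and the task costs below are hypothetical.

```python
def balance_batches(task_costs, n_executors):
    """Greedy longest-processing-time heuristic: assign each task,
    costliest first, to the currently least-loaded executor, keeping
    each batch's total cost close to the global average."""
    batches = [[] for _ in range(n_executors)]
    loads = [0.0] * n_executors
    for cost in sorted(task_costs, reverse=True):
        i = loads.index(min(loads))  # least-loaded executor so far
        batches[i].append(cost)
        loads[i] += cost
    return batches, loads

# Hypothetical per-task costs, for illustration only.
costs = [9, 7, 6, 5, 4, 3, 2, 1]
batches, loads = balance_batches(costs, n_executors=2)
```

A cost-unaware scheduler could put all expensive tasks on one executor; the heuristic above bounds the load imbalance by roughly the cost of a single task.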
However, the combination of Koch and Varadarajan does not explicitly teach wherein the plurality of input values comprises input values with a mean value of associated random number values meeting a predetermined value. Detwiler, in the same field of endeavor, teaches The learning apparatus according to claim 3, wherein the plurality of input values comprises input values with a mean value of associated random number values meeting a predetermined value. ([¶0157] "Here, N(0,1) is a random number according to a normal distribution with mean zero and expectation one"). The combination of Koch and Varadarajan as well as Detwiler are directed towards machine learning systems with random number generation. Therefore, the combination of Koch and Varadarajan as well as Detwiler are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Koch and Varadarajan with the teachings of Detwiler by substituting a normal distribution for the uniformly distributed random number generator used in Koch. While it would have been obvious to one of ordinary skill in the art that the uniform distribution used in Koch could have a mean of zero and that a standard normal distribution has a mean of zero, Detwiler explicitly teaches that using a normal distribution with a mean of zero over the same range used in Koch in an analogous evolutionary algorithm is known and would lead to predictable results. Claims 12, 13, and 14 are rejected under 35 U.S.C. §103 as being unpatentable over the combination of Koch and Varadarajan and in further view of Parnell (US 20200184369 A1). Regarding claim 12, the combination of Koch and Varadarajan teaches The learning apparatus according to claim 1.
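The substitution rationale above rests on both distributions sharing a mean of zero: a symmetric uniform distribution and the standard normal N(0, 1) cited from Detwiler. That shared property can be checked empirically in a quick sketch; the sample size, seed, and tolerance here are arbitrary choices for illustration.

```python
import random

def sample_mean(draw, n=100_000, seed=42):
    """Empirical mean of n draws from the given sampler."""
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(n)) / n

# A symmetric uniform distribution and the standard normal N(0, 1)
# both have mean zero, so their sample means land near zero.
uniform_mean = sample_mean(lambda rng: rng.uniform(-1.0, 1.0))
normal_mean = sample_mean(lambda rng: rng.gauss(0.0, 1.0))
```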
However, the combination of Koch and Varadarajan does not explicitly teach wherein the processor is further configured to generate a plurality of input values to be input to a predetermined second function that calculates a random number value for each input value; generate, for each of the plurality of input values, a part of the predetermined learning data based on corresponding random number value output by the predetermined second function, wherein the part of the predetermined learning data is used to train each of the plurality of models. Parnell, in the same field of endeavor, teaches the processor is further configured to generate a plurality of input values to be input to a predetermined second function that calculates a random number value for each input value; generate, for each of the plurality of input values, a part of the predetermined learning data based on corresponding random number value output by the predetermined second function, wherein the part of the predetermined learning data is used to train each of the plurality of models. ([¶0007] "For successive batches of the training data, defined by respective subsets of one of the row coordinates and column coordinates, the method includes generating, in the host computer, random numbers associated with respective coordinates in a current batch b and sending the random numbers to the accelerator unit. In parallel with generating the random numbers for batch b, batch b is copied from the host computer to the accelerator unit. The method further comprises, in the accelerator unit and in parallel with the copying of batch b, sorting the random numbers for coordinates in the previous batch (b−1) to randomly permute the coordinates and performing the stochastic optimization process for the permuted coordinates in batch (b−1) to update the model vector w in dependence on coordinates in that batch."). The combination of Koch and Varadarajan as well as Parnell are directed towards machine learning training.
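The sort-based permutation Parnell describes in [¶0007] (associate a random number with each coordinate, then sort the coordinates by those keys) can be sketched as follows. This is a minimal single-machine illustration of the mechanism, not Parnell's host/accelerator pipeline; the coordinate list and seed are hypothetical.

```python
import random

def permute_by_random_keys(coords, seed=7):
    """Associate a random number with each coordinate, then sort by
    those keys; the resulting order is a random permutation of the
    coordinates, as in the sorting step quoted from Parnell."""
    rng = random.Random(seed)
    keyed = [(rng.random(), c) for c in coords]
    keyed.sort(key=lambda kc: kc[0])
    return [c for _, c in keyed]

# Hypothetical batch of ten coordinate indices.
permuted = permute_by_random_keys(list(range(10)))
```

Sorting random keys yields a uniformly random ordering, which is why the quoted method can perform the sort on the accelerator while the host generates keys for the next batch.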
Therefore, the combination of Koch and Varadarajan as well as Parnell are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Koch and Varadarajan with the teachings of Parnell by using a random number generator for the training dataset order. Parnell provides additional motivation for the combination ([¶0008] “The random numbers can be generated efficiently on the host computer and then sent to the accelerator ready for processing of the next batch of data. The random numbers are then sorted on the accelerator, where the sorting operation can be performed with high efficiency, to obtain the required coordinate permutation. Performing the tasks in parallel in this way provides more effective use of system resources, offering significant improvement in efficiency of machine learning operations.”). This motivation for combination also applies to the remaining claims which depend on this combination. Regarding claim 13, the combination of Koch, Varadarajan, and Parnell teaches The learning apparatus according to claim 12, wherein input value generation is iteratively performed to generate learning data as a learning target in the learning for iterative model training (Parnell [¶0007] "For successive batches of the training data, defined by respective subsets of one of the row coordinates and column coordinates, the method includes generating, in the host computer, random numbers associated with respective coordinates in a current batch b and sending the random numbers to the accelerator unit. In parallel with generating the random numbers for batch b, batch b is copied from the host computer to the accelerator unit.
The method further comprises, in the accelerator unit and in parallel with the copying of batch b, sorting the random numbers for coordinates in the previous batch (b−1) to randomly permute the coordinates and performing the stochastic optimization process for the permuted coordinates in batch (b−1) to update the model vector w in dependence on coordinates in that batch." Each batch is interpreted as one iteration of the repeated learning.). Regarding claim 14, the combination of Koch, Varadarajan, and Parnell teaches The learning apparatus according to claim 12, further comprising: generating, as a part of the predetermined learning data, learning data in which the random number values are associated as a learning order. (Parnell [¶0007] "For successive batches of the training data, defined by respective subsets of one of the row coordinates and column coordinates, the method includes generating, in the host computer, random numbers associated with respective coordinates in a current batch b and sending the random numbers to the accelerator unit. In parallel with generating the random numbers for batch b, batch b is copied from the host computer to the accelerator unit. The method further comprises, in the accelerator unit and in parallel with the copying of batch b, sorting the random numbers for coordinates in the previous batch (b−1) to randomly permute the coordinates and performing the stochastic optimization process for the permuted coordinates in batch (b−1) to update the model vector w in dependence on coordinates in that batch."). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Stefani (US20190164080A1) is directed towards a resource aware machine learning hyper-parameter tuning system. Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.
The examiner can normally be reached M-F 7:30am-5:00pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124

Prosecution Timeline

Sep 09, 2021
Application Filed
Nov 23, 2024
Non-Final Rejection — §101, §103, §112
May 27, 2025
Response Filed
Jun 30, 2025
Final Rejection — §101, §103, §112
Dec 03, 2025
Response after Non-Final Action
Jan 05, 2026
Request for Continued Examination
Jan 07, 2026
Response after Non-Final Action
Feb 10, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561604
SYSTEM AND METHOD FOR ITERATIVE DATA CLUSTERING USING MACHINE LEARNING
2y 5m to grant Granted Feb 24, 2026
Patent 12547878
Highly Efficient Convolutional Neural Networks
2y 5m to grant Granted Feb 10, 2026
Patent 12536426
Smooth Continuous Piecewise Constructed Activation Functions
2y 5m to grant Granted Jan 27, 2026
Patent 12518143
FEEDFORWARD GENERATIVE NEURAL NETWORKS
2y 5m to grant Granted Jan 06, 2026
Patent 12505340
STASH BALANCING IN MODEL PARALLELISM
2y 5m to grant Granted Dec 23, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
52%
Grant Probability
90%
With Interview (+38.2%)
4y 7m
Median Time to Grant
High
PTA Risk
Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
