Last updated: May 29, 2026

Application No. 17/663,430

Machine Learning Hyperparameter Tuning

Non-Final OA §103

Filed

May 15, 2022

Priority

May 17, 2021 — provisional 63/189,496

Examiner

DASGUPTA, SHOURJO

Art Unit

2144

Tech Center

2100 — Computer Architecture & Software

Assignee

Google LLC

OA Round

3 (Non-Final)

Interview Optional

— +38.6% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 65% grant rate with +38.6% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.

Based on 454 resolved cases, 2023–2026

Examiner Intelligence

DASGUPTA, SHOURJO View full profile →

Grants 65% — above average

Career Allowance Rate

297 granted / 454 resolved

+10.4% vs TC avg

Strong +39% interview lift

Without

With

+38.6%

Interview Lift

resolved cases with interview

Typical timeline

3y 5m

Avg Prosecution

17 currently pending

Career history

486

Total Applications

across all art units

Statute-Specific Performance

§101

2.3%

-37.7% vs TC avg

§103

91.8%

+51.8% vs TC avg

§102

2.5%

-37.5% vs TC avg

§112

2.8%

-37.2% vs TC avg

Black line = Tech Center average estimate • Based on career data from 454 resolved cases

Office Action

§103

Notice of Pre-AIA  or AIA  Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office Action has been withdrawn pursuant to 37 CFR 1.114.  

Detailed Action
3.	This Non-Final Office Action is responsive to Applicants’ RCE submission as received 1/29/26.  Claims 1-20 were pending, and by way of the recent submission claims 4 and 15 are now cancelled.  Hence, claims 1-3, 5-14, and 16-22 are presently pending, of which claims 1 and 12 are independent.

Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office Action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

6.	Claims 1-2, 6-7, 9-13, 17-18, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2018/0240041 (“Koch”) in view of Non-Patent Literature “Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads” (“Narayanan”) and further in view of Non-Patent Literature “Hyperopt: a Python library for model selection and hyperparameter optimization” (“Bergstra”, previously made of record via the Final Office Action dated 11/14/25).
Regarding claim 1, KOCH teaches A computer-implemented method (FIG. 3 teaching a block diagram for a selection manager device, which features modules 314-322 that are used to perform data processing relating to the automatic selection of hyperparameters to train a predictive model (as summarized per the Abstract and as shown in more detail per FIGs. 5 and especially 6A-6C for example), where the selection manager device of FIG. 3 includes a processor (FIG. 3’s element 310)) comprising:
receiving, by data processing hardware (FIG. 3 element 310) and from a user device (a requesting user and corresponding device for that user, per FIG. 2 and [0033]), a hyperparameter optimization request requesting optimization of one or more hyperparameters of a machine learning model (the aforementioned requesting user and corresponding device issues the request of FIG. 5 step 528, which the Examiner equates with the recited “hyperparameter optimization request” which is understood to relate to a model to be tuned accordingly (per FIG. 5 and [0067] for example));
obtaining, by the data processing hardware (FIG. 3 element 310), training data for training the machine learning model (FIG. 5 step 506, where a user provides indication of an “input dataset”, which the Examiner understands to be a basis for training data selection ([0061], [0160]) in accordance with training and tuning a model as previously discussed just above and to be further discussed just below);
determining, by the data processing hardware (FIG. 3 element 310), a set of hyperparameter permutations of the one or more hyperparameters of the machine learning model ... and for each respective hyperparameter permutation in the set of hyperparameter permutations:
training, by the data processing hardware (FIG. 3 element 310) ... , a unique machine learning model using the training data and the respective hyperparameter permutation; and determining, by the data processing hardware (FIG. 3 element 310), a performance of the trained unique machine learning model ([0152] discussing the determination of a configuration list of hyperparameter configurations to be evaluated (i.e., akin to the recited “set of hyperparameter permutations”), which are selected for a particular model, where the hyperparameter configurations are iteratively selected and assigned to a session per FIG. 6B and [0169] (i.e., corresponding to the recitations for “training” and “determining a performance of ... model” that is required “for each respective hyperparameter permutation”), such that the model is trained and scored with respect to each hyperparameter configuration ([0171]), where at the end of the iterative evaluation as described there is a result (FIG. 6C step 672, which clarifies FIG. 5 steps 530-532 and serves as a basis for a hyperparameter selection for further model training/evaluation per FIG. 5 step 534));
selecting, by the data processing hardware (FIG. 3 element 310) and based on the performance of each of the trained unique machine learning models, one of the trained unique machine learning models and generating, by the data processing hardware (FIG. 3 element 310), one or more predictions using the selected one of the trained unique machine learning models (FIG. 5’s step 532 is a hyperparameter selection, e.g. based on its performance for training a particular model with a particular training dataset, and then is used for a new dataset per step 534 (which the Examiner equates with the prediction generation as recited)).

Koch does not teach the further limitation of performing the training as discussed above based on a priority order of the machine learning model.  Rather, the Examiner relies upon NARAYANAN to teach what Koch otherwise lacks, see e.g., Narayanan’s framework having a scheduling policy for deep learning (Abstract and Introduction sections, per page 481), as depicted generally via Figure 2 on page 483, where the system manages the training of different jobs (section 3 on page 483), e.g. using a scheduling mechanism as discussed per section 3.2 on page 485 that involves an explicit “priority score for every job” to facilitate the scheduling in accordance with a priority order for the jobs.
Both Koch and Narayanan relate to deep learning frameworks and the resource management thereof.  Hence, they are similarly directed and therefore analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a job priority for a deep learning task, per Narayanan, into a multi-task/job framework as contemplated by either/both reference, such that the available compute resources can be strategically applied to satisfy the needs of not just one deep learning task/job but many, thereby improving efficiency and throughput for a compute paradigm that is often resource constrained and benefits from being smartly resource aware.

By way of the RCE’s amendments, the limitation discussed above for determining a set of hyperparameter permutations of the one or more hyperparameters of the machine learning model is further clarified by way of further limitations to be performed by at least: 
identifying at least one previously trained machine learning model that shares a hyperparameter with the machine learning model and 
searching the defined search space, 
which the Examiner respectfully submits that Koch teaches.  
Regarding the identifying limitation, see Koch: [0098], clarifying FIG. 5’ step 518, such that a hyperparameter configuration to be evaluated / used for training may be permitted or subject to discarding or revision (i.e., an identifying step) based on a similarity tolerance to a prior configuration, where this comparison is performed on a per-hyperparameter basis, and hence from this the Examiner understands that across the configurations to be evaluated, the hyperparameters included therein may be same/similar and hence subject to a reuse condition (which the Examiner equates to the recited shared concept)).
Regarding the searching limitation, see Koch: [0167]: “For illustration, the LHS, the Random, and/or the Grid search methods may be used in a first iteration to define the first set of hyperparameter configurations that sample the search space.”

Further by way of the RCE’s amendments, the limitation discussed above for determining a set of hyperparameter permutations of the one or more hyperparameters of the machine learning model is even further clarified by way of further limitations to be performed by at least: 
utilizing a value of the hyperparameter from the previously trained machine learning model to configure an initial probability distribution of a search model defined over a search space, 
which the Examiner does not think either Koch or Narayanan sufficiently teach.  Rather, for this additional limitation, the Examiner relies upon BERGSTRA to teach what Koch etc. otherwise lack, see e.g. Bergstra:
Numbered page 2, column 1, lines 21-34 discussing an infrastructure for carrying out hyperparameter optimization for machine learning algorithms via an optimization interface that allows the definition of a configuration space as a probability distribution, which allows experts to better tune and improve aspects of the HPO configuration search.  Further down, in lines 42-47, a benefit of such a practice is established, e.g. to make the benchmarking more legible and reusable for other times and other people.  More clarification of the configuration space being defined for improved search performance is provided on numbered page 3, column 1, lines 33-45.
From Bergstra, the Examiner understands that there is a recognized and practiced advantage in the state of the art to parameterize the search space involved in hyperparameter optimization such that the parameterization of it, including that of it as a probability distribution, is not only intentionally initialized but is done so based on prior efforts so as to provide the benefit for later work/performance/use.  
But see also the references discussed below in the Conclusion section of this Office Action which essentially teach the same.  In the totality of the prior art obtained through this most recent search update, the Examiner does not believe the amended limitations are sufficient to indicate allowability.  Rather, they are known and used in the state of the prior art.
As with Koch and Narayanan, Bergstra similarly contemplates hyperparameter optimization challenges and practices.  Hence, it is similarly directed and therefore analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Bergstra’s teachings as discussed here into a framework such as Koch’s, with a reasonable expectation of success, such as to realize the benefits and advantages expressed by Bergstra as noted by the Examiner here in this present discussion.

Regarding claim 2, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references further teach the additional limitation wherein determining the set of hyperparameter permutations comprises performing a search on a hyperparameter search space of the one or more hyperparameters of the machine learning model (Koch: [0167]: “For illustration, the LHS, the Random, and/or the Grid search methods may be used in a first iteration to define the first set of hyperparameter configurations that sample the search space”, which clarifies [0167]’s prior mention of “tuning search method to determine a set of hyperparameters that are combined to define the first set of hyperparameter configurations”).  The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 5, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references further teach the additional limitation wherein the at least previously trained machine learning model is trained for a user of the user device (per the mappings provided above for claim 4, the prior iterations of training and hyperparameter model evaluation are all associated with the common user involved with Koch’s FIG. 5 teachings).  The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 6, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references further teach the additional limitation wherein training the unique machine learning model comprises training two or more unique machine learning models in parallel (Koch: [0067], [0070], [0157], and [0231] teaching concurrency and parallel execution advantages in facilitating the steps of FIGs. 5 and 6A-6C, such that [0150]’s workers and sessions can be understood to be working in parallel).  The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 7, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references further teach the additional limitation wherein providing the performance of each of the trained unique machine learning models to the user device comprises providing, to the user device, an indication indicating which trained unique machine learning model has the best performance based on the training data (Koch: [0073]: “one or more of the output tables may be selected by the user for presentation on display”, as referring to the different output tables discussed therein same paragraph, and also [0147] providing further clarification of tuning evaluation results as subject to a display/presentation, e.g. per FIG. 5’s step 530).  The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 9, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references further teach the additional limitation wherein the hyperparameter optimization request comprises a budget and a size of the set of hyperparameter permutations of the one or more hyperparameters of the machine learning model is based on the budget (Koch: [0156] discussing a user and/or administrator’s capability to adjust a size constrain for evaluation per FIG. 6 step 604).  The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 10, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references further teach the additional limitation wherein the data processing hardware is part of a distributed computing database system (Koch: [0066]: “... the input dataset may be stored in a cube distributed across the computing devices of each session that is a grid of computers as understood by a person of skill in the art ...”, e.g. to facilitate the management and use of the many worker computers shown per FIG. 1 and described per [0030] and [0036]).  The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 11, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references further teach the additional limitation wherein selecting the one of the trained unique machine learning models comprises: transmitting the performance of each of the trained unique machine learning models to the user device, and receiving, from the user device, a trained unique machine learning model selection selecting the one of the trained unique machine learning models (Koch: FIG. 5’s step 530 and 532 respectively).  The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 12, the claim includes the same or similar limitations as claim 1 discussed above and is therefore rejected under the same rationale.

Regarding claim 13, the claim includes the same or similar limitations as claim 2 discussed above and is therefore rejected under the same rationale.

Regarding claim 16, the claim includes the same or similar limitations as claim 5 discussed above and is therefore rejected under the same rationale.

Regarding claim 17, the claim includes the same or similar limitations as claim 6 discussed above and is therefore rejected under the same rationale.

Regarding claim 18, the claim includes the same or similar limitations as claim 7 discussed above and is therefore rejected under the same rationale.

Regarding claim 20, the claim includes the same or similar limitations as claim 9 discussed above and is therefore rejected under the same rationale.

Regarding claim 21, the claim includes the same or similar limitations as claim 10 discussed above and is therefore rejected under the same rationale.

Regarding claim 22, the claim includes the same or similar limitations as claim 11 discussed above and is therefore rejected under the same rationale.


7.	Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Koch in view of Naryanan and Bergstra and further in view of Non-Patent Literature “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization” (“Li”).
Regarding claim 3, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references teach a hyperparameter search and corresponding space, as discussed per claim 2 (citing Koch’s [0167]).  Further, the search may be Gaussian as discussed per Koch’s [0139] (“For illustration, the Bayesian search method is based on creating and exploring a kriging surrogate model to search for improved solutions. A Kriging model is a type of interpolation algorithm for which the interpolated values are modeled by a Gaussian process governed by prior covariance values.”).  Hence, Koch alone may possibly teach the entirety of the further limitation wherein performing the search on the hyperparameter search space comprises performing the search using a batched Gaussian process bandit optimization.  However, to the extent that Koch does not sufficiently teach “a batched Gaussian process bandit optimization” for the search, as recited, the Examiner then relies upon LI to teach what Koch etc. otherwise lacks, see e.g., Li’s section 2.1 beginning on page 3, discussing Gaussian processes to model and sample hyperparameters in pursuit of hyperparameter optimization (citing to Spearmint) and more explicitly (on page 4’s second full paragraph) that “Gaussian processes have also been studied in the bandit setting using confidence bound acquisition functions”, where Li’s Hyperband framework itself expands upon the sampling aspect mentioned just above per Spearmint with a batched approach (section 3.3, beginning on page 9, but see also the bulleted Data Set Subsampling discussion on page 10).
Like Koch, Li is directed to hyperparameter optimization, and is therefore analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Li’s sampling contributions to hyperparameter search and optimization aspects into a framework such as Koch’s, with a reasonable expectation of success, such as to improve speed and efficiency as discussed per Li’s Abstract and page 2’s first full paragraph.

Regarding claim 14, the claim includes the same or similar limitations as claim 3 discussed above and is therefore rejected under the same rationale.


8.	Claims 8 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Koch in view of Narayanan and Bergstra and further in view of U.S. Patent Application Publication No. 2019/0073570 (“Turco”).
Regarding claim 8, Koch in view of Narayanan and further in view of Bergstra teach the method of claim 1, as discussed above.  The aforementioned references, e.g. per Koch’s [0066], teach the use and management of the input dataset, which might be stored using “a structured query language database.”  Hence, Koch etc. very clearly teach a SQL database as linked to the input dataset, where the input dataset is part of the user’s request formulation per FIG. 5 step 506.  That said, it is not entirely clear whether the request itself (Koch’s FIG. 5 step 528) includes or comprises a query to the database, e.g. per the further limitation wherein the hyperparameter optimization request comprises a SQL query.  Rather, the Examiner relies upon TURCO to teach what Koch etc. otherwise lack, see e.g. Turco’s [0101] (“... Performing the predictions in the described manner in a database contexts may provide a huge performance gain compared to other systems, because the creation of tables for thousands of models which may never actually be used by any client is avoided and because in some embodiments the received input data is stored in a structured manner in temporary input tables such that fast, specially adapted analytical SQL routines 174 can be applied on the input data without having to export the data to a higher-level application program.”) and [0111] (“... the model manager module forwards the model M14 to the predictor module 160 which performs a prediction on the input data 124, thereby using the model M14 and optionally a stored SQL procedure ... ”).
Like Koch, Turco is directed to efficient and optimal practices relating to machine learning models, and is therefore analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Turco’s SQL procedure approach, e.g. to facilitate access and management of input data, into a framework such as Koch’s, with a reasonable expectation of success, such as to simplify and improve memory/data management for the input data as Turco contemplates.

Regarding claim 19, the claim includes the same or similar limitations as claim 8 discussed above and is therefore rejected under the same rationale.


Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure:
Non-Patent Literature “On Warm-Starting Neural Network Training” (Ash)
Non-Patent Literature “Introduction to Automatic Hyperparameter Optimization with Hyperopt” (Sukumar), discussing the provision of probability distribution of hyperparameter values for Random Search (3rd page’s second bullet point), discussing Hyperopt specifically defining a configuration space via a probability distribution (5th-6th pages, specifically item 2 under the Hyperopt heading)
Non-Patent Literature “Random Search for Hyper-Parameter Optimization” (Bergstra), especially section 2.4 on page 290 discussing the description of a hyperparameter configuration space in terms of distribution.
Non-Patent Literature “Scalable Gaussian process-based transfer surrogates for hyperparameter optimization” (Wistuba), especially section 2.1 on pages 46-47 discussing the utility of defining probability distributions for a hyperparameter space subject to search in an HPO framework.
Non-Patent Literature “Similarity Transfer for Knowledge Distillation” (Zhao)

10.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOURJO DASGUPTA whose telephone number is (571) 272-7207. The examiner can normally be reached M-F 8am-5pm CST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle can be reached at 571 272 4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHOURJO DASGUPTA/Primary Examiner, Art Unit 2144

Read full office action

Prosecution Timeline

Show 6 earlier events

Nov 14, 2025

Final Rejection mailed — §103

Jan 29, 2026

Request for Continued Examination

Feb 08, 2026

Response after Non-Final Action

Feb 20, 2026

Non-Final Rejection mailed — §103

Apr 10, 2026

Interview Requested

Apr 16, 2026

Applicant Interview (Telephonic)

Apr 16, 2026

Examiner Interview Summary

May 08, 2026

Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology

17/562,124

Patent 12626174

METHOD OF DRIVING A QUANTUM COMPUTER TO FIND ONE OR MORE STATES OF INTEREST OF A NETWORK

4y 4m to grant Granted May 12, 2026

17/270,853

Patent 12614058

ARCHITECTURE OF A COMPUTER FOR CALCULATING A CONVOLUTION LAYER IN A CONVOLUTIONAL NEURAL NETWORK

5y 2m to grant Granted Apr 28, 2026

17/479,547

Patent 12608535

AUTOMATED DIGITAL TEXT OPTIMIZATION AND MODIFICATION

4y 7m to grant Granted Apr 21, 2026

17/491,240

Patent 12591802

GENERATING ESTIMATES BY COMBINING UNSUPERVISED AND SUPERVISED MACHINE LEARNING

4y 6m to grant Granted Mar 31, 2026

17/342,719

Patent 12586371

SENSOR DATA PROCESSING

4y 9m to grant Granted Mar 24, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.

Typically takes 5-10 seconds — AI-generated, attorney review required before filing

Prosecution Projections

3-4

Expected OA Rounds

65%

Grant Probability

99%

With Interview (+38.6%)

3y 5m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 454 resolved cases by this examiner. Grant probability derived from career allowance rate.