Last updated: May 29, 2026
Application No. 17/470,763
MULTI-OBJECTIVE MACHINE LEARNING WITH MODEL AND HYPERPARAMETER OPTIMIZATION FUSION

Non-Final OA §103
Filed
Sep 09, 2021
Examiner
VANWORMER, SKYLAR K
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
International Business Machines Corporation
OA Round
3 (Non-Final)
This examiner grants 39% of cases after interview

— +22.5% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 28 resolved cases, 2023–2026
Examiner Intelligence

VANWORMER, SKYLAR K View full profile →
Grants only 39% of cases
Career Allowance Rate
11 granted / 28 resolved
-15.7% vs TC avg
Strong +22% interview lift
Without
With
+22.5%
Interview Lift
resolved cases with interview
Typical timeline
4y 0m
Avg Prosecution
14 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
2.7%
-37.3% vs TC avg
§103
96.6%
+56.6% vs TC avg
§112
0.7%
-39.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 28 resolved cases
Office Action

§103
DETAILED ACTION
Claims 1, 3-15, 17-18 and 20-23 are pending.
Claims 1, 15 and 18 are independent.
Claims 2, 16 and 19 are cancelled.
Claims 1, 14-15 and 18 are amended.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/16/2025 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claims 1, 3-15, 17-18 and 20-23 have been considered. However, newly used prior art Sobol et al (US 20190209022, “Sobol”) has been mapped to teach the amended features. Examiner respectfully directs Applicant to the detailed rejection for an explanation of how the references disclose the argued limitations.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1,3, 4, 7-10, 12-15, 17, 18, and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Loni et al (DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems, "Loni"), in view of Choi et al (Heuristic Approach for Selecting Best-Subset Including Ranking Within the Subset, "Choi") (2020) and Gustavsson et al (A New Algorithm Using the Non-dominated Tree to improve Non-dominated Sorting, "Gustavsson") (2020) and Sobol et al (US 20190209022, “Sobol”).

In regard to claim 1, Loni teaches at least one processing device comprising a processor coupled to a memory, the at least one processing device, when executing program code, is configured to: (Loni, pg. 9, Col. 1, paragraph 2, “In general, Keras uses the TensorFlow backend for training models and mapping on the hardware platform. TensorFlow supports a wide range of hardware platforms from x86 CPU processors [the at least one processing device] to ARM-based platforms.” And paragraph 3, “Unlike CPUs, we do need an initialization phase to copy data to GPU/FPGA’s internal memory [a processor coupled to a memory], before lunching processing kernel. Usually, kernel time is used for reporting runtime results; however, considering the communication time is vital for embedded implementations, especially for mission-critical applications since these applications are mainly latency oriented.”)
perform one of a plurality of hyperparameter optimization operations and a plurality of model parameter optimization operations based at least in part on the one or more algorithmic hyperparameters, the one or more model hyperparameters, the one or more time constraints, the one or more hyperparameter constraints, and the one or more computational load constraints to generate a first solution set; (Loni, pg. 5, Step 1, “1. After generating a random initial parent population P t with size N, DeepMaker generates a network model based on the hyperparameters of each genome in the parent population. Then DeepMaker trains each model [plurality of model parameter optimization operations] to calculate the network accuracy and network size [the one or more hyperparameter constraints] for all the models. 2. The offspring populating U t will be created by using GP, including crossover and mutation steps.” (the combination of Ut and Pt is interpreted as the claimed first solution set) and pg. 5, Col. 1, paragraph 1, “DeepMaker is equipped with the fast and multi-objective GP, NSGA-II, to discover a near-optimal set of hyperparameters considering both the accuracy and the network size as the objectives. Total trainable network weights [one or more algorithmic hyperparameters] are defined as the network size [the one or more model hyperparameters] objective since the performance and energy efficiency [the one or more computational load constraints] of the backend accelerator highly rely on inner product operations, which are execution bottleneck of DNNs [9] [based at least in part on the one or more algorithmic hyperparameters, the one or more model hyperparameters, the one or more time constraints, the one or more hyperparameter constraints, and the one or more computational load constraints].” And pg. 1, Col. 2, paragraph 2, “However, optimizing the network architecture at design time [the one or more time constraints] should be taken into account as the third approach since the choice of the architecture strongly impacts on both the performance and the output quality of DNNs.”) 
generate a first non-dominated solution set based on the first solution set; (Loni, pg. 4, Fig. 2, 
    PNG
    media_image1.png
    392
    662
    media_image1.png
    Greyscale
, Examiner would like to point out that in figure 2 description, that a set of non-dominate solutions are selected (A, C, D and F.)
perform the other of the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations on the selected subset of the first solution set based at least in part on the one or more algorithmic hyperparameters, the one or more model hyperparameters, the one or more time constraints, the one or more hyperparameter constraints, and the one or more computational load constraints to generate a second solution set comprising one or more second non-dominated solution sets; and (Loni, pg. 5, Step 3, “3. The NSGA-II sorts the combination of U t and P t to find the next generation parent population of N acceptable individuals [second solution set comprising one or more second non-dominated solution sets] that cannot dominate each other in terms of accuracy and network size.” (NSGA-II performs non-dominated sorting which is interpreted as per para 30 of Applicant’s spec as the claimed plurality of hyperparameter optimization operations and pg. 5, Col. 1, paragraph 1, “DeepMaker is equipped with the fast and multi-objective GP, NSGA-II, to discover a near-optimal set of hyperparameters considering both the accuracy and the network size as the objectives. Total trainable network weights [one or more algorithmic hyperparameters] are defined as the network size [the one or more model hyperparameters] objective since the performance and energy efficiency [the one or more computational load constraints] of the backend accelerator highly rely on inner product operations, which are execution bottleneck of DNNs [9] [based at least in part on the one or more algorithmic hyperparameters, the one or more model hyperparameters, the one or more time constraints, the one or more hyperparameter constraints, and the one or more computational load constraints].” And pg. 1, Col. 2, paragraph 2, “However, optimizing the network architecture at design time [the one or more time constraints] should be taken into account as the third approach since the choice of the architecture strongly impacts on both the performance and the output quality of DNNs.”) 
However, Loni does not explicitly teach receive through a user interface one or more algorithmic hyperparameters and one or more model hyperparameters of one or more machine learning models, one or more time constraints, one or more hyperparameter constraints, and one or more computational load constraints for generating a fused non-dominated solution set:
select a subset of the first solution set;
perform fusion processing at least a portion of the first non-dominated solution set and at least a portion of the one or more second non-dominated solution sets to generate a third solution set comprising the fused non-dominated solution set.
the fused non-dominated solution set being generated based on the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations and comprising a plurality of non- dominated data points multiple model parameters for each hyperparameter configuration for multiple objectives of the one or more machine learning models;
modify at least one of the one or more algorithmic hyperparameters and the one or more model hyperparameters of the one or more machine learning models based on the fused non- dominated solution set; and
send the one or more modified hyperparameters of the one or more machine learning models to the user interface.
Choi teaches select a subset of the first solution set; (Choi, pg. 3861, Conclusion, “To efficiently optimize the performance of a complex system with stochastic simulation, we newly defined an R&S problem for selecting the best-subset [select a subset of the first solution set], including ranking within the subset from a finite set of alternatives. To maximize the accuracy of the selection under limited simulation resources,…”)
Loni and Choi are related to the same field of endeavor (i.e. optimization). In view of the teachings of Choi, it would have been obvious for a person with ordinary skill in the art to apply the teachings of Choi to Loni before the effective filing date of the claimed invention in order to maximize accuracy. (Choi, pg. 3861, Conclusion, “To maximize the accuracy of the selection under limited simulation resources,…”)
However, Loni and Choi do not explicitly teach receive through a user interface one or more algorithmic hyperparameters and one or more model hyperparameters of one or more machine learning models, one or more time constraints, one or more hyperparameter constraints, and one or more computational load constraints for generating a fused non-dominated solution set:
perform fusion processing at least a portion of the first non-dominated solution set and at least a portion of the one or more second non-dominated solution sets to generate a third solution set comprising the fused non-dominated solution set.
the fused non-dominated solution set being generated based on the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations and comprising a plurality of non- dominated data points multiple model parameters for each hyperparameter configuration for multiple objectives of the one or more machine learning models;
modify at least one of the one or more algorithmic hyperparameters and the one or more model hyperparameters of the one or more machine learning models based on the fused non- dominated solution set; and
send the one or more modified hyperparameters of the one or more machine learning models to the user interface.
Gustavsson teaches perform fusion processing at least a portion of the first non-dominated solution set and at least a portion of the one or more second non-dominated solution sets to generate a third solution set comprising the fused non-dominated solution set. (Gustavsson, pg. 23, paragraph 1, “The population size starts from 100 and doubles until it reaches 3200. The population size in these experiments refers to the population size for the NSGA-II algorithm. NSGA-II performs all non-dominated sorts on two combined populations, which means that the non-dominated sorts are performed using a doubled population size [generate a third solution set comprising a fused non-dominated solution set.]. All experiments have been conducted using three to eight objectives and the results with three and eight objectives are shown in Figure 13, Table 5 and Table 6. For the sake of visibility only the best performing strategy for the ENS algorithm is shown in the log-log plots. To make the tables easier to read only population sizes 200, 800 and 3200 are shown.”)
Loni, Choi and Gustavsson are related to the same field of endeavor (i.e. optimization). In view of the teachings of Gustavsson, it would have been obvious for a person with ordinary skill in the art to apply the teachings of Gustavsson to Loni and Choi before the effective filing date of the claimed invention in order to handle larger populations more efficiently. (Gustavsson, abstract, “ENS-NDT is able to handle large population sizes and a large number of objectives more efficiently than existing algorithms for non-dominated sorting.”)
However, Loni, Choi and Gustavsson do not explicitly teach receive through a user interface one or more algorithmic hyperparameters and one or more model hyperparameters of one or more machine learning models, one or more time constraints, one or more hyperparameter constraints, and one or more computational load constraints for generating a fused non-dominated solution set:
the fused non-dominated solution set being generated based on the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations and comprising a plurality of non- dominated data points multiple model parameters for each hyperparameter configuration for multiple objectives of the one or more machine learning models;
modify at least one of the one or more algorithmic hyperparameters and the one or more model hyperparameters of the one or more machine learning models based on the fused non- dominated solution set; and
send the one or more modified hyperparameters of the one or more machine learning models to the user interface.
Sobol teaches receive through a user interface one or more algorithmic hyperparameters and one or more model hyperparameters of one or more machine learning models, one or more time constraints, one or more hyperparameter constraints, and one or more computational load constraints for generating a fused non-dominated solution set: (Sobol, paragraph 0105, “FIG. l0D depicts in bar chart form a notional dashboard that can be displayed to a caregiver on the remote computing device of FIG. 9 to identify the amount of time that a particular patient spends in various rooms over the course of a week and that is based on LEAP data that is generated by the wearable electronic device and system of FIG. 1 according to one or more embodiments shown or described herein;” and paragraph 0164, “As such, the sensors 121 are non-invasive in that they need not be ingested or in percutaneous, subcutaneous or intravenous form. In one exemplary form, some sensors 121 that are shown generally as being embedded in support tray 120 may otherwise be placed anywhere in or on the wearable electronic device [a user interface] 100 in such a manner as to facilitate acquiring data that in tum may be used by a behaviorist model (including machine learning and CDS variants) that can run as a set of instructions on the system 1 in order to correlate, manipulate and transform the data [algorithmic hyperparameters] into a form such that it can provide indicia of one or more LEAP traits associated the wearer of the wearable electronic device 100. In one form, the sensors 121 may act in conjunction with one another-as well as with instructions that are stored on a machine-readable medium such as memory 173B-to aggregate ( or fuse) the acquired data in order to infer certain activities, conditions or circumstances. Such sensor fusion can significantly improve the operability of the wearable electronic device [a user interface] 100 by leveraging the strengths of each of the sensors 121 to provide more accurate values of the acquired data rather than if only coming from one such sensor 121 in isolation [hyperparameter constraints]. For example, rotation-based gyroscopic measurement alone can lead to accumulating errors, while the absolute reference of orientation associated with accelerometers and magnetometers may be prone to high noise levels. By fusing the acquired raw data, the sensors 121 and accompanying data processing instructions can filter the information in order to compute a single estimate of (six degree-of-freedom, 6 DOF) movement, orientation or position, which in turn simplifies downstream computational requirements [computational load constraints]. For example, such fusing may occur through integrating the orthogonal angular-rate data from the gyroscopes in order to provide orientation information, and then measuring the linear acceleration vectors within a particular wearer frame [model hyperparameters] of reference and then rotating into wearer navigation coordinates using a rotation matrix as determined by the gyroscopes in order to remove gravitational effects.”)
the fused non-dominated solution set being generated based on the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations and comprising a plurality of non- dominated data points multiple model parameters for each hyperparameter configuration for multiple objectives of the one or more machine learning models; (Sobol, paragraph 0164, “As such, the sensors 121 are non-invasive in that they need not be ingested or in percutaneous, subcutaneous or intravenous form. In one exemplary form, some sensors 121 that are shown generally as being embedded in support tray 120 may otherwise be placed anywhere in or on the wearable electronic device 100 in such a manner as to facilitate acquiring data that in tum may be used by a behaviorist model (including machine learning and CDS variants) that can run as a set of instructions on the system 1 in order to correlate, manipulate and transform the data into a form such that it can provide indicia of one or more LEAP traits associated the wearer of the wearable electronic device 100. In one form, the sensors 121 may act in conjunction with one another-as well as with instructions that are stored on a machine-readable medium such as memory 173B-to aggregate ( or fuse) the acquired data in order to infer certain activities [the fused non-dominated solution set], conditions or circumstances. Such sensor fusion can significantly improve the operability of the wearable electronic device 100 by leveraging the strengths of each of the sensors 121 to provide more accurate values of the acquired data rather than if only coming from one such sensor 121 in isolation. For example, rotation-based gyroscopic measurement alone can lead to accumulating errors, while the absolute reference of orientation associated with accelerometers and magnetometers may be prone to high noise levels [the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations]. By fusing the acquired raw data, the sensors 121 and accompanying data processing instructions can filter the information in order to compute a single estimate of (six degree-of-freedom, 6 DOF) movement, orientation or position, which in turn simplifies downstream computational requirements. For example, such fusing may occur through integrating the orthogonal angular-rate data from the gyroscopes in order to provide orientation information [multiple model parameters for each hyperparameter configuration for multiple objectives of the one or more machine learning models;], and then measuring the linear acceleration vectors within a particular wearer frame of reference and then rotating into wearer navigation coordinates using a rotation matrix as determined by the gyroscopes in order to remove gravitational effects.”)
modify at least one of the one or more algorithmic hyperparameters and the one or more model hyperparameters of the one or more machine learning models based on the fused non- dominated solution set; and (Sobol, paragraph 0164, “As such, the sensors 121 are non-invasive in that they need not be ingested or in percutaneous, subcutaneous or intravenous form. In one exemplary form, some sensors 121 that are shown generally as being embedded in support tray 120 may otherwise be placed anywhere in or on the wearable electronic device 100 in such a manner as to facilitate acquiring data that in tum may be used by a behaviorist model (including machine learning and CDS variants) that can run as a set of instructions on the system 1 in order to correlate, manipulate and transform the data [modify at least one of the one or more algorithmic hyperparameters] into a form such that it can provide indicia of one or more LEAP traits associated the wearer of the wearable electronic device 100…For example, the individual may go through various sitting, standing, walking, running (if possible) and related movements that can be labeled for each activity where classification is desired. As will be discussed in more detail later, such labeling may be useful in performing supervised machine learning, particularly as it applies to training a machine learning model [the one or more machine learning models based on the fused non- dominated solution set;].”)
send the one or more modified hyperparameters of the one or more machine learning models to the user interface. (Sobol, paragraph 0143, “The application layer acts as the user interface responsible for displaying received information to the user by standardizing communication based on activities within the underlying transport layer protocols.” And paragraph 0223, “Thus, an HMM facilitates the modeling of a given event or process with a hidden state that is based on observable parameters, particularly in determining the likelihood of a given sequence. Such a framework is particularly useful for modeling events that have temporal-based data structures (such as that associated with movement or positional data that is acquired from accelerometers or gyroscopes, as well as speech recognition, speech generation and human gesture recognition) in that an HMM can be visualized as essentially a quantization of a system's spatial components into a small number of discrete states, together with probabilities for the time-based transitions between such states.”)
Loni, Choi, Gustavsson and Sobol are related to the same field of endeavor (i.e. optimization). In view of the teachings of Sobol it would have been obvious for a person with ordinary skill in the art to apply the teachings of Sobol to Loni, Choi and Gustavsson before the effective filing date of the claimed invention in order to promote greater accuracy of data. (Sobol, paragraph 0117, “In addition, the temporal nature of the collected data-coupled with using various data-gathering modalities for such data collection and other determination allows increased contextual insight into a wearer's moment-to- moment activities, which in turn promotes greater accuracy in the ability to analyze the health of the person wearing the device.”)

In regard to claim 15, the claim recites similar limitations as corresponding claim 1, and is rejected for similar reasons as claim 1 using similar teachings and rationale.

In regard to claim 18, the claim recites similar limitations as corresponding claim 1, and is rejected for similar reasons as claim 1 using similar teachings and rationale.
Loni further teaches A computer program product comprising a processor-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by the one or more processors implement steps of: (Loni, pg. 9, Col. 1, paragraph 2, “In general, Keras uses the TensorFlow backend for training models and mapping on the hardware platform. TensorFlow supports a wide range of hardware platforms from x86 CPU processors to ARM-based platforms.” And paragraph 3, “Unlike CPUs, we do need an initialization phase to copy data to GPU/FPGA’s internal memory, before lunching processing kernel. Usually, kernel time is used for reporting runtime results; however, considering the communication time is vital for embedded implementations, especially for mission-critical applications since these applications are mainly latency oriented.”)

In regard to claim 3, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1.
Loni further teaches wherein the other of the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations comprises the plurality of model parameter optimization operations and the subset of the first solution set comprises a plurality of different hyperparameter configurations. (Loni, pg. 5, Col. 1, paragraph 1, “DeepMaker is equipped with the fast and multi-objective GP, NSGA-II [the plurality of model parameter optimization operations], to discover a near-optimal set of hyperparameters considering both the accuracy and the network size as the objectives [a plurality of different hyperparameter configurations.]. Total trainable network weights are defined as the network size objective since the performance and energy efficiency of the backend accelerator highly rely on inner product operations, which are execution bottleneck of DNNs [9] .”)

In regard to claim 4, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1.
Loni further teaches wherein the at least one processing device, when executing program code, is further configured to select the subset of the first solution set based at least in part on one or more selection metrics. (Loni, pg. 4, Col. 1, paragraph 2, “NSGA-II works as follows: In the first step, an offspring population U t is formed from a parent population P t by using Genetic Programming, both with size N . Then we combine U t and P t to devise a third population R t of size 2 ∗N . Next, NSGA-II extracts a population (with size N ) from R t by employing multiple objectives [at least in part on one or more selection metrics.], nondominated sorting, and crowding distance comparison. The main aim of non-dominated sorting is to find a set of solution which cannot dominate each other. Moreover, by doing crowding distance sorting, we can orchestrate the density of solution for each Pareto front. NSGA-II selects the best N candidates for generating the next population called P t + 1. [select the subset of the first solution set]”)

In regard to claim 7, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18. 
Loni further teaches wherein the at least one processing device, when executing program code, is further configured to select one or more non-dominated data points for a Pareto frontier. (Loni, pg. 4, Col. 1, paragraph 1, “In this work, the Non-Dominated Sorting Genetic Algorithm (NSGA-II) [8] has been used to solve the exploration problems. NSGA-II is a robust meta-heuristic population-based evolutionary algorithm solving MOO problems that aim to adaptively fit a set of candidates to Pareto frontier [select one or more non-dominated data points for a Pareto frontier].”)

In regard to claim 8, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18.
Loni further teaches wherein the plurality of hyperparameter optimization operations are performed using a plurality of hyperparameters, the plurality of hyperparameters comprising one or more of a machine learning model size, a machine learning model learning rate and a machine learning model component size. (Loni, pg. 3, Col. 2, paragraph 8, “In this paper, MOO is used to solve the neural architectural search problem by finding a set of Pareto-optimal sets of network hyperparameters. The key design objectives which are considered in this paper for the network optimization are classification accuracy and network size [a machine learning model size].”)

In regard to claim 9, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18.
Loni further teaches wherein the plurality of model parameter optimization operations are performed using a plurality of model parameters, the plurality of model parameters comprising one or more of machine learning model nodal weights, machine learning model biases and machine learning model coefficients. (Loni, pg. 3, Col. 2, paragraph 2, “The Conv and FC layers are the most computation-intensive layers in CNNs. They have the same basic operations: b j = _i a i . w i, j , i.e., the weighted sum of the inputs. The weights (w i,j ) are learned from the training phase, and the inputs ( a i ) are from the previous layer [machine learning model nodal weights]. While the Conv layers use small groups of weights (called kernels) to slide over the inputs, the FC layers use a full connection between input and output neurons.”)

In regard to claim 10, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18.
Loni further teaches wherein the plurality of hyperparameter optimization operations are performed using a multi-objective hyperparameter optimization technique. (Loni, pg. 3, 3.2 Multi-Objective Optimization (MOO), “The problem of finding the best configuration (s) of a parameterized system S with n different parameters with respect to m different objectives is called a MOO Problem [a multi-objective hyperparameter optimization technique] [41]. The set of all possible configurations is called the Design Space, whereas each point C in this space (each configuration C) is called a solution to the MOO problem.”)

In regard to claim 12, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18.
Loni further teaches wherein the plurality of hyperparameter optimization operations are performed one of without constraints and with one or more user-defined constraints. (Loni, pg. 2, Col. 1, paragraph 3, “To approximate an application, developers first need to identify [user-defined constraints] the approximation region of the code, then provide a training dataset for the specified code block in order to be mimicked by a DNN generated by DeepMaker. The approximation region of the code should be both hotspot and less sensitive to a quality loss in both data and operations. We can define a hotspot as a code region that consumes considerable energy or occupies a significant part of execution time [7] .” and Bullet Point 4, “Adaptive finding the best architecture regarding resource budget and execution time constraints [one or more user-defined constraints]. Then, mapping the generated network on different platforms to evaluate the applicability of DeepMaker is our last contribution.”)

In regard to claim 13, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18.
Loni further teaches wherein the plurality of model parameter optimization operations are performed using at least one of one or more adaptive weights and one or more custom objectives. (Loni, pg. 3, Col. 2, paragraph 2, “The weights (w i,j ) are learned from the training phase [using at least one of one or more adaptive weights], and the inputs ( a i ) are from the previous layer. While the Conv layers use small groups of weights (called kernels) to slide over the inputs, the FC layers use a full connection between input and output neurons.”)

In regard to claim 14, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18.
Loni further teaches wherein the performing of the one of and the other of the plurality of hyperparameter optimization operations and the plurality of model parameter optimization operations, the generating, the selecting, and the performing fusion process are iteratively executed. (Loni, pg. 4, Col. 1, paragraph 2, “This procedure is repeated [operations are iteratively executed] for the next generations until it exceeds a predefined maximum number of generations or satisfies the developer’s criterion including the desired level of accuracy/network size.”)

In regard to claim 17 and analogous claim 20, Loni, Choi, Gustavsson and Sobol teaches the method of claim 15.
Loni further teaches the subset of the first solution set comprises a plurality of different hyperparameter configurations. (Loni, pg. 5, Col. 1, paragraph 1, “DeepMaker is equipped with the fast and multi-objective GP, NSGA-II, to discover a near-optimal set of hyperparameters considering both the accuracy and the network size as the objectives [a plurality of different hyperparameter configurations.]. Total trainable network weights are defined as the network size objective since the performance and energy efficiency of the backend accelerator highly rely on inner product operations, which are execution bottleneck of DNNs [9].”)

In regard to claim 21, Loni, Choi, Gustavsson and Sobol teaches the method of claim 15.
Loni further teaches wherein the at least one processing device, when executing program code, is further configured to select the subset of the first solution set based at least in part on one or more selection metrics. (Loni, pg. 4, Col. 1, paragraph 2, “Moreover, by doing crowding distance sorting, we can orchestrate the density of solution for each Pareto front. NSGA-II selects the best N candidates for generating the next population called P t + 1 . This procedure is repeated for the next generations until it exceeds a predefined maximum number of generations or satisfies the developer’s criterion including the desired level of accuracy/network size [select the subset of the first solution set based at least in part on one or more selection metrics.].”)

Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Loni in view of  Choi, Gustavsson and Sobol and in further view of Heinrich et al (US Published Patent Application No. 20200226461, "Heinrich").

In regard to claim 5, Loni, Choi, Gustavsson and Sobol teaches the method of claim 4. 
However, Loni, Choi, Gustavsson and Sobol do not explicitly teach wherein the one or more selection metrics comprise at least one of a training time and one or more objectives of interest.
Heinrich teaches wherein the one or more selection metrics comprise at least one of a training time and one or more objectives of interest. (Heinrich, paragraph 0038, “In these embodiments, selection engine 124 balances between breadth and depth in hyperparameter metaoptimization by leveraging unpredictability of scheduling, run time [one of a training time], and performance metrics 222 related to training machine learning models 210-212.”)
Loni, Choi, Gustavsson, Sobol and Heinrich are related to the same field of endeavor (i.e. parameter optimization). In view of the teachings of Heinrich, it would have been obvious for a person with ordinary skill in the art to apply the teachings of Heinrich to Loni, Choi, Gustavsson and Sobol before the effective filing date of the claimed invention in order to identify hyperparameters for a machine learning model with the best performance metrics. (Heinrich, paragraph 0018, “In these embodiments, hyperparameter metaoptimization includes identifying a set of hyperparameters that produces a machine learning model with a best or highest performance metric.”)

In regard to claim 6, Loni, Choi, Gustavsson, Sobol and Heinrich teach the method of claim 5.
Heinrich further teaches wherein the one or more objectives of interest comprise at least one of false positive rate, recall and accuracy. (Heinrich, paragraph 0052, “The performance metrics may include, but are not limited to, a precision, recall, accuracy [recall and accuracy], ROC AUC, E/O ratio, and/or another measure of machine learning performance.”)
Loni, Choi, Gustavsson, Sobol and Heinrich are combinable for the same rationale as set forth above with respect to claim 5.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Loni, in view of Choi, Gustavsson and Sobol as applied to claim 1 above, and in further view of Alibrahim et al (Hyperparameter Optimization: Comparing Genetic Algorithm against Grid Search and Bayesian Optimization, "Alibrahim").

In regard to claim 11, Loni, Choi, Gustavsson and Sobol teaches the method of claim 1 and analogous claims 15 and 18. 
However, Loni, Choi, Gustavsson and Sobol do not explicitly teach wherein the plurality of hyperparameter optimization operations are performed using one of a cross-entropy loss objective, a hinge loss objective and a softmax loss objective.  
Alibrahim teaches wherein the plurality of hyperparameter optimization operations are performed using one of a cross-entropy loss objective, a hinge loss objective and a softmax loss objective. (Alibrahim, Pg. 1555, Col. 1, Binary Cross entropy, “Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.”)
Loni, Choi, Gustavsson, Sobol, and Alibrahim are related to the same field of endeavor (i.e. parameter optimization). In view of the teachings of Alibrahim, it would have been obvious for a person with ordinary skill in the art to apply the teachings of Alibrahim to Loni, Choi, Gustavsson and Sobol before the effective filing date of the claimed invention in order to find the best hyperparameter values for a neural network. (Alibrahim, Abstract, “The main goal of this paper is to conduct a comparison study between different algorithms that are used in the optimization process in order to find the best hyperparameter values for the neural network.”)

Claim 22 and 23 is rejected under 35 U.S.C. 103 as being unpatentable over Loni, in view of Choi, Gustavsson and Sobol and in further view of Song et al (A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data, "Song") [Jan. 2013].

In regard to claim 22, Loni, Choi, Gustavsson and Sobol teach the method of claim 21.
However, Loni, Choi, Gustavsson and Sobol does not explicitly teach wherein the one or more selection metrics comprise at least one of a training time and one or more objectives of interest.
Song teaches wherein the one or more selection metrics comprise at least one of a training time and one or more objectives of interest. (Song, pg. 6, Col. 2, paragraph 5, “In the experiment, for each feature subset selection algorithm, we obtain M  N feature subsets Subset and the corresponding runtime Time with each data set. Average |Subset| and Time [one of a training time], we obtain the number of selected features further the proportion of selected features and the corresponding runtime for each feature selection algorithm on each data set. For each classification algorithm, we obtain MN classification Accuracy for each feature selection algorithm and each data set. Average these Accuracy [one or more objectives of interest], we obtain mean accuracy of each classification algorithm under each feature selection algorithm and each data set.”)
Loni, Choi, Gustavsson, Sobol and Song are related to the same field of endeavor (i.e. training). In view of the teachings of Song, it would have been obvious for a person with ordinary skill in the art to apply the teachings of Song to Loni, Choi, Gustavsson and Sobol before the effective filing date of the claimed invention in order to improve performance of classifiers. (Song, Abstract, “The results, on 35 publicly available real-world high-dimensional image, microarray, and text data, demonstrate that the FAST not only produces smaller subsets of features but also improves the performances of the four types of classifiers.”)

In regard to claim 23, Loni, Choi, Gustavsson, Sobol and Song teaches the method of claim 22.
Loni further teaches wherein the one or more objectives of interest comprise at least one of false positive rate, recall and accuracy. (Loni, pg. 4, Col. 1, paragraph 2, “Moreover, by doing crowding dis- tance sorting, we can orchestrate the density of solution for each Pareto front. NSGA-II selects the best N candidates for generating the next population called P t + 1 . This procedure is repeated for the next generations until it exceeds a predefined maximum number of generations or satisfies the developer’s criterion including the desired level of accuracy [accuracy]/network size.”


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SKYLAR K VANWORMER whose telephone number is (703)756-1571. The examiner can normally be reached M-F 6:00am to 3:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.K.V./Examiner, Art Unit 2146                                                                                                                                                                                                        
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146
Read full office action
Prosecution Timeline

Sep 09, 2021
Application Filed
Nov 19, 2024
Non-Final Rejection mailed — §103
Feb 21, 2025
Response Filed
Jul 16, 2025
Final Rejection mailed — §103
Sep 16, 2025
Response after Non-Final Action
Oct 01, 2025
Request for Continued Examination
Oct 09, 2025
Response after Non-Final Action
Feb 24, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/331,475
Patent 12591789
Knowledge distillation in multi-arm bandit, neural network models for real-time online optimization
4y 10m to grant Granted Mar 31, 2026
17/169,083
Patent 12541680
REDUCED COMPUTATION REAL TIME RECURRENT LEARNING
4y 12m to grant Granted Feb 03, 2026
17/383,132
Patent 12524655
ARTIFICIAL NEURAL NETWORK PROCESSING METHODS AND SYSTEM
4y 5m to grant Granted Jan 13, 2026
17/350,840
Patent 12511554
Complex System for End-to-End Causal Inference
4y 6m to grant Granted Dec 30, 2025
17/514,512
Patent 12505358
Methods and Systems for Approximating Embeddings of Out-Of-Knowledge-Graph Entities for Link Prediction in Knowledge Graph
4y 1m to grant Granted Dec 23, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
39%
Grant Probability
62%
With Interview (+22.5%)
4y 0m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 28 resolved cases by this examiner. Grant probability derived from career allowance rate.