Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2-3 and 16-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
These claims, either directly or through dependency, recite in relevant part, “wherein the candidate deep learning model performs the task best; and determining the candidate deep learning model as the first deep learning model, wherein the candidate deep learning model performs the task best.” The term “best” is a subjective term which renders the claim indefinite. Neither the claim nor the specification provides an objective standard for measuring the scope of the term. Therefore, a person having ordinary skill in the art would be unable to determine the metes and bounds of the claim.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because neither the claim nor the specification limits “a computer-readable storage medium” to tangible forms, and therefore the claim encompasses transitory media, which do not fall into one of the four statutory categories.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 10-11, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Huang et al., US Pre-Grant Publication No. 2021/0271809 (hereafter Huang).
Regarding claim 1 and analogous claims 13-15:
Huang teaches:
“A method for generating a target deep learning model, comprising”: Huang, paragraph 0004-0006, “According to embodiments of the disclosure, a method for performing machine learning processing is provided [A method for generating a target deep learning model]. The method includes obtaining data; obtaining a labelling result of the data; and selecting at least one of a model framework meeting a requirement of a user and a model meeting a predicted target of the user, and performing model training using the data and the labelling result of the data based on at least one of the model framework and the model, in which the model framework is a framework used for performing the model training based on a machine learning algorithm. According to embodiments of the disclosure, a computing device is provided. The computing device includes a processor and a memory. The memory has executable codes stored thereon. When the executable codes are executed by the processor, the processor is caused to perform the method according to the first aspect of the disclosure. According to embodiments of the disclosure, a non-transitory machine-readable storage medium is provided. The storage medium has executable codes stored thereon. When the executable codes are executed by a processor of an electronic device, the processor is caused to perform a method according to the first aspect of the disclosure.”; Huang, paragraph 0016, “FIG. 9 is a block diagram illustrating an apparatus for performing machine learning process according to example embodiments of the disclosure.”
“obtaining, from a user, an instruction and original data for generating the target deep learning model, wherein the instruction comprises a task expected to be performed by the target deep learning model”: Huang, paragraph 0150, “The data collection platform can provide a user with a data upload interface, and receive data uploaded by the user for the model training [obtaining, from a user … original data for generating the target deep learning model]. In addition, the data collection platform can also provide the user with the data collection service. In a case where user's data for the model training is insufficient, the user's data collection needs can be acquired and data collection operations can be performed. For example, the user can define tasks, such as ‘request to collect pictures containing various fruits’ [obtaining, from a user, an instruction … wherein the instruction comprises a task expected to be performed by the target deep learning model]. The data collection platform can collect raw data meeting the user's needs based on the tasks entered by the user. The collected raw data may be data without labelling results. The data collection process can be referred to descriptions of FIG. 2 above, which is not repeated here.”
“generating training data from the original data”: Huang, paragraph 0151-0152, “The data labelling platform can provide the user with data labelling services. A general workflow of the data labelling platform may include the following. The data labelling platform can receive data labelling requests from a user or the data collection platform, package the data to be labelled into labelling tasks, and send them to one or more labelers who can perform manual labelling. The labelers perform the manual labelling on the data to be labelled. The data labelling platform can organize the manual labelling results, and save or send the organized labelling results. The algorithm platform can receive the data and the labelling results sent by the data labelling platform, and use the data and the labelling results to automatically perform the model training [generating training data from the original data].”
“determining a first deep learning model corresponding to the task; and training the first deep learning model, with the training data to obtain the target deep learning model”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected [determining a first deep learning model corresponding to the task]. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training [training the first deep learning model, with the training data to obtain the target deep learning model].”
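For illustration, the selection procedure Huang describes in paragraphs 0113-0114 (trying hyperparameter combinations per model framework, e.g. via grid search, and keeping the best-performing framework and combination) can be sketched as follows. This is a minimal sketch only; the framework names, hyperparameter grids, and scoring function are hypothetical stand-ins, not anything disclosed by Huang.

```python
import itertools

def train_and_score(framework, params, train_data, test_data):
    # Placeholder scorer; a real system would train the model under
    # `framework` with `params` and return a test metric such as accuracy.
    return sum(params.values()) % 7 / 7.0  # deterministic dummy score

def select_best_framework(frameworks, grids, train_data, test_data):
    # Grid search: evaluate every hyperparameter combination of every
    # framework and keep the highest-scoring (framework, params) pair.
    best = None
    for fw in frameworks:
        keys = sorted(grids[fw])
        for combo in itertools.product(*(grids[fw][k] for k in keys)):
            params = dict(zip(keys, combo))
            score = train_and_score(fw, params, train_data, test_data)
            if best is None or score > best[0]:
                best = (score, fw, params)
    return best  # (score, best framework, its optimal hyperparameters)
```

Random search or Bayesian optimization, also named in Huang, would replace the exhaustive `itertools.product` enumeration with sampled or model-guided candidate combinations.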
Regarding claim 2:
Huang teaches “the method according to claim 1.”
“wherein the step of determining the first deep learning model corresponding to the task comprises determining a plurality of candidate deep learning models corresponding to the task; training the plurality of candidate deep learning models with a part of the training data”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements [determining a plurality of candidate deep learning models corresponding to the task], optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training [training the plurality of candidate deep learning models with a part of the training data].”
“determining a candidate deep learning model among the plurality of trained candidate deep learning models, wherein the candidate deep learning model performs the task best; and”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework [determining a candidate deep learning model among the plurality of trained candidate deep learning models, wherein the candidate deep learning model performs the task best]. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training.”
“determining the candidate deep learning model as the first deep learning model, wherein the candidate deep learning model performs the task best”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework [determining the candidate deep learning model as the first deep learning model, wherein the candidate deep learning model performs the task best]. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training.”
Regarding claim 3:
Huang teaches “the method according to claim 2.”
Huang further teaches “wherein total numbers of layers of the plurality of candidate deep learning models are different; and/or layer numbers of output layers of the plurality of candidate deep learning models are different; and/or training parameters for training the plurality of candidate deep learning models are at least partially different”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework [training parameters for training the plurality of candidate deep learning models are at least partially different]. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training.”
Regarding claim 10:
Huang teaches “the method according to claim 1.”
Huang further teaches “wherein the task comprises a search task, and the target deep learning model comprises a deep learning model for a neural search”: Huang, paragraph 0032, “Therefore, selecting a model matching the user's predicted target may refer to selecting the model matching the user's predicted target from previously trained models. The predicted target refers to predicted functions achieved by the model trained based on the user's desires. For example, in a case a function achieved by the model trained based on the user's desires is identifying cats in an image, the predicted target is ‘identifying cats in an image’ [wherein the task comprises a search task]. The model matching the user's predicted target refers to a model that can achieve the same or similar functions as the predicted target [the target deep learning model comprises a deep learning model for a neural search]. For example, in a case that the user's predicted target is ‘identifying cats in an image’, a previously trained model that is used for identifying cats in an image may be used as the model matching the user's predicted target, or a previously trained model used for identifying other types of animals (such as dogs, pigs or the like) can be used as the model matching the user's predicted target.”
Regarding claim 11:
Huang teaches “the method according to claim 10.”
Huang further teaches “wherein the search task comprises one of the following: searching for pictures with texts; searching for texts with texts; searching for pictures with pictures; searching for texts with pictures; and searching for sounds with sounds”: Huang, paragraph 0032, “Therefore, selecting a model matching the user's predicted target may refer to selecting the model matching the user's predicted target from previously trained models. The predicted target refers to predicted functions achieved by the model trained based on the user's desires. For example, in a case a function achieved by the model trained based on the user's desires is identifying cats in an image, the predicted target is ‘identifying cats in an image’ [wherein the search task comprises one of the following: searching for pictures with texts]. The model matching the user's predicted target refers to a model that can achieve the same or similar functions as the predicted target. For example, in a case that the user's predicted target is ‘identifying cats in an image’, a previously trained model that is used for identifying cats in an image may be used as the model matching the user's predicted target, or a previously trained model used for identifying other types of animals (such as dogs, pigs or the like) can be used as the model matching the user's predicted target.”
Regarding claim 20:
Huang teaches “the method according to claim 3.”
Huang further teaches “wherein the task comprises a search task, and the target deep learning model comprises a deep learning model for a neural search”: Huang, paragraph 0032, “Therefore, selecting a model matching the user's predicted target may refer to selecting the model matching the user's predicted target from previously trained models. The predicted target refers to predicted functions achieved by the model trained based on the user's desires. For example, in a case a function achieved by the model trained based on the user's desires is identifying cats in an image, the predicted target is ‘identifying cats in an image’ [wherein the task comprises a search task]. The model matching the user's predicted target refers to a model that can achieve the same or similar functions as the predicted target [the target deep learning model comprises a deep learning model for a neural search]. For example, in a case that the user's predicted target is ‘identifying cats in an image’, a previously trained model that is used for identifying cats in an image may be used as the model matching the user's predicted target, or a previously trained model used for identifying other types of animals (such as dogs, pigs or the like) can be used as the model matching the user's predicted target.”
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Sankaran et al., US Pre-Grant Publication No. 2020/0074347 (hereafter Sankaran).
Regarding claim 4:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “wherein the instruction further comprises at least one of the following: a model type of the first deep learning model; a total number of layers of the first deep learning model; a layer number of an output layer of the first deep learning model; and training parameters for training the first deep learning model.”
Sankaran teaches “wherein the instruction further comprises at least one of the following: a model type of the first deep learning model; a total number of layers of the first deep learning model; a layer number of an output layer of the first deep learning model; and training parameters for training the first deep learning model”: Sankaran, paragraph 0031, “In some embodiments, the ingested data may include user constraints 208”; Sankaran, paragraph 0034, “In some embodiments, user constraints 208 may specify a maximum training time for a deep learning model. For example, user constraints 208 may specify the maximum training time that a model can take for training [wherein the instruction further comprises at least one of the following: … training parameters for training the first deep learning model].”
Sankaran and Huang are analogous arts as they are both related to user-suggested data modelling. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user model constraints of Sankaran with the teachings of Huang to arrive at the present invention, in order to allow the user to get a model trained in a maximum time, as stated in Sankaran, paragraph 0034, “In some embodiments, user constraints 208 may specify a maximum training time for a deep learning model. For example, user constraints 208 may specify the maximum training time that a model can take for training.”
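For illustration, a user-specified maximum training time of the kind Sankaran's paragraph 0034 describes operates as a training stop condition. The sketch below is hypothetical; the function name and the dummy training step are assumptions, not Sankaran's disclosure.

```python
import time

def train_until(max_seconds, step, max_rounds=1000):
    # Run training rounds until either the user's maximum training time
    # (a stop condition per Sankaran par. 0034) or a round cap is hit.
    start = time.monotonic()
    rounds = 0
    while rounds < max_rounds and time.monotonic() - start < max_seconds:
        step(rounds)  # stand-in for one round of model training
        rounds += 1
    return rounds
```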
Regarding claim 5:
Huang as modified by Sankaran teaches “the method according to claim 4.”
Sankaran further teaches “wherein the training parameters comprise at least one of the following: a learning rate; and a training stop condition”: Sankaran, paragraph 0034, “In some embodiments, user constraints 208 may specify a maximum training time for a deep learning model. For example, user constraints 208 may specify the maximum training time that a model can take for training [wherein the training parameters comprise at least one of the following: … a training stop condition].”
Sankaran and Huang are combinable for the rationale given under claim 4.
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Ormerod, US Pre-Grant Publication No. 2019/0266234 (hereafter Ormerod).
Regarding claim 6:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model.”
Ormerod teaches “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model”: Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
Ormerod and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the explicit determination of a loss function and optimizer of Ormerod with the teachings of Huang to arrive at the present invention, in order to improve training, as stated in Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
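For illustration, “determining a loss function and an optimizer corresponding to the first deep learning model,” as discussed in Ormerod's paragraph 0042, can be sketched as a lookup by task type. The task names and the particular loss/optimizer pairings below are illustrative assumptions only (the binary-cross-entropy/RMSprop pairing echoes Ormerod's example, but the mapping itself is hypothetical).

```python
# Hypothetical task-to-configuration mappings.
LOSS_BY_TASK = {
    "binary_classification": "binary_cross_entropy",
    "regression": "sum_of_squared_errors",
}
OPTIMIZER_BY_TASK = {
    "binary_classification": "rmsprop",
    "regression": "sgd",
}

def configure_training(task):
    # Determine the loss function and optimizer corresponding to the
    # model's task; both are then handed to the training loop.
    loss = LOSS_BY_TASK.get(task)
    optimizer = OPTIMIZER_BY_TASK.get(task)
    if loss is None or optimizer is None:
        raise ValueError(f"no loss/optimizer mapping for task {task!r}")
    return loss, optimizer
```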
Regarding claim 16:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model.”
Ormerod teaches “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model”: Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
Ormerod and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the explicit determination of a loss function and optimizer of Ormerod with the teachings of Huang to arrive at the present invention, in order to improve training, as stated in Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Xia et al., US Patent No. 10,810,491 (hereafter Xia).
Regarding claim 7:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model.”
Xia teaches “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model”: Xia, col. 7, line 64 through col. 8, line 6, “In some implementations clients 174 may use interactive control elements of the interface (e.g., by clicking on a portion of a model layout) to indicate the particular layer or feature they wish to inspect visually, to zoom in on a particular iteration's details, and so on. In at least some embodiments, the visualizations 185 may be provided in real time or near real time-for example, within a few seconds of the completion of a particular training iteration, the value of the loss function [displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model].”
Xia and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training data display of Xia with the teachings of Huang to arrive at the present invention, in order to provide the user with timely insight into the training process for better training results, as stated in Xia, col. 3, lines 58-63, “In some embodiments, to help provide more timely insights into the training and/or testing of a model, a machine learning service (or more generally, a machine 60 learning training/testing environment which may not necessarily be implemented as part of a provider network service) may comprise a visualization manager.”
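For illustration, displaying the loss value after each training round, in the spirit of Xia's near-real-time visualization, can be sketched as below. The training loop, loss values, and function name are dummies introduced for this sketch, not Xia's disclosure.

```python
def train_with_loss_display(num_rounds, compute_loss, display=print):
    # After each training round, display that round's loss value and
    # keep a history of the values shown.
    history = []
    for round_idx in range(1, num_rounds + 1):
        loss = compute_loss(round_idx)  # stand-in for one training round
        display(f"round {round_idx}: loss={loss:.4f}")
        history.append(loss)
    return history
```

A real system would plot these values in an interactive interface rather than printing them, as Xia's clients do.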
Regarding claim 17:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model.”
Xia teaches “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model”: Xia, col. 7, line 64 through col. 8, line 6, “In some implementations clients 174 may use interactive control elements of the interface (e.g., by clicking on a portion of a model layout) to indicate the particular layer or feature they wish to inspect visually, to zoom in on a particular iteration's details, and so on. In at least some embodiments, the visualizations 185 may be provided in real time or near real time-for example, within a few seconds of the completion of a particular training iteration, the value of the loss function [displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model].”
Xia and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training data display of Xia with the teachings of Huang to arrive at the present invention, in order to provide the user with timely insight into the training process for better training results, as stated in Xia, col. 3, lines 58-63, “In some embodiments, to help provide more timely insights into the training and/or testing of a model, a machine learning service (or more generally, a machine 60 learning training/testing environment which may not necessarily be implemented as part of a provider network service) may comprise a visualization manager.”
Claims 8 and 18 are rejected under 35 U.S.C. 103 over Huang in view of Creedon et al., US Pre-Grant Publication No. 2021/0256418 (hereafter Creedon) and Woo et al., US Patent No. 11,544,594 (hereafter Woo).
Regarding claim 8:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach:
“wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”
“generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”
Creedon teaches “wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”: Creedon, paragraph 0047-0048, “FIG. 5 is a flow chart illustrating an exemplary implementation of a machine learning model training parameter caching process 500, according to an embodiment of the disclosure. As shown in FIG. 5 the exemplary implementation of a machine learning model training parameter caching process 500 initially trains a machine learning model during step 510, using a given training dataset. During step 520, at least one parameter of the machine learning model from the training with the given training dataset is cached, and then the cached at least one parameter of the machine learning model is used during step 530 for a subsequent training of the machine learning model
(for example, with the given training dataset). The caching of step 520 is performed, for example, after each of a plurality of iterations of the training of the machine learning model [recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training].”
Creedon and Huang are analogous arts as they are both related to neural network model training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training parameter cache of Creedon with the teachings of Huang to arrive at the present invention, in order to reduce redundant training, as stated in Creedon, paragraph 0035, “In one or more embodiments, the disclosed techniques for caching the machine learning model training parameters reduces redundant retaining and GPU usage, by introducing a cache optimization at the GPU driver 420 and vGPU Manager 460 layer during the training of deep neural networks and other machine learning models.”
Woo teaches “generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”: Woo, col. 3, lines 4-9, “According to various embodiments of the present disclosure, the initial setting user inputs may further include determining parameter and structure provided based on the selected AI algorithm, and the parameter may include the number of layers for the AI algorithm, node for each of layers, function, and the number of iterations [in response to receiving a user's selection of a number of training rounds of the first deep learning model]”; Woo, col. 21, lines 56-62, “The AI platform 840 may receive necessary information for initializing the AI algorithm (S823). The AI platform 840 may receive an input dataset from a user device 910. AI training is performed by iterating through the input dataset and updating weights of AI algorithm connections [generating, … the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model]. Each input data instance may reinforce the behavior of the AI algorithm response (S824).”
Woo and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-specified iteration of Woo with the teachings of Huang to arrive at the present invention, in order to provide non-expert users control over the training process, as stated in Woo, col. 2, lines 1-9, “The various embodiments of the present disclosure are devised to providing an electronic device, a service providing server, and a system providing a service of user-participating-type AI (Artificial Intelligence) training. According to various embodiments, without complex coding for AI training by a user who is even not an expert, the AI training is simply performed by receiving user inputs through a predetermined platform resulting in improving user-convenience.”
Regarding claim 18:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach:
“wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”
“generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”
Creedon teaches “wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”: Creedon, paragraph 0047-0048, “FIG. 5 is a flow chart illustrating an exemplary implementation of a machine learning model training parameter caching process 500, according to an embodiment of the disclosure. As shown in FIG. 5 the exemplary implementation of a machine learning model training parameter caching process 500 initially trains a machine learning model during step 510, using a given training dataset. During step 520, at least one parameter of the machine learning model from the training with the given training dataset is cached, and then the cached at least one parameter of the machine learning model is used during step 530 for a subsequent training of the machine learning model
(for example, with the given training dataset). The caching of step 520 is performed, for example, after each of a plurality of iterations of the training of the machine learning model [recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training].”
Creedon and Huang are analogous arts as they are both related to neural network model training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training parameter cache of Creedon with the teachings of Huang to arrive at the present invention, in order to reduce redundant training, as stated in Creedon, paragraph 0035, “In one or more embodiments, the disclosed techniques for caching the machine learning model training parameters reduces redundant retaining and GPU usage, by introducing a cache optimization at the GPU driver 420 and vGPU Manager 460 layer during the training of deep neural networks and other machine learning models.”
Woo teaches “generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”: Woo, col. 3, lines 4-9, “According to various embodiments of the present disclosure, the initial setting user inputs may further include determining parameter and structure provided based on the selected AI algorithm, and the parameter may include the number of layers for the AI algorithm, node for each of layers, function, and the number of iterations [in response to receiving a user's selection of a number of training rounds of the first deep learning model]”; Woo, col. 21, lines 56-62, “The AI platform 840 may receive necessary information for initializing the AI algorithm (S823). The AI platform 840 may receive an input dataset from a user device 910. AI training is performed by iterating through the input dataset and updating weights of AI algorithm connections [generating, … the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model]. Each input data instance may reinforce the behavior of the AI algorithm response (S824).”
Woo and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-specified iteration of Woo with the teachings of Huang to arrive at the present invention, in order to provide non-expert users control over the training process, as stated in Woo, col. 2, lines 1-9, “The various embodiments of the present disclosure are devised to providing an electronic device, a service providing server, and a system providing a service of user-participating-type AI (Artificial Intelligence) training. According to various embodiments, without complex coding for AI training by a user who is even not an expert, the AI training is simply performed by receiving user inputs through a predetermined platform resulting in improving user-convenience.”
Claims 9 and 19 are rejected under 35 U.S.C. 103 over Huang in view of Lee, US Pre-Grant Publication No. 2024/0143611 (hereafter Lee).
Regarding claim 9:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format.”
Lee teaches “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format”: Lee, paragraphs 0163-0165, “Referring to FIG. 7, the converter unit 360 may import the network structure and model data defined in the ONNX model format into the network model format of the database. Conversely, the converter unit 360 may export the network model from the database into a structured format including an ONNX model, or a CSV file. The converter unit 360 may convert ONNX, NNEF, and hyperparameter and learning parameter files into structured formats other than the ONNX model format. The user may convert the converted ONNX model and structured format into a desired target framework for use [wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format].”
Lee and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-selected format of Lee with the teachings of Huang to arrive at the present invention, to allow the user to use the model in various selected formats, as stated in Lee, paragraph 0166, “Through conversion operations using the converter unit 360, the user may apply the network model to another type of deep learning framework. This may allow the DB server 10 to invoke the model stored in the database in the relational data format and apply the model to a dataset of a similar form.”
Regarding claim 19:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format.”
Lee teaches “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format”: Lee, paragraphs 0163-0165, “Referring to FIG. 7, the converter unit 360 may import the network structure and model data defined in the ONNX model format into the network model format of the database. Conversely, the converter unit 360 may export the network model from the database into a structured format including an ONNX model, or a CSV file. The converter unit 360 may convert ONNX, NNEF, and hyperparameter and learning parameter files into structured formats other than the ONNX model format. The user may convert the converted ONNX model and structured format into a desired target framework for use [wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format].”
Lee and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-selected format of Lee with the teachings of Huang to arrive at the present invention, to allow the user to use the model in various selected formats, as stated in Lee, paragraph 0166, “Through conversion operations using the converter unit 360, the user may apply the network model to another type of deep learning framework. This may allow the DB server 10 to invoke the model stored in the database in the relational data format and apply the model to a dataset of a similar form.”
Claim 12 is rejected under 35 U.S.C. 103 over Huang in view of Tang et al., US Pre-Grant Publication No. 2017/0004454 (hereafter Tang).
Regarding claim 12:
Huang teaches “the method according to claim 10.”
Huang further teaches (bold only) “wherein the step of generating the training data from the original data comprises: determining a type of the original data, wherein the type of the original data comprises categorical data with label, session data with label, and data without label, a label of the categorical data indicates a category of the categorical data, and a label of the session data indicates a question-answer relevance of the session data; and generating the training data according to the type of the original data”: Huang, paragraph 0104-0105, “As an example, task types may be set based on problem type that the user desires to solve, and different task types correspond to different problem classifications. For a problem related to image data, the tasks can include image classification tasks, object recognition tasks, text recognition tasks, image segmentation tasks, and feature point detection tasks. The image classification refers to distinguishing different image categories based on the semantic information of the image. Image classification is an important basic problem in computer vision. Different image categories are distinguished based on the semantic information of the image and labeled with different categories [categorical data with label][a label of the categorical data indicates a category of the categorical data]”; Huang, paragraph 0028, “The data obtained at block S110 may have or not have a labelling result. A method for acquiring the labelling result is not limited in embodiments of the disclosure. That is, the data can be labelled with the labelling result in any way. The labelling result can be an objective and real labelling conclusion or a subjective result of manually labelling. In a case that the data acquired at block S110 has a labelling result, the labelling result of the data can be directly obtained. 
In a case that the data obtained at block S110 has no labelling result [data without label] or a part of the data obtained at block has no labeling result, the data may be labelled to obtain the labelling result of the data [determining a type of the original data … and generating the training data according to the type of the original data].”
Huang does not explicitly teach (bold only) “wherein the step of generating the training data from the original data comprises: determining a type of the original data, wherein the type of the original data comprises categorical data with label, session data with label, and data without label, a label of the categorical data indicates a category of the categorical data, and a label of the session data indicates a question-answer relevance of the session data; and generating the training data according to the type of the original data.”
Tang teaches (bold only) “wherein the step of generating the training data from the original data comprises: determining a type of the original data, wherein the type of the original data comprises categorical data with label, session data with label, and data without label, a label of the categorical data indicates a category of the categorical data, and a label of the session data indicates a question-answer relevance of the session data; and generating the training data according to the type of the original data”: Tang, paragraph 0014, “A recommendation system is configured to match member profiles with job postings [that is, a kind of a question-answer session], so that those job postings that have been identified as potentially being of interest to a member represented by a particular member profile are presented to the member on a display device for viewing”; Tang, paragraph 0033, “Returning to FIG.2, the rank scores calculated by the learning to rank model 222 are assigned to the items in the list of recommended jobs 240. A list of recommended jobs with respective assigned rank scores 260 is provided to the training data collector 230. The training data collector 230 monitors events with respect to how the member, for whom the list of recommended jobs was generated, interacts with the associated job postings and, based on the monitors interactions, assigns relevance labels to the items in the list [session data with label … a label of the session data indicates a question-answer relevance of the session data]. As explained above, a job posting that is impressed and clicked by the associated member receives a different relevance score from a relevance label assigned to a job posting that was impressed but not clicked by the associated member. A list of recommended jobs with respective assigned relevance labels 270 is provided to a repository of training data 280. The training data stored in the database 280 is used to train the learning to rank model 222.”
Tang and Huang are analogous arts as they are both related to model training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the relevancy training data of Tang with the teachings of Huang to arrive at the present invention, in order to provide the most relevant model results, as stated in Tang, paragraph 0015, “Those job postings, for which their respective relevance values for a particular member profile are equal to or greater than a predetermined threshold value, are presented to that particular member, e.g., on the news feed page of the member or on some other page provided by the on-line social networking system.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Okamoto, US Pre-Grant Publication No. 2021/0264264, discloses a method of selecting a model from a plurality of machine-learning models to solve a task, which includes a specified constraint that determines whether both portions of a combined model are used to perform learning.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT SPRAUL whose telephone number is (703) 756-1511. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL HUNTLEY, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VAS/
Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129