Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2-3 and 16-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
These claims, either directly or through dependency, recite in relevant part, “wherein the candidate deep learning model performs the task best; and determining the candidate deep learning model as the first deep learning model, wherein the candidate deep learning model performs the task best.” The term “best” is a subjective term which renders the claim indefinite. Neither the claim nor the specification provides an objective standard for measuring the scope of the term. Therefore, a person having ordinary skill in the art would be unable to determine the metes and bounds of the claim.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because neither the claim nor the specification limits “a computer-readable storage medium” to tangible forms, and therefore the claim encompasses transitory media, which do not fall into one of the four statutory categories.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 10-11, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Huang et al., US Pre-Grant Publication No. 2021/0271809 (hereafter Huang).
Regarding claim 1 and analogous claims 13-15:
Huang teaches:
“A method for generating a target deep learning model, comprising”: Huang, paragraph 0004-0006, “According to embodiments of the disclosure, a method for performing machine learning processing is provided [A method for generating a target deep learning model]. The method includes obtaining data; obtaining a labelling result of the data; and selecting at least one of a model framework meeting a requirement of a user and a model meeting a predicted target of the user, and performing model training using the data and the labelling result of the data based on at least one of the model framework and the model, in which the model framework is a framework used for performing the model training based on a machine learning algorithm. According to embodiments of the disclosure, a computing device is provided. The computing device includes a processor and a memory. The memory has executable codes stored thereon. When the executable codes are executed by the processor, the processor is caused to perform the method according to the first aspect of the disclosure. According to embodiments of the disclosure, a non-transitory machine-readable storage medium is provided. The storage medium has executable codes stored thereon. When the executable codes are executed by a processor of an electronic device, the processor is caused to perform a method according to the first aspect of the disclosure.”; Huang, paragraph 0016, “FIG. 9 is a block diagram illustrating an apparatus for performing machine learning process according to example embodiments of the disclosure.”
“obtaining, from a user, an instruction and original data for generating the target deep learning model, wherein the instruction comprises a task expected to be performed by the target deep learning model”: Huang, paragraph 0150, “The data collection platform can provide a user with a data upload interface, and receive data uploaded by the user for the model training [obtaining, from a user … original data for generating the target deep learning model]. In addition, the data collection platform can also provide the user with the data collection service. In a case where user's data for the model training is insufficient, the user's data collection needs can be acquired and data collection operations can be performed. For example, the user can define tasks, such as ‘request to collect pictures containing various fruits’ [obtaining, from a user, an instruction … wherein the instruction comprises a task expected to be performed by the target deep learning model]. The data collection platform can collect raw data meeting the user's needs based on the tasks entered by the user. The collected raw data may be data without labelling results. The data collection process can be referred to descriptions of FIG. 2 above, which is not repeated here.”
“generating training data from the original data”: Huang, paragraph 0151-0152, “The data labelling platform can provide the user with data labelling services. A general workflow of the data labelling platform may include the following. The data labelling platform can receive data labelling requests from a user or the data collection platform, package the data to be labelled into labelling tasks, and send them to one or more labelers who can perform manual labelling. The labelers perform the manual labelling on the data to be labelled. The data labelling platform can organize the manual labelling results, and save or send the organized labelling results. The algorithm platform can receive the data and the labelling results sent by the data labelling platform, and use the data and the labelling results to automatically perform the model training [generating training data from the original data].”
“determining a first deep learning model corresponding to the task; and training the first deep learning model, with the training data to obtain the target deep learning model”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected [determining a first deep learning model corresponding to the task]. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training [training the first deep learning model, with the training data to obtain the target deep learning model].”
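For illustration, the selection procedure Huang describes in paragraphs 0113-0114 (trying hyperparameter combinations per model framework, e.g. via grid search, and keeping the best-performing framework and combination) can be sketched as follows. This is a minimal sketch only; the framework names, hyperparameter grids, and scoring function are hypothetical stand-ins, not anything disclosed by Huang.

```python
import itertools

def train_and_score(framework, params, train_data, test_data):
    # Placeholder scorer; a real system would train the model under
    # `framework` with `params` and return a test metric such as accuracy.
    return sum(params.values()) % 7 / 7.0  # deterministic dummy score

def select_best_framework(frameworks, grids, train_data, test_data):
    # Grid search: evaluate every hyperparameter combination of every
    # framework and keep the highest-scoring (framework, params) pair.
    best = None
    for fw in frameworks:
        keys = sorted(grids[fw])
        for combo in itertools.product(*(grids[fw][k] for k in keys)):
            params = dict(zip(keys, combo))
            score = train_and_score(fw, params, train_data, test_data)
            if best is None or score > best[0]:
                best = (score, fw, params)
    return best  # (score, best framework, its optimal hyperparameters)
```

Random search or Bayesian optimization, also named in Huang, would replace the exhaustive `itertools.product` enumeration with sampled or model-guided candidate combinations.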
Regarding claim 2:
Huang teaches “the method according to claim 1.”
“wherein the step of determining the first deep learning model corresponding to the task comprises determining a plurality of candidate deep learning models corresponding to the task; training the plurality of candidate deep learning models with a part of the training data”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements [determining a plurality of candidate deep learning models corresponding to the task], optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training [training the plurality of candidate deep learning models with a part of the training data].”
“determining a candidate deep learning model among the plurality of trained candidate deep learning models, wherein the candidate deep learning model performs the task best; and”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework [determining a candidate deep learning model among the plurality of trained candidate deep learning models, wherein the candidate deep learning model performs the task best]. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training.”
“determining the candidate deep learning model as the first deep learning model, wherein the candidate deep learning model performs the task best”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework [determining the candidate deep learning model as the first deep learning model, wherein the candidate deep learning model performs the task best]. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training.”
Regarding claim 3:
Huang teaches “the method according to claim 2.”
Huang further teaches “wherein total numbers of layers of the plurality of candidate deep learning models are different; and/or layer numbers of output layers of the plurality of candidate deep learning models are different; and/or training parameters for training the plurality of candidate deep learning models are at least partially different”: Huang, paragraph 0113 - 0114, “As another example of the disclosure, for each model framework corresponding to the task type that matches the user's requirements, optimal hyperparameter combination of each model framework may be obtained through a manner of hyperparameter optimization, and the model framework performing best and its optimal hyperparameter combination may be selected. For example, for each model framework corresponding to the task type that matches the user's needs, algorithms such as grid search, random search, and Bayesian optimization may be used to set different hyperparameter combinations, the model may be trained with the training samples, the model is tested. The set of hyperparameters of the model that performs best (for example, the model can be evaluated based on test indicators such as accuracy and loss) can be used as the optimal hyperparameter combination under the model framework [training parameters for training the plurality of candidate deep learning models are at least partially different]. The optimal hyperparameter combinations under different model frameworks are compared with each other to select the model framework with the best performance (such as high accuracy and low loss) and its optimal hyperparameter combination. The model framework is a framework for training models based on machine learning algorithms. Based on the selected model framework, training samples can be used for the model training.”
Regarding claim 10:
Huang teaches “the method according to claim 1.”
Huang further teaches “wherein the task comprises a search task, and the target deep learning model comprises a deep learning model for a neural search”: Huang, paragraph 0032, “Therefore, selecting a model matching the user's predicted target may refer to selecting the model matching the user's predicted target from previously trained models. The predicted target refers to predicted functions achieved by the model trained based on the user's desires. For example, in a case a function achieved by the model trained based on the user's desires is identifying cats in an image, the predicted target is ‘identifying cats in an image’ [wherein the task comprises a search task]. The model matching the user's predicted target refers to a model that can achieve the same or similar functions as the predicted target [the target deep learning model comprises a deep learning model for a neural search]. For example, in a case that the user's predicted target is ‘identifying cats in an image’, a previously trained model that is used for identifying cats in an image may be used as the model matching the user's predicted target, or a previously trained model used for identifying other types of animals (such as dogs, pigs or the like) can be used as the model matching the user's predicted target.”
Regarding claim 11:
Huang teaches “the method according to claim 10.”
Huang further teaches “wherein the search task comprises one of the following: searching for pictures with texts; searching for texts with texts; searching for pictures with pictures; searching for texts with pictures; and searching for sounds with sounds”: Huang, paragraph 0032, “Therefore, selecting a model matching the user's predicted target may refer to selecting the model matching the user's predicted target from previously trained models. The predicted target refers to predicted functions achieved by the model trained based on the user's desires. For example, in a case a function achieved by the model trained based on the user's desires is identifying cats in an image, the predicted target is ‘identifying cats in an image’ [wherein the search task comprises one of the following: searching for pictures with texts]. The model matching the user's predicted target refers to a model that can achieve the same or similar functions as the predicted target. For example, in a case that the user's predicted target is ‘identifying cats in an image’, a previously trained model that is used for identifying cats in an image may be used as the model matching the user's predicted target, or a previously trained model used for identifying other types of animals (such as dogs, pigs or the like) can be used as the model matching the user's predicted target.”
Regarding claim 20:
Huang teaches “the method according to claim 3.”
Huang further teaches “wherein the task comprises a search task, and the target deep learning model comprises a deep learning model for a neural search”: Huang, paragraph 0032, “Therefore, selecting a model matching the user's predicted target may refer to selecting the model matching the user's predicted target from previously trained models. The predicted target refers to predicted functions achieved by the model trained based on the user's desires. For example, in a case a function achieved by the model trained based on the user's desires is identifying cats in an image, the predicted target is ‘identifying cats in an image’ [wherein the task comprises a search task]. The model matching the user's predicted target refers to a model that can achieve the same or similar functions as the predicted target [the target deep learning model comprises a deep learning model for a neural search]. For example, in a case that the user's predicted target is ‘identifying cats in an image’, a previously trained model that is used for identifying cats in an image may be used as the model matching the user's predicted target, or a previously trained model used for identifying other types of animals (such as dogs, pigs or the like) can be used as the model matching the user's predicted target.”
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Sankaran et al., US Pre-Grant Publication No. 2020/0074347 (hereafter Sankaran).
Regarding claim 4:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “wherein the instruction further comprises at least one of the following: a model type of the first deep learning model; a total number of layers of the first deep learning model; a layer number of an output layer of the first deep learning model; and training parameters for training the first deep learning model.”
Sankaran teaches “wherein the instruction further comprises at least one of the following: a model type of the first deep learning model; a total number of layers of the first deep learning model; a layer number of an output layer of the first deep learning model; and training parameters for training the first deep learning model”: Sankaran, paragraph 0031, “In some embodiments, the ingested data may include user constraints 208”; Sankaran, paragraph 0034, “In some embodiments, user constraints 208 may specify a maximum training time for a deep learning model. For example, user constraints 208 may specify the maximum training time that a model can take for training [wherein the instruction further comprises at least one of the following: … training parameters for training the first deep learning model].”
Sankaran and Huang are analogous arts as they are both related to user-suggested data modelling. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user model constraints of Sankaran with the teachings of Huang to arrive at the present invention, in order to allow the user to get a model trained in a maximum time, as stated in Sankaran, paragraph 0034, “In some embodiments, user constraints 208 may specify a maximum training time for a deep learning model. For example, user constraints 208 may specify the maximum training time that a model can take for training.”
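For illustration, a user-specified maximum training time of the kind Sankaran's paragraph 0034 describes operates as a training stop condition. The sketch below is hypothetical; the function name and the dummy training step are assumptions, not Sankaran's disclosure.

```python
import time

def train_until(max_seconds, step, max_rounds=1000):
    # Run training rounds until either the user's maximum training time
    # (a stop condition per Sankaran par. 0034) or a round cap is hit.
    start = time.monotonic()
    rounds = 0
    while rounds < max_rounds and time.monotonic() - start < max_seconds:
        step(rounds)  # stand-in for one round of model training
        rounds += 1
    return rounds
```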
Regarding claim 5:
Huang as modified by Sankaran teaches “the method according to claim 4.”
Sankaran further teaches “wherein the training parameters comprise at least one of the following: a learning rate; and a training stop condition”: Sankaran, paragraph 0034, “In some embodiments, user constraints 208 may specify a maximum training time for a deep learning model. For example, user constraints 208 may specify the maximum training time that a model can take for training [wherein the training parameters comprise at least one of the following: … a training stop condition].”
Sankaran and Huang are combinable for the rationale given under claim 4.
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Ormerod, US Pre-Grant Publication No. 2019/0266234 (hereafter Ormerod).
Regarding claim 6:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model.”
Ormerod teaches “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model”: Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
Ormerod and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the explicit determination of a loss function and optimizer of Ormerod with the teachings of Huang to arrive at the present invention, in order to improve training, as stated in Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
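For illustration, “determining a loss function and an optimizer corresponding to the first deep learning model,” as discussed in Ormerod's paragraph 0042, can be sketched as a lookup by task type. The task names and the particular loss/optimizer pairings below are illustrative assumptions only (the binary-cross-entropy/RMSprop pairing echoes Ormerod's example, but the mapping itself is hypothetical).

```python
# Hypothetical task-to-configuration mappings.
LOSS_BY_TASK = {
    "binary_classification": "binary_cross_entropy",
    "regression": "sum_of_squared_errors",
}
OPTIMIZER_BY_TASK = {
    "binary_classification": "rmsprop",
    "regression": "sgd",
}

def configure_training(task):
    # Determine the loss function and optimizer corresponding to the
    # model's task; both are then handed to the training loop.
    loss = LOSS_BY_TASK.get(task)
    optimizer = OPTIMIZER_BY_TASK.get(task)
    if loss is None or optimizer is None:
        raise ValueError(f"no loss/optimizer mapping for task {task!r}")
    return loss, optimizer
```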
Regarding claim 16:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model.”
Ormerod teaches “determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model”: Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
Ormerod and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the explicit determination of a loss function and optimizer of Ormerod with the teachings of Huang to arrive at the present invention, in order to improve training, as stated in Ormerod, paragraph 0042, “With a deep network, choosing an appropriate loss function and/or optimizer can help ensure the network yields sensible results [determining a loss function and an optimizer corresponding to the first deep learning model, wherein the loss function and optimizer are configured to train the first deep learning model]. Based on experimentation with various loss functions-such as sum of squares due to error (SSE), log SSE, binary-cross-entropy-as well as various optimizers such as adaptive moment estimation (ADAM), adaptive learning rate (ADADELTA), stochastic gradient descent (SGD), and RMSPROP-the combination of a balanced weighted binary-cross-entropy and the RMSprop optimizer with backpropagation were found particularly useful and appropriately converged. For example, the system converged in between 1000 and 3000 training epochs. A standard training time of 3000 epochs was an acceptable option.”
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Xia et al., US Patent No. 10,810,491 (hereafter Xia).
Regarding claim 7:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model.”
Xia teaches “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model”: Xia, col. 7, line 64 through col. 8, line 6, “In some implementations clients 174 may use interactive control elements of the interface (e.g., by clicking on a portion of a model layout) to indicate the particular layer or feature they wish to inspect visually, to zoom in on a particular iteration's details, and so on. In at least some embodiments, the visualizations 185 may be provided in real time or near real time-for example, within a few seconds of the completion of a particular training iteration, the value of the loss function [displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model].”
Xia and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training data display of Xia with the teachings of Huang to arrive at the present invention, in order to provide the user with timely insight into the training process for better training results, as stated in Xia, col. 3, lines 58-63, “In some embodiments, to help provide more timely insights into the training and/or testing of a model, a machine learning service (or more generally, a machine 60 learning training/testing environment which may not necessarily be implemented as part of a provider network service) may comprise a visualization manager.”
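For illustration, displaying the loss value after each training round, in the spirit of Xia's near-real-time visualization, can be sketched as below. The training loop, loss values, and function name are dummies introduced for this sketch, not Xia's disclosure.

```python
def train_with_loss_display(num_rounds, compute_loss, display=print):
    # After each training round, display that round's loss value and
    # keep a history of the values shown.
    history = []
    for round_idx in range(1, num_rounds + 1):
        loss = compute_loss(round_idx)  # stand-in for one training round
        display(f"round {round_idx}: loss={loss:.4f}")
        history.append(loss)
    return history
```

A real system would plot these values in an interactive interface rather than printing them, as Xia's clients do.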
Regarding claim 17:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model.”
Xia teaches “displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model”: Xia, col. 7, line 64 through col. 8, line 6, “In some implementations clients 174 may use interactive control elements of the interface (e.g., by clicking on a portion of a model layout) to indicate the particular layer or feature they wish to inspect visually, to zoom in on a particular iteration's details, and so on. In at least some embodiments, the visualizations 185 may be provided in real time or near real time-for example, within a few seconds of the completion of a particular training iteration, the value of the loss function [displaying a value of a loss function of the first deep learning model in each round in a process of training the first deep learning model].”
Xia and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training data display of Xia with the teachings of Huang to arrive at the present invention, in order to provide the user with timely insight into the training process for better training results, as stated in Xia, col. 3, lines 58-63, “In some embodiments, to help provide more timely insights into the training and/or testing of a model, a machine learning service (or more generally, a machine 60 learning training/testing environment which may not necessarily be implemented as part of a provider network service) may comprise a visualization manager.”
Claims 8 and 18 are rejected under 35 U.S.C. 103 over Huang in view of Creedon et al., US Pre-Grant Publication No. 2021/0256418 (hereafter Creedon) and Woo et al., US Patent No. 11,544,594 (hereafter Woo).
Regarding claim 8:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach:
“wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”
“generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”
Creedon teaches “wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”: Creedon, paragraph 0047-0048, “FIG. 5 is a flow chart illustrating an exemplary implementation of a machine learning model training parameter caching process 500, according to an embodiment of the disclosure. As shown in FIG. 5 the exemplary implementation of a machine learning model training parameter caching process 500 initially trains a machine learning model during step 510, using a given training dataset. During step 520, at least one parameter of the machine learning model from the training with the given training dataset is cached, and then the cached at least one parameter of the machine learning model is used during step 530 for a subsequent training of the machine learning model
(for example, with the given training dataset). The caching of step 520 is performed, for example, after each of a plurality of iterations of the training of the machine learning model [recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training].”
Creedon and Huang are analogous arts as they are both related to neural network model training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training parameter cache of Creedon with the teachings of Huang to arrive at the present invention, in order to reduce redundant training, as stated in Creedon, paragraph 0035, “In one or more embodiments, the disclosed techniques for caching the machine learning model training parameters reduces redundant retaining and GPU usage, by introducing a cache optimization at the GPU driver 420 and vGPU Manager 460 layer during the training of deep neural networks and other machine learning models.”
Woo teaches “generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”: Woo, col. 3, lines 4-9, “According to various embodiments of the present disclosure, the initial setting user inputs may further include determining parameter and structure provided based on the selected AI algorithm, and the parameter may include the number of layers for the AI algorithm, node for each of layers, function, and the number of iterations [in response to receiving a user's selection of a number of training rounds of the first deep learning model]”; Woo, col. 21, lines 56-62, “The AI platform 840 may receive necessary information for initializing the AI algorithm (S823). The AI platform 840 may receive an input dataset from a user device 910. AI training is performed by iterating through the input dataset and updating weights of AI algorithm connections [generating, … the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model]. Each input data instance may reinforce the behavior of the AI algorithm response (S824).”
Woo and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-specified iteration of Woo with the teachings of Huang to arrive at the present invention, in order to provide non-expert users control over the training process, as stated in Woo, col. 2, lines 1-9, “The various embodiments of the present disclosure are devised to providing an electronic device, a service providing server, and a system providing a service of user-participating-type AI (Artificial Intelligence) training. According to various embodiments, without complex coding for AI training by a user who is even not an expert, the AI training is simply performed by receiving user inputs through a predetermined platform resulting in improving user-convenience.”
Regarding claim 18:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach:
“wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”
“generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”
Creedon teaches “wherein the step of training the first deep learning model with the training data to obtain the target deep learning model comprises: recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training”: Creedon, paragraph 0047-0048, “FIG. 5 is a flow chart illustrating an exemplary implementation of a machine learning model training parameter caching process 500, according to an embodiment of the disclosure. As shown in FIG. 5 the exemplary implementation of a machine learning model training parameter caching process 500 initially trains a machine learning model during step 510, using a given training dataset. During step 520, at least one parameter of the machine learning model from the training with the given training dataset is cached, and then the cached at least one parameter of the machine learning model is used during step 530 for a subsequent training of the machine learning model
(for example, with the given training dataset). The caching of step 520 is performed, for example, after each of a plurality of iterations of the training of the machine learning model [recording a training history of the first deep learning model in a process of training the first deep learning model, wherein the training history comprises model parameters of the first deep learning model obtained after each round of training].”
Creedon and Huang are analogous arts as they are both related to neural network model training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the training parameter cache of Creedon with the teachings of Huang to arrive at the present invention, in order to reduce redundant training, as stated in Creedon, paragraph 0035, “In one or more embodiments, the disclosed techniques for caching the machine learning model training parameters reduces redundant retaining and GPU usage, by introducing a cache optimization at the GPU driver 420 and vGPU Manager 460 layer during the training of deep neural networks and other machine learning models.”
Woo teaches “generating, in response to receiving a user's selection of a number of training rounds of the first deep learning model, the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model”: Woo, col. 3, lines 4-9, “According to various embodiments of the present disclosure, the initial setting user inputs may further include determining parameter and structure provided based on the selected AI algorithm, and the parameter may include the number of layers for the AI algorithm, node for each of layers, function, and the number of iterations [in response to receiving a user's selection of a number of training rounds of the first deep learning model]”; Woo, col. 21, lines 56-62, “The AI platform 840 may receive necessary information for initializing the AI algorithm (S823). The AI platform 840 may receive an input dataset from a user device 910. AI training is performed by iterating through the input dataset and updating weights of AI algorithm connections [generating, … the first deep learning model trained for the number of training rounds according to the model parameters corresponding to the number of training rounds; and determining the generated first deep learning model as the target deep learning model]. Each input data instance may reinforce the behavior of the AI algorithm response (S824).”
Woo and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-specified iteration of Woo with the teachings of Huang to arrive at the present invention, in order to provide non-expert users control over the training process, as stated in Woo, col. 2, lines 1-9, “The various embodiments of the present disclosure are devised to providing an electronic device, a service providing server, and a system providing a service of user-participating-type AI (Artificial Intelligence) training. According to various embodiments, without complex coding for AI training by a user who is even not an expert, the AI training is simply performed by receiving user inputs through a predetermined platform resulting in improving user-convenience.”
Claims 9 and 19 are rejected under 35 U.S.C. 103 over Huang in view of Lee, US Pre-Grant Publication No. 2024/0143611 (hereafter Lee).
Regarding claim 9:
Huang teaches “the method according to claim 1.”
Huang does not explicitly teach “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format.”
Lee teaches “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format”: Lee, paragraphs 0163-0165, “Referring to FIG. 7, the converter unit 360 may import the network structure and model data defined in the ONNX model format into the network model format of the database. Conversely, the converter unit 360 may export the network model from the database into a structured format including an ONNX model, or a CSV file. The converter unit 360 may convert ONNX, NNEF, and hyperparameter and learning parameter files into structured formats other than the ONNX model format. The user may convert the converted ONNX model and structured format into a desired target framework for use [wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format].”
Lee and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-selected format of Lee with the teachings of Huang to arrive at the present invention, to allow the user to use the model in various selected formats, as stated in Lee, paragraph 0166, “Through conversion operations using the converter unit 360, the user may apply the network model to another type of deep learning framework. This may allow the DB server 10 to invoke the model stored in the database in the relational data format and apply the model to a dataset of a similar form.”
Regarding claim 19:
Huang teaches “the method according to claim 2.”
Huang does not explicitly teach “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format.”
Lee teaches “wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format”: Lee, paragraphs 0163-0165, “Referring to FIG. 7, the converter unit 360 may import the network structure and model data defined in the ONNX model format into the network model format of the database. Conversely, the converter unit 360 may export the network model from the database into a structured format including an ONNX model, or a CSV file. The converter unit 360 may convert ONNX, NNEF, and hyperparameter and learning parameter files into structured formats other than the ONNX model format. The user may convert the converted ONNX model and structured format into a desired target framework for use [wherein the instruction further comprises a target format of the target deep learning model, and the method further comprises: converting a format of the first deep learning model into the target format].”
Lee and Huang are analogous arts as they are both related to neural network training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the user-selected format of Lee with the teachings of Huang to arrive at the present invention, to allow the user to use the model in various selected formats, as stated in Lee, paragraph 0166, “Through conversion operations using the converter unit 360, the user may apply the network model to another type of deep learning framework. This may allow the DB server 10 to invoke the model stored in the database in the relational data format and apply the model to a dataset of a similar form.”
Claim 12 is rejected under 35 U.S.C. 103 over Huang in view of Tang et al., US Pre-Grant Publication No. 2017/0004454 (hereafter Tang).
Regarding claim 12:
Huang teaches “the method according to claim 10.”
Huang further teaches (bold only) “wherein the step of generating the training data from the original data comprises: determining a type of the original data, wherein the type of the original data comprises categorical data with label, session data with label, and data without label, a label of the categorical data indicates a category of the categorical data, and a label of the session data indicates a question-answer relevance of the session data; and generating the training data according to the type of the original data”: Huang, paragraph 0104-0105, “As an example, task types may be set based on problem type that the user desires to solve, and different task types correspond to different problem classifications. For a problem related to image data, the tasks can include image classification tasks, object recognition tasks, text recognition tasks, image segmentation tasks, and feature point detection tasks. The image classification refers to distinguishing different image categories based on the semantic information of the image. Image classification is an important basic problem in computer vision. Different image categories are distinguished based on the semantic information of the image and labeled with different categories [categorical data with label][a label of the categorical data indicates a category of the categorical data]”; Huang, paragraph 0028, “The data obtained at block S110 may have or not have a labelling result. A method for acquiring the labelling result is not limited in embodiments of the disclosure. That is, the data can be labelled with the labelling result in any way. The labelling result can be an objective and real labelling conclusion or a subjective result of manually labelling. In a case that the data acquired at block S110 has a labelling result, the labelling result of the data can be directly obtained. 
In a case that the data obtained at block S110 has no labelling result [data without label] or a part of the data obtained at block has no labeling result, the data may be labelled to obtain the labelling result of the data [determining a type of the original data … and generating the training data according to the type of the original data].”
Huang does not explicitly teach (bold only) “wherein the step of generating the training data from the original data comprises: determining a type of the original data, wherein the type of the original data comprises categorical data with label, session data with label, and data without label, a label of the categorical data indicates a category of the categorical data, and a label of the session data indicates a question-answer relevance of the session data; and generating the training data according to the type of the original data.”
Tang teaches (bold only) “wherein the step of generating the training data from the original data comprises: determining a type of the original data, wherein the type of the original data comprises categorical data with label, session data with label, and data without label, a label of the categorical data indicates a category of the categorical data, and a label of the session data indicates a question-answer relevance of the session data; and generating the training data according to the type of the original data”: Tang, paragraph 0014, “A recommendation system is configured to match member profiles with job postings [that is, a kind of a question-answer session], so that those job postings that have been identified as potentially being of interest to a member represented by a particular member profile are presented to the member on a display device for viewing”; Tang, paragraph 0033, “Returning to FIG.2, the rank scores calculated by the learning to rank model 222 are assigned to the items in the list of recommended jobs 240. A list of recommended jobs with respective assigned rank scores 260 is provided to the training data collector 230. The training data collector 230 monitors events with respect to how the member, for whom the list of recommended jobs was generated, interacts with the associated job postings and, based on the monitors interactions, assigns relevance labels to the items in the list [session data with label … a label of the session data indicates a question-answer relevance of the session data]. As explained above, a job posting that is impressed and clicked by the associated member receives a different relevance score from a relevance label assigned to a job posting that was impressed but not clicked by the associated member. A list of recommended jobs with respective assigned relevance labels 270 is provided to a repository of training data 280. The training data stored in the database 280 is used to train the learning to rank model 222.”
Tang and Huang are analogous arts as they are both related to model training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the relevancy training data of Tang with the teachings of Huang to arrive at the present invention, in order to provide the most relevant model results, as stated in Tang, paragraph 0015, “Those job postings, for which their respective relevance values for a particular member profile are equal to or greater than a predetermined threshold value, are presented to that particular member, e.g., on the news feed page of the member or on some other page provided by the on-line social networking system.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Okamoto, US Pre-Grant Publication No. 2021/0264264, discloses a method of selecting a model from a plurality of machine-learning models to solve a task, which includes a specified constraint that determines whether both portions of a combined model are used to perform learning.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT SPRAUL whose telephone number is (703) 756-1511. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL HUNTLEY, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VAS/
Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129