Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
Regarding claims 1-20:
Claim 1
A system for facilitating model generation, the system comprising:
one or more processors and non-transitory media storing instructions that, when executed, cause operations comprising:
a. receiving a machine learning model;
b. performing hyperparameter tuning of the received machine learning model according to a fixed training hyperparameter to obtain a tuned machine learning model; and
c. generating a brittleness score of the tuned machine learning model based on (i) a percent of training runs that reach a converge outcome according to one or more training criteria or (ii) a variance of architectural hyperparameters.
Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. The claim recites a system comprising one or more processors and non-transitory media storing instructions that, when executed, cause operations including steps a)-c). Thus, the claim is to a machine, which is one of the statutory categories of invention. (Step 1: YES).
Step 2A – Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim.
Step c, “generating a brittleness score of the tuned machine learning model based on (i) a percent of training runs that reach a converge outcome according to one or more training criteria or (ii) a variance of architectural hyperparameters,” describes a person evaluating criteria such as (i) and (ii) to assign a score to the tuned machine learning model. This step is nothing more than observations, evaluations, and judgments that can be performed in the human mind (i.e., a mental process, which is an abstract idea). Again, the machine learning model is recited at a high level of generality and as a tool to perform an abstract idea.
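For illustration only, the high level of generality at which step c is recited can be shown with a minimal Python sketch of the two recited alternatives. The function names, inputs, and values below are the examiner's hypothetical assumptions and are not drawn from the application:

import statistics

def brittleness_from_convergence(run_converged):
    # Limitation (i): percent of training runs that reach a converge
    # outcome according to the training criteria.
    return 100.0 * sum(run_converged) / len(run_converged)

def brittleness_from_variance(arch_hyperparams):
    # Limitation (ii): variance of an architectural hyperparameter
    # (e.g., a layer count) observed across tuning runs.
    return statistics.variance(arch_hyperparams)

# Hypothetical example: 7 of 10 tuning runs converged; layer counts varied.
score_i = brittleness_from_convergence([True] * 7 + [False] * 3)  # 70.0
score_ii = brittleness_from_variance([3, 4, 3, 5, 4])             # 0.7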
“Unless it is clear that a claim recites distinct exceptions, such as a law of nature and an abstract idea, care should be taken not to parse the claim into multiple exceptions, particularly in claims involving abstract ideas.” MPEP 2106.04, subsection II.B. Where possible, the examiner should consider the limitations together as a single abstract idea rather than as a plurality of separate abstract ideas to be analyzed individually. “For example, in a claim that includes a series of steps that recite mental steps as well as a mathematical calculation, an examiner should identify the claim as reciting both a mental process and a mathematical concept for Step 2A, Prong One to make the analysis clear on the record.” MPEP 2106.04, subsection II.B. Under such circumstances, however, the Supreme Court has treated such claims in the same manner as claims reciting a single judicial exception. Id. (discussing Bilski v. Kappos, 561 U.S. 593 (2010)). Here, steps b and c fall within the mental process grouping of abstract ideas. Limitations (b)-(c) are considered together as a single abstract idea for further analysis. (Step 2A, Prong One: YES).
Step 2A – Prong Two: This part of the analysis evaluates whether the claim recites additional elements that integrate the judicial exception into a practical application. The claim recites the additional elements/limitations “processor,” “non-transitory media,” “machine learning model,” “fixed training hyperparameter,” “tuned machine learning model,” “converge outcome,” “criteria,” and “architectural hyperparameters,” as well as step a (receiving a machine learning model) and step b (performing hyperparameter tuning of the received machine learning model according to a fixed training hyperparameter to obtain a tuned machine learning model).
a) MPEP § 2106.05(a) "Improvements to the Functioning of a Computer or to Any Other Technology or Technical Field."
There is no improvement to the functioning of a computer or to any other technology or technical field. The limitations “processor,” “non-transitory media,” “machine learning model,” “fixed training hyperparameter,” “tuned machine learning model,” “converge outcome,” “criteria,” and “architectural hyperparameters,” together with step a (receiving a machine learning model, i.e., collecting data) and step b (performing hyperparameter tuning, i.e., a normal process of training and/or re-training a machine learning model), do not make any improvement to the functionality of a computer, database technology, or any other technology.
b) MPEP § 2106.05(b) Particular Machine. The judicial exception does not apply to any particular machine.
The claim is silent regarding specific limitations directed to an improved computer system, processor, memory, network, database, or Internet, nor does applicant direct the examiner's attention to such specific limitations. "[T]he mere recitation of a generic computer cannot transform a patent-ineligible abstract idea into a patent-eligible invention." Alice, 573 U.S. at 223; see also Bascom Glob. Internet Servs., Inc. v. AT&T Mobility LLC, 827 F.3d 1341, 1348 (Fed. Cir. 2016) ("An abstract idea on 'an Internet computer network' or on a generic computer is still an abstract idea."). Applying this reasoning here, the claim is not directed to a particular machine, but rather merely implements an abstract idea using generic computer components (“processor,” “non-transitory media”) and generic machine learning elements (“machine learning model,” “fixed training hyperparameter,” “tuned machine learning model,” “converge outcome,” “criteria,” “architectural hyperparameters”); step a (receiving a machine learning model) and step b (performing hyperparameter tuning) are a normal process of training and/or re-training a machine learning model. Thus, the claims fail to satisfy the "tied to a particular machine" prong of the Bilski machine-or-transformation test.
c) MPEP § 2106.05(c) Particular Transformation.
The claim operates by collecting a machine learning model, tuning it (i.e., manipulating hyperparameters of the model), and assigning a score to the tuned model. These steps are not a "transformation or reduction of an article into a different state or thing constituting patent-eligible subject matter[.]" See In re Bilski, 545 F.3d 943, 962 (Fed. Cir. 2008) (en banc), aff'd sub nom. Bilski v. Kappos, 561 U.S. 593 (2010); see also CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1375 (Fed. Cir. 2011) ("The mere manipulation or reorganization of data ... does not satisfy the transformation prong."). Applying this guidance here, the claims fail to satisfy the transformation prong of the Bilski machine-or-transformation test.
d) MPEP § 2106.05(e) Other Meaningful Limitations.
This section of the MPEP explains that Diamond v. Diehr provides an example of a claim that recited meaningful limitations beyond generally linking the use of the judicial exception to a particular technological environment. 450 U.S. 175 (1981). In Diehr, the claim was directed to the use of the Arrhenius equation (an abstract idea or law of nature) in an automated process for operating a rubber-molding press. 450 U.S. at 177-78. The Court evaluated additional elements such as the steps of installing rubber in a press, closing the mold, constantly measuring the temperature in the mold, and automatically opening the press at the proper time, and found them to be meaningful because they sufficiently limited the use of the mathematical equation to the practical application of molding rubber products. 450 U.S. at 184. In contrast, the claims in Alice Corp. v. CLS Bank International did not meaningfully limit the abstract idea of mitigating settlement risk. 573 U.S. 208 (2014). In particular, the Court concluded that the additional elements such as the data processing system and communications controllers recited in the system claims did not meaningfully limit the abstract idea because they merely linked the use of the abstract idea to a particular technological environment (i.e., "implementation via computers") or were well-understood, routine, conventional activity. MPEP § 2106.05(e). Here, the limitations “processor,” “non-transitory media,” “machine learning model,” “fixed training hyperparameter,” “tuned machine learning model,” “converge outcome,” “criteria,” and “architectural hyperparameters,” together with step a (receiving a machine learning model) and step b (performing hyperparameter tuning, a normal process of training and/or re-training a machine learning model), are not meaningful limitations because they are pre-solution activities.
e) MPEP § 2106.05(g) Insignificant Extra-Solution Activity.
The limitations of step a (receiving a machine learning model) and step b (performing hyperparameter tuning, a process of training and/or re-training a machine learning model) are not meaningful limitations because collecting and training are pre-solution activities.
f) MPEP § 2106.05(h) Field of Use and Technological Environment.
[T]he Supreme Court has stated that, even if a claim does not wholly pre-empt an abstract idea, it still will not be limited meaningfully if it contains only insignificant or token pre- or post-solution activity, such as identifying a relevant audience, a category of use, field of use, or technological environment. Ultramercial, Inc. v. Hulu, LLC, 722 F.3d 1335, 1346 (Fed. Cir. 2013). The limitations “processor,” “non-transitory media,” “machine learning model,” “fixed training hyperparameter,” “tuned machine learning model,” “converge outcome,” “criteria,” and “architectural hyperparameters” simply describe a field of use that attempts to limit the abstract idea to a particular technological environment.
Accordingly, the additional limitations “processor,” “non-transitory media,” “machine learning model,” “fixed training hyperparameter,” “tuned machine learning model,” “converge outcome,” “criteria,” and “architectural hyperparameters,” together with step a (receiving a machine learning model) and step b (performing hyperparameter tuning, a process of training and/or re-training a machine learning model), do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. (Step 2A, Prong Two: NO).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim does not recite any non-conventional or non-generic arrangement because step a (receiving a machine learning model) is collecting data, and step b (performing hyperparameter tuning) is a normal process of training a machine learning model. Taking these limitations as an ordered combination adds nothing that is not already present when the elements are taken individually. Therefore, the claim does not amount to significantly more than the recited abstract idea. The claim is not patent eligible.
Claim 2 recites “identifying the variance of the architectural hyperparameters based on the hyperparameter tuning of the machine learning model, wherein generating the brittleness score of the tuned machine learning model comprises generating the brittleness score of the tuned machine learning model based on the variance of the architectural hyperparameters.” High or low variance of hyperparameters can be observed by a person, and a score can be given based on the observations. The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claim 3 recites “in connection with receiving the machine learning model, receiving a request to generate a model; and generating the model based on the brittleness score of the tuned machine learning model and a brittleness score of a different model.” Receiving a request to generate a model is simply collecting data, and generating the model based on the brittleness score is recited at a high level of generality. The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claim 4 recites “wherein the received request includes an instruction to generate a random parameter seed, one or more grids of parameter seeds, or one or more predetermined numbers of parameter seeds.” Receiving a request with such instructions is simply collecting data. The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claim 5 recites “generating the model includes training the model and setting a hyperparameter of the model.” Training the model and setting hyperparameters of the model do not include any non-conventional or non-generic arrangement. The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claim 6 recites “wherein the brittleness score is associated with the variance of the architectural hyperparameters and a degree of accuracy.” The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claims 7, 8, 11, 12, 13, 14, and 17-20 are similar to claims 1-6 and are rejected for the same reasons.
Claim 9 recites “wherein generating the brittleness score of the machine learning model comprises generating the brittleness score of the machine learning model based on the percent of training runs that reach the converge outcome.” Generating a score can be performed in the human mind. The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claim 10 recites “wherein the architectural hyperparameters include a number and type of one or more layers in a convolutional neural network (CNN).” The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claim 15 recites “wherein generating the brittleness score of the machine learning model comprises generating the brittleness score of the machine learning model based on a percent of training runs that reach a converge outcome according to one or more training criteria.” Generating a score can be performed in the human mind. The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Claim 16 recites “wherein the architectural hyperparameters include a number and type of one or more layers in a convolutional neural network (CNN).” The claim does not have any additional limitation that amounts to significantly more than the abstract idea.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
“A later patent claim is not patentably distinct from an earlier patent claim if the later claim is obvious over, or anticipated by, the earlier claim. In re Longi, 759 F.2d at 896, 225 USPQ at 651 (affirming a holding of obviousness-type double patenting because the claims at issue were obvious over claims in four prior art patents); In re Berg, 140 F.3d at 1437, 46 USPQ2d at 1233 (Fed. Cir. 1998) (affirming a holding of obviousness-type double patenting where a patent application claim to a genus is anticipated by a patent claim to a species within that genus).” Eli Lilly & Co. v. Barr Labs., Inc. (Fed. Cir. May 30, 2001) (decision on petition for rehearing en banc).
Claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 7 of U.S. Patent No. 11,836,537. Although the claims at issue are not identical, they are not patentably distinct from each other because claim 7 of U.S. Patent No. 11,836,537 contains every element of claim 1 of the instant application and as such anticipates claim 1 of the instant application.
Instant Application No. 19/174,894, Claim 1:
A system for facilitating model generation, the system comprising:
one or more processors and non-transitory media storing instructions that, when executed, cause operations comprising:
receiving a machine learning model;
performing hyperparameter tuning of the received machine learning model according to a fixed training hyperparameter to obtain a tuned machine learning model; and
generating a brittleness score of the tuned machine learning model based on (i) a percent of training runs that reach a converge outcome according to one or more training criteria or
(ii) a variance of architectural hyperparameters.
U.S. Patent No. 11,836,537, Claim 7:
A system for generating a preferred model, the system comprising:
at least one hardware memory storing instructions; and
one or more hardware processors that execute the instructions to perform operations comprising:
receiving a model characteristic of a received model, the received model comprising a machine learning model;
classifying the received model based on the model characteristic;
identifying a fixed training hyperparameter based on the classification and architectural hyperparameters based on the classification;
generating a tuned model by performing hyperparameter tuning of the received model according to the fixed training hyperparameter;
determining a brittleness score of the tuned model based on a percent of training runs that reach a converge outcome based on one or more training criteria;
comparing the brittleness score of the tuned model to a brittleness score of a different model; and generating the preferred model based on the comparison and the tuned model or the different model,
wherein the brittleness score is determined based on a variance of the architectural parameters.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 6, 7, 8, 9, 10, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (U.S. Pub 2020/0302292 A1), in view of Koch (U.S. Pub 2018/0240041 A1), and further in view of Abbott (U.S. Pub 2019/0108421 A1).
Claim 1
Tseng discloses a system for facilitating model generation, the system comprising (fig. 2, apparatus 200, fig. 5):
one or more processors (fig. 5, processing apparatus 502) and non-transitory media storing instructions that (fig. 5, memory 604), when executed, cause operations comprising:
receiving a machine learning model ([0052], “... provide model information 204 defining one or more models to the user device for storage. The model information 204 may define parameters of the model. The parameters may include hyperparameters for the model such as a number of layers in the model, a number of kernels in each of the layers, identifiers of the orthogonal binary basis vectors used to represent the kernels, or any combination thereof. In addition, the parameters may include a set of coefficients for each kernel. In some examples, the model information may also include information describing the performance of the model, such as accuracy and/or computational resource use...” [0059], line 4-7, “... The plurality of neural network models may be referred to as a candidate list and may have been received from the neural network training apparatus 200...”);
performing hyperparameter tuning of the received machine learning model according to a fixed training hyperparameter to obtain a tuned machine learning model ([0089], line 1-4, “... the apparatus 200 selects the sets of hyperparameters from the model grid iteratively, and trains a neural network based on each set of hyperparameters...” <examiner note: the neural network is trained using hyperparameters; the trained neural network corresponds to the tuned model. The fixed training hyperparameters correspond to, e.g., the number of layers and kernels remaining unchanged>).
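For illustration only, the iterative grid selection Tseng describes at [0089] could be sketched as follows in Python. The helper functions train_model and evaluate are assumed stand-ins, and the grid values are the examiner's illustrative assumptions, not Tseng's disclosure:

import random
from itertools import product

def train_model(arch, learning_rate, epochs):
    # Stand-in for actual neural network training (assumption, not Tseng).
    return arch

def evaluate(model):
    # Stand-in for a validation metric (assumption, not Tseng).
    return random.random()

FIXED_TRAINING = {"learning_rate": 0.01, "epochs": 20}      # held fixed
model_grid = {"num_layers": [2, 3, 4], "kernels": [8, 16]}  # architectural grid

results = []
for num_layers, kernels in product(model_grid["num_layers"], model_grid["kernels"]):
    arch = {"num_layers": num_layers, "kernels": kernels}
    tuned = train_model(arch, **FIXED_TRAINING)  # train per grid entry
    results.append((arch, evaluate(tuned)))

best_arch, best_score = max(results, key=lambda r: r[1])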
However, Tseng does not explicitly disclose
generating a brittleness score of the tuned machine learning model based on (i) a percent of training runs that reach a converge outcome according to one or more training criteria or (ii) a variance of architectural hyperparameters.
Koch discloses generating a brittleness score of the tuned machine learning model based on a variance of architectural hyperparameters ([0075], line 1-3, “… a sixth indicator of a model type for which to identify a best hyperparameter configuration may be received…”; table 1, pg. 11 discloses architectural hyperparameters for different types of models; [0205], “… an operation 716, the model type is scored using the hyperparameter configuration accessed, the trained model defined in operation 710… to determine one or more objective function values…” [0181], line 1-6, “… In operation 652, the results for each hyperparameter configuration included in configuration list 322 is provided to iteration manager 314. Based on the results and the current tuning search method(s), iteration manager 314 determines a next set of hyperparameter configurations to evaluate in a next iteration…”; fig. 12 shows variance of architectural hyperparameters).
Tseng discloses a neural network model that is trained using hyperparameters; however, Tseng does not disclose scoring the tuned/trained model based on architectural hyperparameters. Koch discloses that architectural hyperparameters for each type of model are automatically searched and evaluated to identify the best hyperparameter configuration for the model with the highest performance. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include hyperparameter tuning as disclosed by Koch into Tseng to identify the best set of architectural hyperparameters so that the tuned/trained model has the best performance.
Abbott discloses generating a brittleness score of the tuned machine learning model based on a percent of training runs that reach a converge outcome according to one or more training criteria ([0076], “… the training process can be used to determine values for hyperparameters of the model being used… This procedure is sometimes referred to as model selection… The performance the specific choice of the hyperparameters can evaluated by averaging the accuracy of these k models…” [0078], “… At 416, process 400 can determine the accuracy of the trained model over the test samples, and can determine whether to discontinue training (e.g., because a particular level of accuracy has been reached, because the accuracy has not improved by a particular amount over a particular number of epochs…” [0096], “… FIG. 13, an accuracy of 95% was achieved if 80% of the data (i.e., 312 micro-wells or 60,064 images) were used as training samples, whereas, if only 20% of the data (i.e., 79 micro-wells or 15,017 images) were used as training samples, the accuracy dropped to 87.47%...” <examiner note: for instance, the accuracy/score of the model is 95% when 80% of the data is used for training, and the accuracy not improving over a particular number of epochs corresponds to a converge outcome according to training criteria>).
[Image: media_image1.png (greyscale)]
Tseng discloses a neural network model that is trained using hyperparameters; however, Tseng does not disclose scoring the tuned/trained model based on a percent of training runs that reach a converge outcome according to one or more training criteria. Abbott discloses that the training process can be used to determine values for hyperparameters of the model being used (e.g., λ for logistic classification, C for linear SVM, C and γ for nonlinear SVM, λ and network layout for ANN). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the training/tuning process as disclosed by Abbott into Tseng so that the optimal hyperparameters (of those calculated) can be selected by looping over different hyperparameter choices.
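For illustration only, the stopping criterion Abbott describes at [0078] (discontinue training when a particular accuracy level is reached, or when accuracy has not improved by a particular amount over a particular number of epochs) could be sketched as follows, together with the percent-of-runs computation recited in the claim. This is a hypothetical Python sketch; the thresholds and accuracy values are the examiner's illustrative assumptions:

def run_converges(accuracy_per_epoch, target=0.95, min_delta=0.001, patience=5):
    # A run "converges" if it reaches the target accuracy; training is
    # discontinued if accuracy stops improving by min_delta for `patience` epochs.
    best, stale = 0.0, 0
    for acc in accuracy_per_epoch:
        if acc >= target:
            return True               # converge outcome reached
        if acc > best + min_delta:
            best, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:
                return False          # training discontinued early
    return False

runs = [[0.6, 0.8, 0.9, 0.96],
        [0.5, 0.55, 0.56, 0.56, 0.56, 0.56, 0.56, 0.56]]
percent_converged = 100.0 * sum(run_converges(r) for r in runs) / len(runs)  # 50.0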
Claim 7 is similar to claim 1 and is rejected for the same reasons.
Claim 2
Claim 1 is included, Koch discloses identifying the variance of the architectural hyperparameters based on the hyperparameter tuning of the machine learning model ([0075], line 1-3, “… a sixth indicator of a model type for which to identify a best hyperparameter configuration may be received…”; table 1, pg. 11 discloses architectural hyperparameters for different types of models), wherein generating the brittleness score of the tuned machine learning model comprises generating the brittleness score of the tuned machine learning model based on the variance of the architectural hyperparameters ([0205], “… an operation 716, the model type is scored using the hyperparameter configuration accessed, the trained model defined in operation 710… to determine one or more objective function values…” [0181], line 1-6, “… In operation 652, the results for each hyperparameter configuration included in configuration list 322 is provided to iteration manager 314. Based on the results and the current tuning search method(s), iteration manager 314 determines a next set of hyperparameter configurations to evaluate in a next iteration…”; fig. 12 shows variance of architectural hyperparameters).
Claim 8 is similar to claim 2 and is rejected for the same reasons.
Claim 6
Claim 1 is included, Koch discloses wherein the brittleness score is associated with the variance of the architectural hyperparameters and a degree of accuracy (fig. 13, misclassification error percentage 1.74, i.e., accuracy 98.26).
Claim 13 is similar to claim 6 and is rejected for the same reasons.
Claim 9
Claim 7 is included, Abbott discloses wherein generating the brittleness score of the machine learning model comprises generating the brittleness score of the machine learning model based on the percent of training runs that reach the converge outcome ([0076], “… the training process can be used to determine values for hyperparameters of the model being used… This procedure is sometimes referred to as model selection… The performance the specific choice of the hyperparameters can evaluated by averaging the accuracy of these k models…” [0078], “… At 416, process 400 can determine the accuracy of the trained model over the test samples, and can determine whether to discontinue training (e.g., because a particular level of accuracy has been reached, because the accuracy has not improved by a particular amount over a particular number of epochs…” [0096], “… FIG. 13, an accuracy of 95% was achieved if 80% of the data (i.e., 312 micro-wells or 60,064 images) were used as training samples, whereas, if only 20% of the data (i.e., 79 micro-wells or 15,017 images) were used as training samples, the accuracy dropped to 87.47%...” <examiner note: for instance, the accuracy/score of the model is 95% when 80% of the data is used for training, and the accuracy not improving over a particular number of epochs corresponds to a converge outcome according to training criteria>).
Claim 10
Claim 7 is included, Tseng discloses wherein the architectural hyperparameters include a number and type of one or more layers in a convolutional neural network (CNN) ([0052], “... provide model information 204 defining one or more models to the user device for storage. The model information 204 may define parameters of the model. The parameters may include hyperparameters for the model such as a number of layers in the model, a number of kernels in each of the layers…” <examiner note: the number of layers and the kernels are hyperparameters for a neural network, which includes a CNN>).
Claim(s) 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (U.S. Pub 2020/0302292 A1), in view of Koch (U.S. Pub 2018/0240041 A1), in view of Abbott (U.S. Pub 2019/0108421 A1), as applied to claims 1 and 7, respectively, and further in view of Li (U.S. Pub 2016/0078339 A1).
Claim 3
Claim 1 is included, Li discloses in connection with receiving the machine learning model, receiving a request to generate a model ([0036], “... Evaluating component 128... for evaluating the student DNN model... evaluating component 128 evaluates the output distributions of the student and teacher DNNs... determines whether the student is continuing to improve or whether the student is no longer improving...”); and generating the model based on the brittleness score of the tuned machine learning model and a brittleness score of a different model ([0036], “... evaluating component 128 evaluates the output distributions of the student and teacher DNNs, determines the difference (which may be determined as an error signal) between the outputs and also determines whether the student is continuing to improve or whether the student is no longer improving (i.e. the student output distribution shows no further trend towards convergence with the teacher output)...” [0065], “... At step 560, the student DNN is updated based on the evaluation determined at step 550... In one embodiment, the difference between the output distribution of the student DNN and teacher DNN determined in step 550 is used to update the parameters or node weights of the student DNN, which may be performed using back propagation. Updating the student DNN in this way facilitates training the output distribution of the student DNN to more closely approximate the output distribution of the teacher DNN...” <examiner note: an updated student model is a preferred model. It is generated based on the comparison and student model or the teacher model>)
Tseng discloses a neural network model that is trained, with a local minimum/value of a loss function calculated; however, the performance of the neural network is not compared with a reference model to determine the performance of the neural network. Li discloses that the performance/output distribution of a student neural network is compared with a teacher model, and a better student model is generated using the comparison result together with the student model and the teacher model. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to calculate and compare the performance of one model to the performance of another/reference model to identify the difference/divergence between the models in order to retrain the model to a higher accuracy with a smaller error rate.
Claim 11 is similar to claim 3 and is rejected for the same reasons.
Claim(s) 4, 5, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (U.S. Pub 2020/0302292 A1), in view of Koch (U.S. Pub 2018/0240041 A1), in view of Abbott (U.S. Pub 2019/0108421 A1), in view of Li (U.S. Pub 2016/0078339 A1), as applied to claims 3 and 11, respectively, and further in view of Zarandioon (U.S. Patent 10713589 B1).
Claim 4
Claim 3 is included, however, Tseng and Li do not explicitly disclose wherein the received request includes an instruction to generate a random parameter seed, one or more grids of parameter seeds, or one or more predetermined numbers of parameter seeds.
Zarandioon discloses wherein the received request includes an instruction to generate a random parameter seed (col 53, line 7-10, “… an input data set 3201… is shuffled initially using a particular seed value 3202A as a token contributor, producing a first unique shuffle result 3212A…”), one or more grids of parameter seeds (fig. 32, a grid of parameter seeds 3202A-D), or one or more predetermined numbers of parameter seeds (col 53, line 28-30, “… the original input data set may be reshuffled with each new seed value 3202, while in other embodiments the output produced by the most recent shuffle may be reshuffled using the new seed…”).
Claim 5
Claim 3 is included, Tseng discloses wherein generating the model includes training the model and setting a hyperparameter of the model ([0089], line 1-4, “... the apparatus 200 selects the sets of hyperparameters from the model grid iteratively, and trains a neural network based on each set of hyperparameters...” <examiner note: the neural network is trained using hyperparameters; the trained neural network corresponds to the tuned model>).
Tseng discloses a neural network model that is trained, with a local minimum/value of a loss function calculated; however, the performance of the neural network is not compared with a reference model to determine the performance of the neural network. Li discloses that the performance/output distribution of a student neural network is compared with a teacher model, and a better student model is generated using the comparison result together with the student model and the teacher model. However, Tseng and Li do not explicitly disclose generating parameter seeds. Zarandioon discloses generating a random parameter seed, a grid of parameter seeds, and one or more predetermined numbers of parameter seeds. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zarandioon's disclosure into Tseng and Li because “… As a result of the use of the seed value during the generation of the pseudo-random values in the sort-based record-level shuffle algorithm, entirely different shuffle results may be produced from the same data set simply by changing the seed value in various embodiments, with the additional property that the shuffle order would remain consistent for a given seed value. For some machine learning algorithms, reshuffling the same data may be appropriate for respective training epochs—e.g., a given training set may be used to perform one round of training, reshuffled, and then used to perform additional round of training, with the process being repeated until the model being trained reaches a desired quality…”
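For illustration only, the seed-controlled shuffle behavior quoted above (the same seed reproduces the same shuffle order, while a new seed yields a different reshuffle for the next training epoch) could be sketched as follows. This is a hypothetical Python sketch using the standard library random module; the seed values are illustrative assumptions:

import random

def seeded_shuffle(records, seed):
    # Pseudo-random values derived from the seed determine the shuffle order,
    # so the same seed always reproduces the same result.
    rng = random.Random(seed)
    out = list(records)
    rng.shuffle(out)
    return out

data = list(range(10))
assert seeded_shuffle(data, seed=3202) == seeded_shuffle(data, seed=3202)
# Reshuffle the previous epoch's output with a new seed for the next epoch.
epoch2 = seeded_shuffle(seeded_shuffle(data, 3202), 3203)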
Claim 12 is similar to claims 4 and 5 and is rejected for the same reasons.
Claim(s) 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (U.S. Pub 2020/0302292 A1), in view of DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks, by Jie Fu et al., April 6, 2016 (“Fu”), and further in view of Koch (U.S. Pub 2018/0240041 A1).
Claim 14
Tseng discloses one or more non-transitory media storing instructions (fig. 5, memory 604) that, when executed by one or more processors (fig. 5, processing apparatus 502), cause operations comprising:
receiving a machine learning model ([0052], “... provide model information 204 defining one or more models to the user device for storage. The model information 204 may define parameters of the model. The parameters may include hyperparameters for the model such as a number of layers in the model, a number of kernels in each of the layers, identifiers of the orthogonal binary basis vectors used to represent the kernels, or any combination thereof. In addition, the parameters may include a set of coefficients for each kernel. In some examples, the model information may also include information describing the performance of the model, such as accuracy and/or computational resource use...” [0059], line 4-7, “... The plurality of neural network models may be referred to as a candidate list and may have been received from the neural network training apparatus 200...”);
performing hyperparameter tuning of the machine learning model ([0089], line 1-4, “... the apparatus 200 selects the sets of hyperparameters from the model grid iteratively, and trains a neural network based on each set of hyperparameters...” <examiner note: the neural network is trained using hyperparameters; the trained neural network corresponds to the tuned model>).
However, Tseng does not explicitly disclose
performing hyperparameter tuning of the machine learning model to identify a variance of architectural hyperparameters; and
generating a brittleness score of the machine learning model based on the variance of the architectural hyperparameters.
Fu discloses performing hyperparameter tuning of the machine learning model to identify a variance of architectural hyperparameters (section 4.1, “… a multilayer perceptron (MLP)… The MLP has 4 layer, containing 784, 50, 50, and 50 neurons respectively… thus we are going to optimize 934 hyperparameter in total…” <examiner note: the number of layers and the neurons in each layer are architectural hyperparameters> “… The shaded areas in figure 3 (left) represent the variances of individual hyperparameters. We can see that the variances are quite high, which implies that diverse hyperparameter value might be beneficial to the predictive performance…”).
Tseng discloses a neural network model that is trained based on a set of hyperparameters, with a local minimum/value of a loss function calculated; however, Tseng does not disclose identifying the variance of architectural hyperparameters. Fu discloses an effective method, DrMAD, that identifies the variance of architectural parameters when tuning the model based on the hyperparameters. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the DrMAD method as disclosed by Fu into Tseng because DrMAD offers high memory and computational efficiency while achieving comparable predictive performance.
Koch discloses generating a brittleness score of the machine learning model based on the variance of the architectural hyperparameters ([0205], “… an operation 716, the model type is scored using the hyperparameter configuration accessed, the trained model defined in operation 710… to determine one or more objective function values…” [0181], line 1-6, “… In operation 652, the results for each hyperparameter configuration included in configuration list 322 is provided to iteration manager 314. Based on the results and the current tuning search method(s), iteration manager 314 determines a next set of hyperparameter configurations to evaluate in a next iteration…”; fig. 12 shows variance of architectural hyperparameters).
Tseng discloses a neural network model that is trained using hyperparameters; however, Tseng does not disclose scoring the tuned/trained model based on architectural hyperparameters. Koch discloses that architectural hyperparameters for each type of model are automatically searched and evaluated to identify the best hyperparameter configuration for the model with the highest performance. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include hyperparameter tuning as disclosed by Koch into Tseng and Fu to identify the best set of architectural hyperparameters so that the tuned/trained model has the best performance.
Claim 20
Claim 14 is included, Koch discloses wherein the brittleness score is associated with the variance of the architectural hyperparameters and a degree of accuracy (fig. 13, misclassification error percentage 1.74, i.e., accuracy 98.26).
Claim(s) 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (U.S. Pub 2020/0302292 A1), in view of DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks, by Jie Fu et al., April 6, 2016, in view of Koch (U.S. Pub 2018/0240041 A1), as applied to claim 14, and further in view of Abbott (U.S. Pub 2019/0108421 A1).
Claim 15
Claim 14 is included, however, Tseng does not explicitly disclose wherein generating the brittleness score of the machine learning model comprises generating the brittleness score of the machine learning model based on a percent of training runs that reach a converge outcome according to one or more training criteria.
Abbott discloses wherein generating the brittleness score of the machine learning model comprises generating the brittleness score of the machine learning model based on a percent of training runs that reach a converge outcome according to one or more training criteria ([0076], “… the training process can be used to determine values for hyperparameters of the model being used… This procedure is sometimes referred to as model selection… The performance the specific choice of the hyperparameters can evaluated by averaging the accuracy of these k models…” [0078], “… At 416, process 400 can determine the accuracy of the trained model over the test samples, and can determine whether to discontinue training (e.g., because a particular level of accuracy has been reached, because the accuracy has not improved by a particular amount over a particular number of epochs…” [0096], “… FIG. 13, an accuracy of 95% was achieved if 80% of the data (i.e., 312 micro-wells or 60,064 images) were used as training samples, whereas, if only 20% of the data (i.e., 79 micro-wells or 15,017 images) were used as training samples, the accuracy dropped to 87.47%...” <examiner note: for instance, the accuracy/score of the model is 95% when 80% of the data is used for training, and the accuracy not improving over a particular number of epochs corresponds to a converge outcome according to training criteria>).
[Image: media_image1.png (greyscale)]
Tseng discloses a neural network model that is trained using hyperparameters; however, Tseng does not disclose scoring the tuned/trained model based on a percent of training runs that reach a converge outcome according to one or more training criteria. Abbott discloses that the training process can be used to determine values for hyperparameters of the model being used (e.g., λ for logistic classification, C for linear SVM, C and γ for nonlinear SVM, λ and network layout for ANN). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the training/tuning process as disclosed by Abbott into Tseng so that the optimal hyperparameters (of those calculated) can be selected by looping over different hyperparameter choices.
Claim 16
Claim 14 is included, Tseng discloses wherein the architectural hyperparameters include a number and type of one or more layers in a convolutional neural network (CNN) ([0052], “... provide model information 204 defining one or more models to the user device for storage. The model information 204 may define parameters of the model. The parameters may include hyperparameters for the model such as a number of layers in the model, a number of kernels in each of the layers…” <examiner note: the number of layers and the kernels are hyperparameters for a neural network, which includes a CNN>).
Claim(s) 17 is rejected under 35 U.S.C. 103 as being unpatentable over Tseng (U.S. Pub 2020/0302292 A1), in view of DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks, by Jie Fu et al., April 6, 2016, in view of Koch (U.S. Pub 2018/0240041 A1), as applied to claim 14, and further in view of Li (U.S. Pub 2016/0078339 A1).
Claim 17
Claim 14 is included, however, Tseng does not disclose further comprising: in connection with receiving the machine learning model, receiving a request to generate a model; and generating the model based on the brittleness score of the machine learning model and a brittleness score of a different model.
Li discloses in connection with receiving the machine learning model, receiving a request to generate a model ([0036], “... Evaluating component 128... for evaluating the student DNN model... evaluating component 128 evaluates the output distributions of the student and teacher DNNs... determines whether the student is continuing to improve or whether the student is no longer improving...”); and generating the model based on the brittleness score of the machine learning model and a brittleness score of a different model ([0036], “... evaluating component 128 evaluates the output distributions of the student and teacher DNNs, determines the difference (which may be determined as an error signal) between the outputs and also determines whether the student is continuing to improve or whether the student is no longer improving (i.e. the student output distribution shows no further trend towards convergence with the teacher output)...” [0065], “... At step 560, the student DNN is updated based on the evaluation determined at step 550... In one embodiment, the difference between the output distribution of the student DNN and teacher DNN determined in step 550 is used to update the parameters or node weights of the student DNN, which may be performed using back propagation. Updating the student DNN in this way facilitates training the output distribution of the student DNN to more closely approximate the output distribution of the teacher DNN...” <examiner note: an updated student model is a preferred model; it is generated based on the comparison together with the student model or the teacher model>).
Tseng discloses a neural network model that is trained, with a local minimum/value of a loss function calculated; however, the performance of the neural network is not compared with a reference model to determine the performance of the neural network. Li discloses that the performance/output distribution of a student neural network is compared with a teacher model, and a better student model is generated using the comparison result together with the student model and the teacher model. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to calculate and compare the performance of one model to the performance of another/reference model to identify the difference/divergence between the models in order to retrain the model to a higher accuracy with a smaller error rate.
Claim(s) 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Tseng (U.S. Pub 2020/0302292 A1), in view of DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks, by Jie Fu et al., April 6, 2016, in view of Koch (U.S. Pub 2018/0240041 A1), in view of Li (U.S. Pub 2016/0078339 A1), as applied to claim 17, and further in view of Zarandioon (U.S. Patent 10713589 B1).
Claim 18
Claim 17 is included, however, Tseng does not disclose wherein the received request includes an instruction to generate a random parameter seed, one or more grids of parameter seeds, or one or more predetermined numbers of parameter seeds.
Zarandioon discloses wherein the received request includes an instruction to generate a random parameter seed (col 53, line 7-10, “… an input data set 3201… is shuffled initially using a particular seed value 3202A as a token contributor, producing a first unique shuffle result 3212A…”), one or more grids of parameter seeds (fig. 32, a grid of parameter seeds 3202A-D), or one or more predetermined numbers of parameter seeds (col 53, line 28-30, “… the original input data set may be reshuffled with each new seed value 3202, while in other embodiments the output produced by the most recent shuffle may be reshuffled using the new seed…”).
Tseng discloses a neural network model that is trained, with a local minimum/value of a loss function calculated; however, the performance of the neural network is not compared with a reference model to determine the performance of the neural network. Li discloses that the performance/output distribution of a student neural network is compared with a teacher model, and a better student model is generated using the comparison result together with the student model and the teacher model. However, Tseng and Li do not explicitly disclose generating parameter seeds. Zarandioon discloses generating a random parameter seed, a grid of parameter seeds, and one or more predetermined numbers of parameter seeds. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zarandioon's disclosure into Tseng and Li because “… As a result of the use of the seed value during the generation of the pseudo-random values in the sort-based record-level shuffle algorithm, entirely different shuffle results may be produced from the same data set simply by changing the seed value in various embodiments, with the additional property that the shuffle order would remain consistent for a given seed value. For some machine learning algorithms, reshuffling the same data may be appropriate for respective training epochs—e.g., a given training set may be used to perform one round of training, reshuffled, and then used to perform additional round of training, with the process being repeated until the model being trained reaches a desired quality…”
Claim 19
Claim 17 is included, Tseng discloses wherein generating the model includes training the model and setting a hyperparameter of the model ([0089], line 1-4, “... the apparatus 200 selects the sets of hyperparameters from the model grid iteratively, and trains a neural network based on each set of hyperparameters...” <examiner note: the neural network is trained using hyperparameters; the trained neural network corresponds to the tuned model>).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAU HAI HOANG whose telephone number is (571) 270-5894. The examiner can normally be reached Monday-Thursday, 7:00 AM-5:00 PM (first biweek), and Monday-Thursday, 7:00 AM-5:00 PM, and Friday, 7:00 AM-4:00 PM (second biweek).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney can be reached at 571-270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
HAU HAI HOANG
Primary Examiner
Art Unit 2154
/HAU H HOANG/Primary Examiner, Art Unit 2154