DETAILED ACTION
This communication is in response to the amendment filed 11/11/25, in which claims 1, 3, and 18-20 were amended. Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1, 18, and 19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Specifically, a new reference to Anderson is used to teach the amended limitations.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 1-7 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal (US 2020/0125961 A1; published Apr. 23, 2020) in view of Anderson (US 2019/0325316 A1; published Oct. 24, 2019).
Regarding claim 1, Agrawal discloses [a] method for searching for an output machine learning algorithm to perform a particular machine learning task, (Agrawal ¶ 27 (“A machine learning algorithm may be selected based on the domain of the problem and the intended type of outcome required by the problem.”)) the method comprising:
receiving a set of training examples, each training example in the set of training examples comprising an example input and an example output; (¶ 20 (“In an embodiment, to iteratively train an algorithm to generate a trained model, a training data set may be arranged such that each row of the data set is input to a machine learning algorithm and further stores the corresponding actual outcome, label value, for the row. For example, each row of the adult income data set represents a particular adult for whom the outcome is known, such as whether the adult has a gross income over $500,000. Each column of the adult training dataset contains numerical representations of a particular adult characteristic (e.g. whether an adult has a college degree, age of an adult . . . ) based on which the algorithm when trained can accurately predict whether any adult (even one who has not been described by the training data set) has a gross income over $500,000.”))
receiving a set of validation examples, each validation example in the set of validation examples comprising a validation input and a validation output; (¶ 33 (“A test data set is used as an input to the trained model for calculating the predicted result values. The predicted result values are compared with the corresponding label values to determine the performance score.”))
generating a sequence of candidate machine learning algorithms to perform the particular machine learning task, comprising searching for one or more candidate machine learning algorithms through a machine learning algorithm search space, (¶ 32 (“Therefore, to select a type of algorithm or moreover, to select the best performing variant of an algorithm, various hyper-parameter selection techniques are used to generate distinct sets of hyper-parameter values. Non-limiting examples of hyper-parameter value selection techniques include a Bayesian optimization such as a Gaussian process for hyper-parameter value selection, a random search, a gradient-based search, a grid search, hand-tuning techniques, a tree-structured Parzen Estimators (TPE) based technique.”))
wherein each candidate machine learning algorithm in the sequence is represented as a computer program comprising (i) a respective candidate setup function that initializes one or more training parameters of the candidate machine learning algorithm, (¶ 30 (“A type of machine algorithm may have unlimited variants based on one or more hyper-parameters. The term “hyper-parameter” refers to a parameter in a model artifact that is set before the training of the machine algorithm model and is not modified during the training of the model. In other words, a hyper-parameter is a constant value that affects (or controls) the generated trained model independent of the training data set. A machine learning model with a model artifact that has only hyper-parameter values set is referred to herein as a “variant of a machine learning algorithm” or simply “variant.” Accordingly, different hyperparameter values for the same type of machine learning algorithm may yield significantly different loss values on the same training data set during the training of a model.”), ¶ 32 (“Therefore, to select a type of algorithm or moreover, to select the best performing variant of an algorithm, various hyper-parameter selection techniques are used to generate distinct sets of hyper-parameter values.”)) (ii) a respective candidate learn function that adjusts training parameters of the candidate machine learning algorithm, (¶ 41 (“A reference variant is selected as a baseline variant, of the machine learning algorithm, which is modified to generate a mini-ML variant. 
Through iterative evaluations of variants of the reference variant, a mini-ML variant is generated, and in each iteration, at least one hyper-parameter value of the reference variant is modified.”)) and (iii) a respective candidate predict function that predicts an output for a given input using the adjusted training parameters (¶ 74 (“For any new data set, a hyper-parameter value selection or a machine algorithm model selection is performed by applying the new data set's meta-features as input to the trained meta-model, in an embodiment. The system performs cross-validation of the mini-ML variants on the new data set, and the resulting performance scores are used as the meta-feature values. The meta-feature values for the new data set, including that for the mini-ML variant(s), are provided as the input to the meta-model. The output of the meta-model indicates a predicted best-performing machine learning algorithm and/or the predicted best performing one or more hyper-parameter value(s) for the machine learning algorithm.”)).
Agrawal does not expressly disclose searching for one or more candidate machine learning algorithms through a machine learning algorithm search space and that each ML variant is represented as a computer program (but see Anderson ¶ 23 (“In the known genetic algorithm 100 of FIG. 1, certain ones of the randomly generated candidate programs are selected based on the fitness scores for use in breeding operation(s) to generate new program(s) (block 184). Candidate program(s) associated with fitness score(s) indicating that the program(s) perform better than other candidate program(s) in satisfying the target input/output data are more likely to be selected for breeding, as such programs are more likely to produce end result program(s) that satisfy the target input/output criteria. Breeding operations can include mutation in which one or more parameters (e.g., one or more instructions, one or more values, one or more variables, one or more data structures, one or more objects, one or more loops, etc.) of a selected candidate program are modified. Another example breeding operation includes crossover, in which parameter(s) e.g., one or more instructions, one or more values, one or more variables, one or more data structures, one or more objects, one or more loops, etc.) of different candidate programs are combined (e.g., swapped, merged, added, removed, etc.) to generate new program(s).”), ¶ 25 (“Thus, known program synthesis based on genetic programming includes generating random programs that are iteratively selected for use in generating new programs. There can be more than one program that provides a solution to given input/output example. Thus, as a result of program synthesis, one or more effective end result programs can be generated that satisfy the target input/output criteria. As discussed above, as used herein, an end program can be considered to be an effective end program if for all given inputs, the end result program provides corresponding desired outputs. 
However, other structural or timing based characteristics of an effective end result program, such as a length of the end result program, may be unknown.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agrawal to incorporate the teachings of Anderson to automatically generate the ML programs based on genetic algorithms, at least because doing so would enable creating a new program that satisfies a target input and output.
Agrawal further discloses:
for each candidate machine learning algorithm in the sequence:
setting up one or more training parameters for the candidate machine learning algorithm by executing the candidate setup function, (Agrawal ¶ 43 (“FIG. 1 is a flow diagram that depicts a process for generating metrics for a reference variant, in an embodiment. At step 100, a computer system selects a reference variant with a pre-determined set of hyperparameters for a particular domain, in an embodiment. For example, based on previous experimentation, it has been shown that a particular “C” and “gamma” values yield the best performance for a categorization domain problem involving three or fewer input types. This variant of the SVM machine learning algorithm is then selected as a reference variant to generate a mini-ML variant.”))
training the candidate machine learning algorithm to adjust the one or more training parameters by processing the set of training examples using the candidate predict function and the candidate learn function, and (Agrawal ¶ 43 (“At step 110, a training data set is selected for cross-validation of the reference variant, and at step 120, cross-validation of the reference variant on the selected data set is performed. In some embodiments, the reference variant may already be associated with pre-recorded performance and cost metrics for datasets, which are used in generating the mini-ML variant. For example, the mini-ML variant is going to be generated based on one or more of the OpenML.org datasets, and the reference variant has already been cross-validated and trained using the same data sets.”))
evaluating a performance of the trained candidate machine learning algorithm by executing the candidate predict function on the set of validation examples to determine a performance metric for the trained candidate machine learning algorithm; and (Agrawal ¶ 43 (“At step 110, a training data set is selected for cross-validation of the reference variant, and at step 120, cross-validation of the reference variant on the selected data set is performed. In some embodiments, the reference variant may already be associated with pre-recorded performance and cost metrics for datasets, which are used in generating the mini-ML variant. For example, the mini-ML variant is going to be generated based on one or more of the OpenML.org datasets, and the reference variant has already been cross-validated and trained using the same data sets.”), ¶ 44 (“In an embodiment, to obtain a performance score for a reference variant, the reference variant is cross-validated using a selected training data set that is also used for generating the mini-ML variant of the algorithm. The performance score may be computed by statistically aggregating the individual performance scores computed for each of the data set folds validation. At step 125, if other training datasets exist, the cross-validation is similarly performed using each of the other training data sets to generate corresponding performance scores for each of the other training data sets.”))
selecting a trained candidate machine learning algorithm with the best performance metric among the trained candidate machine learning algorithms as the output machine learning algorithm for the particular machine learning task (¶ 46 (“At step 135, the machine learning algorithm with the best performance score is selected as the reference variant.”)).
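By way of illustration only (this sketch forms no part of the prior-art record; all function names and values are hypothetical), the claimed arrangement — each candidate algorithm represented as a program comprising setup, predict, and learn functions, trained on training examples and scored on validation examples, with the best-scoring candidate selected — might be expressed as follows:

```python
import random

def make_candidate():
    """Hypothetical candidate: a linear model expressed as three functions."""
    lr = random.choice([0.001, 0.005, 0.01])       # illustrative search-space choice

    def setup():
        return {"w": 0.0, "b": 0.0, "lr": lr}      # initialize training parameters

    def predict(params, x):
        return params["w"] * x + params["b"]       # predict using current parameters

    def learn(params, x, y):
        err = predict(params, x) - y               # gradient step on squared error
        params["w"] -= params["lr"] * err * x
        params["b"] -= params["lr"] * err
        return params

    return setup, predict, learn

def evaluate(candidate, train, valid):
    setup, predict, learn = candidate
    params = setup()                               # execute the setup function
    for x, y in train:                             # adjust parameters via learn/predict
        params = learn(params, x, y)
    # performance metric: mean squared error over the validation examples
    return sum((predict(params, x) - y) ** 2 for x, y in valid) / len(valid)

train = [(x, 2 * x + 1) for x in range(10)]        # example input/output pairs
valid = [(x, 2 * x + 1) for x in range(10, 15)]    # validation input/output pairs
candidates = [make_candidate() for _ in range(5)]  # sequence of candidate algorithms
best = min(candidates, key=lambda c: evaluate(c, train, valid))
```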
Claim 18 is a system claim corresponding to claim 1 and is similarly rejected. Agrawal further discloses [a] system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations for performing a particular machine learning task (¶ 87 (“For example, FIG. 6 is a block diagram that illustrates a computer system 600 upon which an embodiment of the invention may be implemented. Computer system 600 includes a bus 602 or another communication mechanism for communicating information, and a hardware processor 604 coupled with bus 602 for processing information.”), ¶ 88 (“Computer system 600 also includes a main memory 606, such as a random access memory (RAM) or another dynamic storage device, coupled to bus 602 for storing information and instructions to be executed by processor 604.”)).
Claim 19 is a computer-readable medium claim corresponding to claim 1 and is similarly rejected. Agrawal further discloses [o]ne or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for performing a particular machine learning task (¶ 92 (“The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operation in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610.”), ¶ 94 (“Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.”)).
Regarding claim 2, Agrawal, in view of Anderson, discloses the invention of claim 1 as discussed above. Agrawal further discloses wherein the particular machine learning task is one of the following tasks: a classification task, a regression task, or an image recognition task (¶ 3 (“Selecting the best algorithm for an application may be difficult and resource intensive. For example, a classification task can be done by several algorithms such as support vector machines (SVMs), random forests, decision trees, artificial neural networks, and more.”)).
Regarding claim 3, Agrawal, in view of Anderson, discloses the invention of claim 1 as discussed above. Agrawal further discloses:
wherein the machine learning algorithm search space is defined by a sequence of instructions, wherein the sequence of instructions includes (i) instructions that specify the operations performed by a setup function of a given candidate machine learning algorithm, (Agrawal ¶ 43 (“FIG. 1 is a flow diagram that depicts a process for generating metrics for a reference variant, in an embodiment. At step 100, a computer system selects a reference variant with a pre-determined set of hyperparameters for a particular domain, in an embodiment. For example, based on previous experimentation, it has been shown that a particular “C” and “gamma” values yield the best performance for a categorization domain problem involving three or fewer input types. This variant of the SVM machine learning algorithm is then selected as a reference variant to generate a mini-ML variant.”)) (ii) instructions that specify the operations performed by a learn function of the given candidate machine learning algorithm, (Agrawal ¶ 43 (“At step 110, a training data set is selected for cross-validation of the reference variant, and at step 120, cross-validation of the reference variant on the selected data set is performed. In some embodiments, the reference variant may already be associated with pre-recorded performance and cost metrics for datasets, which are used in generating the mini-ML variant. For example, the mini-ML variant is going to be generated based on one or more of the OpenML.org datasets, and the reference variant has already been cross-validated and trained using the same data sets.”)) and (iii) instructions that specify the operations performed by a predict function of the given candidate machine learning algorithm (¶ 74 (“For any new data set, a hyper-parameter value selection or a machine algorithm model selection is performed by applying the new data set's meta-features as input to the trained meta-model, in an embodiment. 
The system performs cross-validation of the mini-ML variants on the new data set, and the resulting performance scores are used as the meta-feature values. The meta-feature values for the new data set, including that for the mini-ML variant(s), are provided as the input to the meta-model. The output of the meta-model indicates a predicted best-performing machine learning algorithm and/or the predicted best performing one or more hyper-parameter value(s) for the machine learning algorithm.”)).
Claim 20 is a computer-readable medium claim corresponding to claim 3 and is similarly rejected.
Regarding claim 4, Agrawal, in view of Anderson, discloses the invention of claim 3 as discussed above. Agrawal further discloses wherein each instruction comprises (i) an operation from a predetermined set of operations and (ii) a set of arguments (¶ 31 (“For example, the SVM machine learning algorithm includes two hyperparameters: “C” and “gamma.” The “C” hyper-parameter may be set to any value from 10−3 to 105, while the “gamma” hyper-parameter may be set 10−5 to 103. Accordingly, there are endless permutations of the “C” and “gamma” parameters that may yield different loss values for training the same adult income training data set.”)).
Regarding claim 5, Agrawal, in view of Anderson, discloses the invention of claim 3 as discussed above. Agrawal further discloses initializing each of the first candidate setup function, the first candidate learn function, and the first candidate predict function of the first candidate machine learning algorithm in the sequence of candidate machine learning algorithms with one or more instructions (¶ 32 (“Therefore, to select a type of algorithm or moreover, to select the best performing variant of an algorithm, various hyper-parameter selection techniques are used to generate distinct sets of hyper-parameter values. Non-limiting examples of hyper-parameter value selection techniques include a Bayesian optimization such as a Gaussian process for hyper-parameter value selection, a random search, a gradient-based search, a grid search, hand-tuning techniques, a tree-structured Parzen Estimators (TPE) based technique.”), ¶ 33 (“With distinct sets of hyper-parameters values selected based on one or more of these techniques, each machine learning algorithm variant is trained on a training data set. A test data set is used as an input to the trained model for calculating the predicted result values. The predicted result values are compared with the corresponding label values to determine the performance score. The performance score may be computed based on calculating the error rate of predicted results in relation to the corresponding labels. For example, in a categorical domain if out of 10,000 inputs to the model only 9,000 matched the labels for the inputs, then the performance score is computed to be 90%. In non-categorical domains, the performance score may be further based on a statistical aggregation on the difference between the label value and the predicted result value.”)).
Regarding claim 6, Agrawal, in view of Anderson, discloses the invention of claim 3 as discussed above. Agrawal further discloses wherein searching for one or more candidate machine learning algorithms through the machine learning algorithm search space comprises performing a random search process (¶ 32 (“Therefore, to select a type of algorithm or moreover, to select the best performing variant of an algorithm, various hyper-parameter selection techniques are used to generate distinct sets of hyper-parameter values. Non-limiting examples of hyper-parameter value selection techniques include a Bayesian optimization such as a Gaussian process for hyper-parameter value selection, a random search, a gradient-based search, a grid search, hand-tuning techniques, a tree-structured Parzen Estimators (TPE) based technique.”)).
Regarding claim 7, Agrawal, in view of Anderson, discloses the invention of claim 3 as discussed above. Agrawal further discloses wherein searching for one or more candidate machine learning algorithms through the machine learning algorithm search space comprises searching using reinforcement learning or Bayesian optimization (¶ 32 (“Therefore, to select a type of algorithm or moreover, to select the best performing variant of an algorithm, various hyper-parameter selection techniques are used to generate distinct sets of hyper-parameter values. Non-limiting examples of hyper-parameter value selection techniques include a Bayesian optimization such as a Gaussian process for hyper-parameter value selection, a random search, a gradient-based search, a grid search, hand-tuning techniques, a tree-structured Parzen Estimators (TPE) based technique.”)).
Regarding claim 17, Agrawal, in view of Anderson, discloses the invention of claim 1 as discussed above. Agrawal further discloses wherein the output machine learning algorithm defines (i) a model architecture for a machine learning model for performing the particular machine learning task, (Agrawal ¶ 27 (“A machine learning algorithm may be selected based on the domain of the problem and the intended type of outcome required by the problem.”)) (ii) hyperparameters for training of the machine learning model to perform the particular machine learning task, (¶ 41 (“A reference variant is selected as a baseline variant, of the machine learning algorithm, which is modified to generate a mini-ML variant. Through iterative evaluations of variants of the reference variant, a mini-ML variant is generated, and in each iteration, at least one hyper-parameter value of the reference variant is modified.”)) and (iii) preprocessing techniques applied to inputs during the training, after the training, or both (¶ 34 (“In an embodiment, cross-validation techniques, such as k-fold cross-validation, is used to create many pairs of training and test datasets from an original training data set.”)).
Claims 8-14 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal and Anderson as applied to claim 3 above, and further in view of Wistuba, Martin, Ambrish Rawat, and Tejaswini Pedapati. "A survey on neural architecture search." arXiv preprint arXiv:1905.01392 (2019) (“Wistuba”).
Regarding claim 8, Agrawal, in view of Anderson, discloses the invention of claim 3 as discussed above. Agrawal does not expressly disclose wherein searching for one or more candidate machine learning algorithms through the machine learning algorithm search space comprises performing a regularized evolution search process, comprising: (but see Wistuba pg. 17 (“In the context of neural architecture search, the population consists of a pool of network architectures. A parent architecture or a pair of architectures is selected in step 1 for mutation or recombination, respectively. The steps of mutation and recombination refer to operations that lead to novel architectures in the search space which are evaluated for fitness in step 3 and the process is repeated till termination. Often only mutation operators are used in the domain of neural architecture search.”))
selecting a parent candidate machine learning algorithm from the current sequence of candidate machine learning algorithms, the parent candidate machine learning algorithm comprising a parent setup function, a parent predict function, and a parent learn function, and (but see Wistuba pg. 20 (“The follow-up work by Real et al. (2019) is one of the most significant works in the direction of using evolutionary algorithms for architecture search. It is primarily known for finding the architectures AmoebaNet-B and AmoebaNet-C which set new records for the task of image classification on CIFAR-10 and ImageNet dataset. However, their search process used a total of 3,150 GPU hours. Their evolutionary search algorithm operates on the NASNet search space (details in Section 2.2) with tournament selection used to select parents at each iteration.”))
modifying the parent candidate machine learning algorithm to generate one or more child candidate machine learning algorithms, each child candidate machine learning algorithm comprising a child setup function, a child predict function, and a child learn function (but see Wistuba pg. 20 (“The selected individuals are mutated with a random change in an operation or a connection in either a normal or a reduction cell and are trained for 25 epochs.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agrawal to incorporate the teachings of Wistuba to apply an evolutionary algorithm in the context of neural architecture search, at least because doing so trades off diversity and similarity in the population of algorithms akin to the exploration and exploitation trade-off in reinforcement learning-based search algorithms. See Wistuba pg. 16.
Regarding claim 9, Agrawal, in view of Anderson and Wistuba, discloses the invention of claim 8 as discussed above. Agrawal does not expressly disclose wherein selecting the parent candidate machine learning algorithm from the current sequence of candidate machine learning algorithms comprises:
randomly choosing a plurality of candidate machine learning algorithms from the current sequence of candidate machine learning algorithms, and (but see Wistuba pg. 17 (“The most common parent selection method in neural architecture search is tournament selection (Goldberg and Deb, 1990). This method selects the individuals for further evolution in two steps. First, it selects k individuals from the population at random.”))
selecting a chosen candidate machine learning algorithm with the highest performance metric among the plurality of candidate machine learning algorithms as the parent candidate machine learning algorithm (but see Wistuba pg. 17 (“Then it iterates over them in the descending order of their fitness while selecting individuals for further steps with some (prespecified) high probability p.”)).
Agrawal and Wistuba are combined according to the same rationale as set forth above.
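For illustration only (not part of the record; the candidate pool and toy fitness below are hypothetical), the tournament selection described by Wistuba — randomly choosing a plurality of candidates and selecting the one with the highest performance metric as the parent — can be sketched as:

```python
import random

def tournament_select(population, fitness, k=3):
    """Tournament selection: sample k candidates at random and
    return the one with the highest performance metric."""
    entrants = random.sample(population, k)        # randomly chosen plurality
    return max(entrants, key=fitness)              # best-scoring entrant is the parent

population = list(range(10))                       # stand-in candidate algorithms
parent = tournament_select(population, fitness=lambda c: c % 7)
```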
Regarding claim 10, Agrawal, in view of Anderson and Wistuba, discloses the invention of claim 8 as discussed above. Agrawal does not expressly disclose wherein modifying the parent candidate machine learning algorithm to generate the one or more child candidate machine learning algorithms comprises:
performing the following steps for one or more times:
for at least one of the parent setup function, the parent predict function, or the parent learn function of the parent algorithm, performing at least one of (i) inserting a random instruction into a random position in the current function, (ii) removing an instruction at a random location in the current function, (iii) randomizing all the instructions in the current function, or (iv) modifying a random argument of another random instruction in the current function, and (but see Wistuba pg. 18 (“The set of mutations consists of simple operations such as adding convolutions (possibly with batch normalization nor ReLU activation) at arbitrary locations before the global pooling layer, along with mutations that alter kernel size, number of channels, stride and learning rate, add or remove skip connections, remove convolutions, reset weights, and a mutation that corresponds to no change.”))
creating a child candidate machine learning algorithm based on the at least one of the modified parent setup function, the modified parent predict function, or the modified parent learn function (but see Wistuba pg. 18 (“The set of mutations consists of simple operations such as adding convolutions (possibly with batch normalization nor ReLU activation) at arbitrary locations before the global pooling layer, along with mutations that alter kernel size, number of channels, stride and learning rate, add or remove skip connections, remove convolutions, reset weights, and a mutation that corresponds to no change.”)).
Agrawal and Wistuba are combined according to the same rationale as set forth above.
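The mutation operations recited in claim 10 — inserting a random instruction, removing an instruction at a random location, randomizing all instructions, or modifying a random argument — can be sketched as follows (illustrative only; the operation set and argument ranges are hypothetical):

```python
import random

OPS = ["add", "mul", "sub", "max"]                 # predetermined set of operations

def random_instruction():
    # each instruction: an operation plus a set of argument addresses
    return (random.choice(OPS), random.randrange(8), random.randrange(8))

def mutate(function):
    """Apply one of the claimed mutation types to a function's instruction list."""
    child = list(function)
    action = random.choice(["insert", "remove", "randomize_all", "modify_arg"])
    if action == "insert":                         # (i) insert at a random position
        child.insert(random.randrange(len(child) + 1), random_instruction())
    elif action == "remove" and child:             # (ii) remove at a random location
        child.pop(random.randrange(len(child)))
    elif action == "randomize_all":                # (iii) randomize all instructions
        child = [random_instruction() for _ in child]
    elif action == "modify_arg" and child:         # (iv) modify one random argument
        i = random.randrange(len(child))
        op, a, b = child[i]
        child[i] = (op, random.randrange(8), b)
    return child

parent_learn = [random_instruction() for _ in range(5)]  # parent learn function
child_learn = mutate(parent_learn)                       # child learn function
```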
Regarding claim 11, Agrawal, in view of Anderson and Wistuba, discloses the invention of claim 10 as discussed above. Agrawal does not expressly disclose adding the one or more child candidate machine learning algorithms to the current sequence of candidate machine learning algorithms (but see Wistuba figure 11 (survivor selection of offspring is used for parent selection)).
Agrawal and Wistuba are combined according to the same rationale as set forth above.
Regarding claim 12, Agrawal, in view of Anderson and Wistuba, discloses the invention of claim 11 as discussed above. Agrawal does not expressly disclose removing a removal candidate machine learning algorithm from the current sequence of candidate machine learning algorithms (but see Wistuba pg. 18 (“While the better of the two architectures is copied (along with the weights), mutated, trained for 5,600 steps with a batch size 50, and added to the population, the other one is removed from the population.”)).
Agrawal and Wistuba are combined according to the same rationale as set forth above.
Regarding claim 13, Agrawal, in view of Anderson and Wistuba, discloses the invention of claim 12 as discussed above. Agrawal does not expressly disclose wherein the removal candidate machine learning algorithm is the oldest candidate machine learning algorithm in the current sequence of candidate machine learning algorithms (but see Wistuba pg. 20 (“The follow-up work by Real et al. (2019) is one of the most significant works in the direction of using evolutionary algorithms for architecture search. It is primarily known for finding the architectures AmoebaNet-B and AmoebaNet-C which set new records for the task of image classification on CIFAR-10 and ImageNet dataset. However, their search process used a total of 3,150 GPU hours. Their evolutionary search algorithm operates on the NASNet search space (details in Section 2.2) with tournament selection used to select parents at each iteration. The selected individuals are mutated with a random change in an operation or a connection in either a normal or a reduction cell and are trained for 25 epochs. As opposed to previous works, their algorithm does not solely rely on validation accuracy and incorporates age in the selection of survivors.”); see also Table 1 (survivor selection is of youngest offspring for Real et al. (2019) evolutionary algorithm)).
Agrawal and Wistuba are combined according to the same rationale as set forth above.
Regarding claim 14, Agrawal, in view of Anderson and Wistuba, discloses the invention of claim 7 as discussed above. Agrawal does not expressly disclose wherein the regularized evolution search process is executed in parallel (but see Wistuba pg. 18 (“In contrast to Real et al. (2017), their work develops a genetic algorithm over a more structured search space which comprises of a sequence of segments with possibly parallel convolution operations.”)).
Agrawal and Wistuba are combined according to the same rationale as set forth above.
Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal and Anderson as applied to claim 1 above, and further in view of Wistuba and Shahrzad (US 2020/0272908 A1; published Aug. 27, 2020).
Regarding claim 15, Agrawal, in view of Anderson, discloses the invention of claim 1 as discussed above. Agrawal does not expressly disclose wherein adding a child candidate machine learning algorithm from the one or more child candidate machine learning algorithms into the current sequence of candidate machine learning algorithms further comprises: (but see Wistuba pg. 17 (“In the context of neural architecture search, the population consists of a pool of network architectures. A parent architecture or a pair of architectures is selected in step 1 for mutation or recombination, respectively. The steps of mutation and recombination refer to operations that lead to novel architectures in the search space which are evaluated for fitness in step 3 and the process is repeated till termination. Often only mutation operators are used in the domain of neural architecture search. There is no indication that a recombination operation applied to two individuals with high fitness would result into an offspring with similar or better fitness. On the contrary, oftentimes the fitness of the resultant offspring is much lower.”)).
Agrawal and Wistuba are combined according to the same rationale as set forth above.
Agrawal does not expressly disclose performing a novelty check that confirms whether the child candidate machine learning algorithm has a different behavior compared to the current sequence of candidate machine learning algorithms, (but see Shahrzad ¶ 101 (“FIG. 13b illustrates a method by which the competition module 414 selects a final set of individuals from the first set of individuals in dependence upon generation (G) number. That is, after selecting a first set of individuals using a dominance filtering process 625, if G=P, wherein P is a predetermined number 626, then a process for determining relative novelty and/or relative diversity among the individuals in the first set is applied. First, at step 629, the process expands the number of individuals in the first set of individuals by a multiplier n2>1 (n2=2 in the experiments herein). Next, at step 630, the competition module 414 estimates the novelty score for each individual in the first set of individuals. In an embodiment, when individuals from the candidate individual pool are tested against a portion of the training data, a behavioral value of the individual is identified. As used herein, a behavioral value b(x) of an individual x in a data mining environment is a vector or a scalar number resulting from the evaluation of the individual in the data mining environment, according to a predefined measure of its behavior. For example, for an n-line sorting network, the predetermined measure of behavior may be a vector of size n, when each element ‘i’ is the number of times the i-th line of the sorting network would switch, when applying the network to all 2{circumflex over ( )}n possible combinations. For stock market domain, this measure could be the pairwise correlation of the two individual signals. In other words, the predetermined measure of the behavior of an individual captures a space that is expected to have practical benefits. 
As used herein, a behavior difference d(b(x),b(y)) between an individual x and an individual y is the distance between two individuals in the behavior space. The novelty score of an individual xi can be estimated by summing the pairwise distance of its behavior vector to that of all the other individuals in the first set of individuals.”))
in response to a determination that the novelty check is yes, adding the child candidate machine learning algorithm into the current sequence of candidate machine learning algorithms, or (but see Shahrzad ¶ 102 (“At step 632, the competition module 414 selects a predetermined number of individuals with greater novelty from the first set of individuals to form the second set of individuals. In some embodiments, individuals with the highest novelty score in the first set of individuals are selected to form the second set of individuals.”))
in response to a determination that the novelty check is no, skipping evaluation of the child candidate machine learning algorithm (but see Shahrzad ¶ 102 (“At step 632, the competition module 414 selects a predetermined number of individuals with greater novelty from the first set of individuals to form the second set of individuals. In some embodiments, individuals with the highest novelty score in the first set of individuals are selected to form the second set of individuals.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agrawal to incorporate the teachings of Shahrzad to apply a novelty test to the offspring/child neural architectures obtained through mutation and recombination, at least because doing so would produce a diverse group of neural architectures that escapes deception and finds novel solutions. See Shahrzad ¶ 13.
Regarding claim 16, Agrawal, in view of Anderson and Shahrzad, discloses the invention of claim 15 as discussed above. Agrawal does not expressly disclose wherein the novelty check is performed by using a functional equivalence checking technique (but see Shahrzad ¶ 101 (“FIG. 13b illustrates a method by which the competition module 414 selects a final set of individuals from the first set of individuals in dependence upon generation (G) number. That is, after selecting a first set of individuals using a dominance filtering process 625, if G=P, wherein P is a predetermined number 626, then a process for determining relative novelty and/or relative diversity among the individuals in the first set is applied. First, at step 629, the process expands the number of individuals in the first set of individuals by a multiplier n2>1 (n2=2 in the experiments herein). Next, at step 630, the competition module 414 estimates the novelty score for each individual in the first set of individuals. In an embodiment, when individuals from the candidate individual pool are tested against a portion of the training data, a behavioral value of the individual is identified. As used herein, a behavioral value b(x) of an individual x in a data mining environment is a vector or a scalar number resulting from the evaluation of the individual in the data mining environment, according to a predefined measure of its behavior. For example, for an n-line sorting network, the predetermined measure of behavior may be a vector of size n, when each element ‘i’ is the number of times the i-th line of the sorting network would switch, when applying the network to all 2{circumflex over ( )}n possible combinations. For stock market domain, this measure could be the pairwise correlation of the two individual signals. In other words, the predetermined measure of the behavior of an individual captures a space that is expected to have practical benefits. 
As used herein, a behavior difference d(b(x),b(y)) between an individual x and an individual y is the distance between two individuals in the behavior space. The novelty score of an individual xi can be estimated by summing the pairwise distance of its behavior vector to that of all the other individuals in the first set of individuals.”) (novelty is tested based on behavior, which is interpreted as corresponding to the claimed functional equivalence checking)).
Agrawal and Shahrzad are combined according to the same rationale as set forth above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Tian, Haiman, Shu-Ching Chen, and Mei-Ling Shyu. "Genetic algorithm based deep learning model selection for visual data classification." 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). IEEE, 2019.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID KHAN whose telephone number is (571)270-0419. The examiner can normally be reached M-F, 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at (571)272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAHID K KHAN/Primary Examiner, Art Unit 2146