Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This Final Rejection is in response to the Applicant Arguments/Remarks made in an amendment filed 02/02/2026.
Claims 26, 35, 38, and 45 are amended.
The 35 U.S.C. 112(b) rejection of claim 35 is respectfully withdrawn.
The 35 U.S.C. 101 rejections of claims 26-45 are respectfully withdrawn.
Claims 26-45 remain pending.
Response to Arguments
Argument 1: Applicant argues, in the Applicant Arguments/Remarks made in the amendment filed 02/02/2026, pp. 16-21, that the prior art fails to teach the primary claim limitations, “to generate all possible combinations of activation functions for the ANN Architecture based on the received indications of customization; forward propagate, for each generated combination, input data from a dataset through the activation functions of each layer of the ANN architecture to generate intermediate values; store the intermediate values in a memory during the forward propagation for each generated combination; evaluate each generated combination with an objective metric based on the stored intermediate values to determine an activation level score for each generated combination; compare the activation level scores across the generated combinations; and store at least one combination and associated intermediate values as model data in a dataset based on the comparison of activation level score; wherein the output of the evaluation defines a measure of a benchmark of the ANN architecture”.
Response to Argument 1: In light of the amendments, a newly found combination of references is applied in the updated rejections (U.S. Patent Application Publication No. 20170193367, “Miikkulainen”, in light of Krish Naik. (2019, July 11). How to choose number of hidden layers and nodes in Neural Network. YouTube. https://www.youtube.com/watch?v=Bc2dWI3vnE0, hereinafter “Naik”, and further in light of Franco Manessi, & Alessandro Rozza. (2018). Learning Combinations of Activation Functions. 2018 24th International Conference on Pattern Recognition (ICPR). https://doi.org/10.1109/icpr.2018.8545362, hereinafter “Manessi”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 26, 30-38, & 42-45 is/are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20170193367, “Miikkulainen”, in light of Krish Naik. (2019, July 11). How to choose number of hidden layers and nodes in Neural Network. YouTube. https://www.youtube.com/watch?v=Bc2dWI3vnE0, hereinafter “Naik”, and further in light of Franco Manessi, & Alessandro Rozza. (2018). Learning Combinations of Activation Functions. 2018 24th International Conference on Pattern Recognition (ICPR). https://doi.org/10.1109/icpr.2018.8545362, hereinafter “Manessi”.
Claim 26:
Miikkulainen teaches a system arranged to customize an Artificial Neural Network (ANN) comprising: - an ANN architecture (i.e. para. [0094], “The technology disclosed provides a so-called machine-learned conversion optimization (MLCO) system that uses artificial neural networks and evolutionary computations to efficiently identify most successful webpage designs”), comprising: - an input layer comprising at least one node (i.e. para. [0074], Fig. 2, “The hyperparameters of the input layer are based on user attribute data 114 and can be defined based on specifications provided by a designer”, wherein it is noted in Fig. 2 that the input layer comprises at least one node), at least one hidden layer comprising at least one node (i.e. para. [0045], “Each individual does specify values for other hyperparameters of the neural network, such as the number of hidden layers of the network, the number of neurons in each hidden layer, and their interconnection weights”), and an output layer comprising at least one node (i.e. para. [0077], Fig. 2, “The hyperparameters of the output layer are initialized in dependence upon a starter funnel defined by the designer”, wherein it is noted in Fig. 2 that the output layer comprises at least one node), wherein the input layer and the at least one hidden layer are connected by edges, and the at least one hidden layer and the output layer are connected by edges (i.e. para. [0045], “Each individual does specify values for other hyperparameters of the neural network, such as the number of hidden layers of the network, the number of neurons in each hidden layer, and their interconnection weights”, wherein the input layer, hidden layer, and output layer are interconnected), wherein each node comprises an activation function (i.e. para. [0074], “The input layer takes user attribute data 114 as input, the hidden layer uses non-linearity functions and network weights to generate alternative representations of the input, and the output layer generates dimension values for an output funnel based on the alternative representations… Some examples of the non-linearity functions include sigmoid function, rectified linear units (ReLUs), hyperbolic tangent function, absolute of hyperbolic tangent function, leaky ReLUs (LReLUs), and parametrized ReLUs (PReLUs)”, wherein the BRI of “each node comprises an activation function” encompasses how a non-linear mathematical function is applied within each neuron); - a graphical user interface arranged to receive user input (i.e. para. [0081], “a ‘web interface layout’ is merely a template within which the alternative values for dimensions are inserted in order to define a particular web interface of a funnel. In one implementation, the web interface layout is displayed across a simulated device selected by the designer”); and - a processor (i.e. para. [0120], processor 1314) configured to:
-forward propagate, for each generated combination, input data from a dataset through
-wherein the output of the evaluation defines a measure of a benchmark of the ANN architecture (i.e. para. [0057], Fitness aggregation module 118 aggregates the performance measures of the current candidate individual over all of the user sessions for which the neural network of the current candidate individual was used. Aggregation may be an average, or may be some other formula for developing a combined fitness value for the individual. The aggregate performance measure is written into the candidate individual population pool 106 in association with the current candidate individual).
While Miikkulainen teaches an interface where a designer can customize or specify certain attributes of an input layer of a neural network, which is then tested for its success in achieving the target user behavior, Miikkulainen may not explicitly teach the user input comprising
- indication on how to customize the number of hidden layers; - indication on how to customize the number of nodes for each of the hidden layers; - indication on how to customize at least one activation function for one or more of the nodes; and - provide the ANN based on the received indications of customization;
However, Naik teaches a graphical user interface arranged to receive user input (i.e. (7:18), the examiner notes that the Jupyter interface is arranged to receive user input customizing a neural network), the user input comprising
- indication on how to customize the number of hidden layers (i.e. (9:15), the examiner notes that an indication at In [17] depicts how to customize the number of hidden layers, wherein a user may specify 1, 2, or 3 hidden layers); - indication on how to customize the number of nodes for each of the hidden layers (i.e. (9:15), the examiner notes that an indication at In [17] depicts how to customize the number of nodes for each of the hidden layers: first, a scenario with 1 hidden layer of 20 nodes; second, a scenario with 2 hidden layers of 40 and 20 nodes respectively; and third, a scenario with 3 hidden layers of 45, 30, and 15 nodes respectively); - indication on how to customize at least one activation function for one or more of the nodes (i.e. (9:15), the examiner notes that an indication at In [17] depicts how to customize at least one of two activation functions, such as ‘sigmoid’ or ‘relu’); and - provide the ANN based on the received indications of customization (i.e. (12:35), the examiner notes that the user has found a best score and best parameters, wherein the best activation function would be ‘relu’ and the best number of hidden layers would be three layers of 45, 30, and 15 nodes respectively, which may be provided to the Jupyter program for simulation and evaluation, yielding an accuracy score of 79.75% at (13:40));
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add the user input comprising an indication on how to customize the number of hidden layers; an indication on how to customize the number of nodes for each of the hidden layers; an indication on how to customize at least one activation function for one or more of the nodes; and providing the ANN based on the received indications of customization, to Miikkulainen’s display and hardware for customizing and testing a neural network, using the GUI and input interface of Naik, which shows how a user may use the Jupyter program to customize a neural network’s hidden layers and subsequently evaluate a performance metric. One would have been motivated to combine Naik with Miikkulainen and would have had a reasonable expectation of success as the combination provides users with more granularity to achieve the right balance between model capacity and generalization.
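For illustration only, the customization grid attributed to Naik above (three hidden-layer layouts and two activation functions) can be enumerated as in the following minimal sketch. The sketch is not taken from Naik's notebook or from the record; the names `hidden_layouts`, `activations`, and `enumerate_configs` are hypothetical and chosen only for this example.

```python
from itertools import product

# Values as characterized in the rejection of claim 26 (Naik at 9:15).
# All identifiers here are hypothetical, for illustration only.
hidden_layouts = [(20,), (40, 20), (45, 30, 15)]
activations = ["sigmoid", "relu"]


def enumerate_configs(layouts, acts):
    """Return every (layout, activation) pairing a grid search would visit."""
    return [
        {"layers": layout, "activation": act}
        for layout, act in product(layouts, acts)
    ]


configs = enumerate_configs(hidden_layouts, activations)
for cfg in configs:
    print(cfg)
```

Each dictionary corresponds to one candidate network that a user-specified search could train and score; the layout and activation reported as best at (12:35) is one of these entries.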
While Miikkulainen and Naik teach the ANN and user interface for evaluating and selecting a possible activation function for an ANN, Miikkulainen and Naik may not explicitly teach
- generate all possible combinations of activation functions for the ANN Architecture based on the received indications of customization;
- forward propagate, for each generated combination, input data from a dataset through the activation functions of each layer of the ANN architecture to generate intermediate values;
- store the intermediate values in a memory during the forward propagation for each generated combination;
- evaluate each generated combination with an objective metric based on the stored intermediate values to determine an activation level score for each generated combination;
- compare the activation level scores across the generated combinations; and
-store at least one combination and associated intermediate values as model data in a dataset based on the comparison of activation level scores
-wherein the output of the evaluation defines a measure of a benchmark of the ANN architecture
However, Manessi teaches
- generate all possible combinations of activation functions for the ANN architecture based on the received indications of customization (i.e. pg. 2, “In this paper, we introduce two techniques to define learnable activation functions that could be plugged in all hidden layers of a neural network architecture. The two approaches differ in how they define the hypothesis space H_{σ_i}. Both of them are based on the following idea: (i) select a finite set of activation functions F := {f_1, ..., f_N}, whose elements will be used as base elements; (ii) define the learnable activation function σ_i as a linear combination of the elements of F; (iii) identify a suitable hypothesis space H_{σ_i}; (iv) optimize the whole network, where the hypothesis space of each hidden layer is H_i = H_{σ_i} × H_{g_i}.”, wherein the BRI of generating all possible combinations of activation functions encompasses generating all possible learnable activation functions that may be plugged into a neural network architecture depending on the hidden layers, and wherein the BRI of customization encompasses how the techniques were practiced on different selections of architecture including LeNet-5, KerasNet, ResNet-56, and AlexNet);
- forward propagate, for each generated combination, input data from a dataset through the activation functions of each layer of the ANN architecture to generate intermediate values (i.e. pg. 4, “These networks were trained and tested using as activation functions (for all their hidden layers) those learned by the convex hull-based and the affine hull-based approaches combining the base activations reported in Equation (1)”, wherein training and testing to generate predictions for the architectures would involve forward propagation and back propagation);
- store the intermediate values in a memory during the forward propagation for each generated combination (i.e. pg. 6, Table II, “The Table show the Top-1 Accuracy results for all the analyzed networks”);
- evaluate each generated combination with an objective metric based on the stored intermediate values to determine an activation level score for each generated combination (i.e. pg. 4-5, “In addition, the base activation functions alone and LReLU were also employed in order to compare the overall performance… Table II shows the top-1 accuracy for all the run experiments. The best configurations (shaded cells in the table) are always achieved using our techniques”, wherein the BRI of an activation level score encompasses an accuracy metric);
- compare the activation level scores across the generated combinations (i.e. pg. 5, “The uplift in top-1 accuracy using our approaches compared to customary activations goes from 0.69 percentage points (pp) for LeNet-5 on Fashion-MNIST up to 4.59 pp for KerasNet on CIFAR-10”); and
- store at least one combination and associated intermediate values as model data in a dataset based on the comparison of activation level scores (i.e. pg. 5, “the proposed techniques usually achieve better results than their corresponding base activation functions (boldface in the table).”, wherein the BRI of storing a combination encompasses how the best functions are stored in Table II);
-wherein the output of the evaluation defines a measure of a benchmark of the ANN (i.e. pg. 5, the techniques proposed in this paper achieved the best performance and the combined activation functions learned using our approaches usually outperform the corresponding base components.)
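As a hedged illustration of the idea quoted from Manessi at pg. 2 (a learnable activation defined as a linear combination of a finite base set F), the sketch below evaluates a combination whose coefficients are non-negative and sum to one, which is one reading of the "convex hull-based" approach named at pg. 4. The base set, the normalization helper, and all identifiers are assumptions made for this example and are not drawn from the reference.

```python
import math

# Hypothetical base set F; the actual base activations of Manessi's
# Equation (1) are not reproduced in this Office action.
BASE = [
    lambda x: x,             # identity
    lambda x: max(0.0, x),   # ReLU
    math.tanh,               # hyperbolic tangent
]


def to_convex(raw):
    """Normalize non-negative raw weights so that they sum to 1 (assumed scheme)."""
    if any(r < 0 for r in raw) or sum(raw) <= 0:
        raise ValueError("raw weights must be non-negative with a positive sum")
    total = sum(raw)
    return [r / total for r in raw]


def combined_activation(x, alphas, base=BASE):
    """Evaluate sigma(x) = sum_k alpha_k * f_k(x) over the base set."""
    if len(alphas) != len(base):
        raise ValueError("one coefficient is required per base function")
    return sum(a * f(x) for a, f in zip(alphas, base))


alphas = to_convex([1.0, 1.0, 2.0])
value = combined_activation(2.0, alphas)
print(alphas, value)
```

In the reference the coefficients are learned jointly with the network weights; here they are fixed by hand purely to show how the combined function is evaluated.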
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add the limitations to generate all possible combinations of activation functions for the ANN architecture based on the received indications of customization; forward propagate, for each generated combination, input data from a dataset through the activation functions of each layer of the ANN architecture to generate intermediate values; store the intermediate values in a memory during the forward propagation for each generated combination; evaluate each generated combination with an objective metric based on the stored intermediate values to determine an activation level score for each generated combination; compare the activation level scores across the generated combinations; and store at least one combination and associated intermediate values as model data in a dataset based on the comparison of activation level scores; wherein the output of the evaluation defines a measure of a benchmark of the ANN architecture, to Miikkulainen-Naik’s GUI and hardware for customizing and testing a neural network, using the methods to test and identify a best activation function for a given neural network architecture, as taught by Manessi. One would have been motivated to combine Manessi with Miikkulainen-Naik and would have had a reasonable expectation of success as the combination provides an increase in performance achieved using networks with different depths and architectures.
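To make the sequence of claimed steps concrete, the following is a minimal sketch, under stated assumptions, of enumerating every per-layer activation combination, forward propagating a dataset, storing the intermediate values, scoring each combination, and keeping the best one. It does not reproduce Applicant's implementation or any cited reference: the fixed weights, the scoring rule (mean absolute activation), and every identifier are hypothetical.

```python
import math
from itertools import product

ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
}

# Hypothetical fixed weights: 2 inputs -> 2 hidden nodes -> 1 output node.
WEIGHTS = [
    [[0.5, -0.4], [0.3, 0.8]],  # hidden layer, one row of weights per node
    [[1.0, -1.0]],              # output layer
]


def forward(x, combo):
    """Forward propagate x; return each layer's outputs (intermediate values)."""
    intermediates = []
    for layer_weights, name in zip(WEIGHTS, combo):
        act = ACTIVATIONS[name]
        x = [act(sum(w * v for w, v in zip(row, x))) for row in layer_weights]
        intermediates.append(x)
    return intermediates


def score(stored):
    """Assumed objective metric: mean absolute value of all stored activations."""
    flat = [abs(v) for sample in stored for layer in sample for v in layer]
    return sum(flat) / len(flat)


def benchmark(dataset, names=("sigmoid", "relu", "tanh")):
    """Evaluate every per-layer combination and return all results plus the best."""
    results = {}
    for combo in product(names, repeat=len(WEIGHTS)):
        stored = [forward(x, combo) for x in dataset]  # kept in memory
        results[combo] = {"intermediates": stored, "score": score(stored)}
    best = max(results, key=lambda c: results[c]["score"])
    return results, best


dataset = [[1.0, 2.0], [0.5, -1.0]]
results, best = benchmark(dataset)
print(best, results[best]["score"])
```

With three candidate functions and two layers the search visits 3^2 = 9 combinations; the count grows exponentially with depth, which is why exhaustive enumeration is practical only for small search spaces.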
Claim 30:
Miikkulainen, Naik, and Manessi teach a system of claim 26.
Naik further teaches wherein the processor is further configured to generate at least one activation function based on the dataset (i.e. the examiner notes at (12:35) that the processor of the user’s computer running the Jupyter program has generated at Out[20] a recommended best activation function of ‘relu’).
Claim 31:
Miikkulainen, Naik, and Manessi teach a system of claim 26.
Naik further teaches wherein the user input further comprises an indication of an estimation of an activation function, and the processor is further configured to generate the at least one activation function based on a transformation of the estimation (i.e. the examiner notes at (12:35) that the BRI of an indication of an estimation of an activation function encompasses how an indication of a best estimated activation function is generated and displayed to be a ‘relu’ function based on a transformation of the previously input data).
Claim 32:
Miikkulainen, Naik, and Manessi teach a system of claim 26.
Naik further teaches wherein the at least one activation function is a gaussian function, sigmoidal function, reLu function, tanh function, a rectified linear function, or a swish function (i.e. the examiner notes at (12:35), that a best estimated activation function is generated and displayed to be a ‘relu’ function).
Claim 33:
Miikkulainen, Naik, and Manessi teach the system of claim 26.
Miikkulainen further teaches wherein the user input further comprises an indication on how to customize a weight of at least one of the edges (i.e. para. [0045] “Each individual does specify values for other hyperparameters of the neural network, such as the number of hidden layers of the network, the number of neurons in each hidden layer, and their interconnection weights”, wherein the BRI for an indication on how to customize a weight encompasses how the presentation of an option to specify interconnection weights is an indication on how to customize an edge weight).
Claim 34:
Miikkulainen, Naik, and Manessi teach a system of claim 26.
Miikkulainen further teaches wherein the processor is further configured to pre-process the dataset prior to simulating the ANN (i.e. para. [0051], “a population initialization module generates a preliminary pool of individuals and writes them into the candidate individual population 106. Each individual identifies a respective set of values for the hyperparameters of the individual.”, wherein the BRI to pre-process encompasses refining preliminary data to identify a set of values for the hyperparameters, which happens before a test for fitness of the model)
Claim 35:
Miikkulainen, Naik, and Manessi teach a system of claim 26.
Naik further teaches
wherein the graphical user interface is arranged to receive additional user input of a pre-trained ANN, the additional user input comprising the data of the architecture of the ANN, comprising: - at least one activation function, the number of hidden layers, the number of nodes for each of the hidden layers; and/or the weight of all nodes (i.e. the examiner notes at (1:19) that the Jupyter interface is configured to receive additional user input of pre-trained models, such as from the TensorFlow and Keras libraries, as well as activation functions).
Claim 36:
Miikkulainen, Naik, and Manessi teach a system of claim 26.
Miikkulainen further teaches the system further comprising a database configured to store at least one of: - the dataset, the number of layers, the number of nodes, the weight of the edges, the at least one activation function, the ANN, at least one layer of the ANN, the output of the run (i.e. para. [0079], “The topological sequence is set graphically by the designer across the interface 304 and stored logically in memory”).
Claim 37:
Miikkulainen, Naik, and Manessi teach a system of claim 26.
Miikkulainen further teaches wherein the dataset corresponds to the measurement of at least one biological interaction (i.e. para. [0074], “the neurons of the input layer correspond to user attributes pertaining to which day of the week user activity is detected, the operating system (O/S) of user's device, the type of user's device, and the ad group through which the user was directed”, wherein the BRI of a measurement of at least one biological interaction encompasses how a user is biological and the dataset input into the input layer corresponds to a user interaction activity occurring on a certain day of the week or to the ad group with which a user may have been directed to interact).
Claim 38:
Claim 38 is the method claim reciting similar limitations to Claim 26 and is rejected for similar reasons.
Claim 42:
Claim 42 is the method claim reciting similar limitations to Claim 30 and is rejected for similar reasons.
Claim 43:
Claim 43 is the method claim reciting similar limitations to Claim 31 and is rejected for similar reasons.
Claim 44:
Claim 44 is the method claim reciting similar limitations to Claim 32 and is rejected for similar reasons.
Claim 45:
Miikkulainen, Naik, and Manessi teach the method of claim 38.
Miikkulainen further teaches a non-transitory computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method of claim 38 (i.e. para. [0123], “Storage subsystem 1324 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by processor 1314 alone or in combination with other processors”).
Claim(s) 27-29 & 39-41 is/are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20170193367, “Miikkulainen”, in light of Naik, and further in light of Manessi, as applied to claims 26 and 38 above, and further in view of U.S. Patent Application Publication No. 20220027713, “Huang”.
Claim 27:
Miikkulainen, Naik, and Manessi teach the system of Claim 26.
Miikkulainen, Naik, and Manessi may not explicitly teach
wherein the activation function of the input layer differs from the activation function of the at least one hidden layer, and/or the activation function of the at least one hidden layer differs from the activation function of the output layer, and/or the activation function of the input layer differs from the activation function of the output layer.
However, Huang teaches
wherein the activation function of the input layer differs from the activation function of the at least one hidden layer, and/or the activation function of the at least one hidden layer differs from the activation function of the output layer, and/or the activation function of the input layer differs from the activation function of the output layer (i.e. para. [0021-0022], The multi-function calculator 110 is configured to provide the same or different activation functions for each node 121 in the hidden layer. … The same or different activation functions A may be used between the nodes 121 on the same layer. In the case where the nodes 121 of the same layer all use the same activation function A, the same or different activation functions A may be used between all layers.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add, to Miikkulainen-Naik-Manessi’s customizing and testing of a neural network, the feature wherein the activation function of the input layer differs from the activation function of the at least one hidden layer, and/or the activation function of the at least one hidden layer differs from the activation function of the output layer, and/or the activation function of the input layer differs from the activation function of the output layer, as taught by Huang. One would have been motivated to combine Huang with Miikkulainen-Naik-Manessi and would have had a reasonable expectation of success as the combination gives users the granularity to optimize their neural network according to their specific problem.
Claim 28:
Miikkulainen, Naik, and Manessi teach the system of Claim 26.
Miikkulainen, Naik, and Manessi may not explicitly teach
wherein the activation function of the input layer, the activation function of the at least one hidden layer, and the activation function of the output layer are all different.
However Huang teaches
wherein the activation function of the input layer, the activation function of the at least one hidden layer, and the activation function of the output layer are all different (i.e. para. [0022], The same or different activation functions A may be used between the nodes 121 on the same layer. In the case where the nodes 121 of the same layer all use the same activation function A, the same or different activation functions A may be used between all layers).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add, to Miikkulainen-Naik-Manessi’s customizing and testing of a neural network, the feature wherein the activation function of the input layer, the activation function of the at least one hidden layer, and the activation function of the output layer are all different, as taught by Huang. One would have been motivated to combine Huang with Miikkulainen-Naik-Manessi and would have had a reasonable expectation of success as the combination gives users the granularity to optimize their neural network according to their specific problem.
Claim 29:
Miikkulainen, Naik, and Manessi teach the system of any preceding claim.
Miikkulainen, Naik, and Manessi may not explicitly teach
wherein the activation function of a first node in the at least one hidden layer differs from the activation function of a second node in the at least one hidden layer.
However Huang teaches
wherein the activation function of a first node in the at least one hidden layer differs from the activation function of a second node in the at least one hidden layer (i.e. para. [0021], The multi-function calculator 110 is configured to provide the same or different activation functions for each node 121 in the hidden layer).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add, to Miikkulainen-Naik-Manessi’s customizing and testing of a neural network, the feature wherein the activation function of a first node in the at least one hidden layer differs from the activation function of a second node in the at least one hidden layer, as taught by Huang. One would have been motivated to combine Huang with Miikkulainen-Naik-Manessi and would have had a reasonable expectation of success as the combination gives users the granularity to optimize their neural network according to their specific problem.
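The per-node and per-layer variation discussed for claims 27-29 can be pictured with the short sketch below, in which each node of a layer is paired with its own activation function. This is an illustrative assumption about how such a layer might be coded, not Huang's multi-function calculator 110; the function `layer_forward` and the example weights are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, node_activations):
    """Compute one layer's outputs, applying a possibly different activation per node."""
    if len(weights) != len(node_activations):
        raise ValueError("one activation is required per node")
    return [
        act(sum(w * v for w, v in zip(row, inputs)))
        for row, act in zip(weights, node_activations)
    ]


# Hidden layer: the first node uses ReLU and the second uses tanh.
hidden = layer_forward([1.0, -2.0], [[1.0, 1.0], [1.0, 1.0]], [relu, math.tanh])
# Output layer: a third, different activation (sigmoid).
output = layer_forward(hidden, [[1.0, 1.0]], [sigmoid])
print(hidden, output)
```

Both hidden nodes receive the same weighted sum here, so any difference between their outputs is attributable solely to the differing activation functions.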
Claim 39:
Claim 39 is the method claim reciting similar limitations to Claim 27 and is rejected for similar reasons.
Claim 40:
Claim 40 is the method claim reciting similar limitations to Claim 28 and is rejected for similar reasons.
Claim 41:
Claim 41 is the method claim reciting similar limitations to Claim 29 and is rejected for similar reasons.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent Application Publication NO. 20220109654 “Fink” teaches in para. [0025], Neural Architecture Search (NAS) to discover the best neural-network architecture(s); hyperparameter optimization to discover the best neural-network parameters for a network; loss function optimization to discover the best loss function(s) for a neural-network; activation function optimization to discover the best activation function(s) for a neural-network; and/or data augmentation optimization to discover the best data augmentation pipeline for a neural-network.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID H TAN whose telephone number is (571)272-7433. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/D.T./Examiner, Art Unit 2145
/CESAR B PAULA/Supervisory Patent Examiner, Art Unit 2145