Prosecution Insights
Last updated: April 19, 2026
Application No. 17/940,159

MAPPING ACTIVATION FUNCTIONS TO DATA FOR DEEP LEARNING

Final Rejection (§101, §102, §103)
Filed
Sep 08, 2022
Examiner
BRACERO, ANDREW ANGEL
Art Unit
2126
Tech Center
2100 — Computer Architecture & Software
Assignee
The Bank Of New York Mellon
OA Round
2 (Final)
Grant Probability: 100% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 100% (above average; 5 granted / 5 resolved; +45.0% vs TC avg)
Interview Lift: +0.0% (minimal lift among resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Career History: 31 total applications across all art units; 26 currently pending

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§102: 9.6% (-30.4% vs TC avg)
§103: 44.0% (+4.0% vs TC avg)
§112: 10.5% (-29.5% vs TC avg)

Based on career data from 5 resolved cases; deltas are relative to the Tech Center average estimate.

Office Action

§101 §102 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-20 are presented for examination in this application (17/940,159), filed September 8, 2022. The Examiner cites particular sections in the references as applied to the claims below for the convenience of the applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant(s) fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.

Response to Arguments

Applicant's arguments and remarks filed 2025-10-01 have been fully considered. The arguments and remarks regarding the 35 U.S.C. 101 and 103 rejections were not found to be persuasive. The arguments regarding the 35 U.S.C. 102 rejections were found to be persuasive, and those rejections have been withdrawn; however, a new ground of rejection, necessitated by amendment, has been made for the claims that were rejected under 35 U.S.C. 102.

35 U.S.C. 101

Applicant asserts: "Under Step 2A, Prong 1 … the claims are directed to selecting specific activation functions based on properties of input data and a neural network that uses the activation function to control data that is fed forward through the neural network. This cannot be practically performed in the human mind."
"Under Step 2A, Prong 2 … the claims recite an improvement … in particular, a machine learning model is improved with a specifically selected activation function to control data that is fed forward through the neural network." "Under Step 2B … the claims recite a combination of features that are an inventive concept that is significantly more than just a mental process. For instance, the claims recite an unconventional, adaptive mechanism for selecting activation functions based on specific properties of the data."

Examiner's response: The Examiner respectfully disagrees. The Examiner notes that the process of selecting an activation function based on a determination of properties is a mental process that can be practically performed within the human mind. A person having ordinary skill in the art could find that properties of data such as skewness, kurtosis, and range-boundedness could be associated with specific activation functions, and once an identification of the property has been made, such a person can determine which specific activation function to use for that input. The applicant's response under Step 2A, Prong 2 and Step 2B pertains to the elements of selecting activation functions based on properties of the input data, which have been deemed abstract ideas. The Examiner notes that the inventive concept cannot come from the abstract idea itself, as noted in MPEP 2106.05 I.: "An inventive concept 'cannot be furnished by the unpatentable law of nature (or natural phenomenon or abstract idea) itself.' Genetic Techs. Ltd. v. Merial LLC, 818 F.3d 1369, 1376, 118 USPQ2d 1541, 1546 (Fed. Cir. 2016)."

35 U.S.C. 103

Applicant's response: Applicant asserts that Teder does not disclose identifying one or more properties of data and selecting an activation function for a neural network based on the one or more properties as claimed … Teder generally relates to analyzing images of eyes to diagnose neurological conditions.
During the interview, the Examiner alleged that paragraph 0167 disclosed selecting activation functions based on one or more properties of data as claimed. However, paragraph 0167 appears to merely relate to static selection of activation functions based on use cases.

Examiner's response: The Examiner respectfully disagrees. The Examiner notes that, under the broadest reasonable interpretation, the use case of an input can be considered one or more properties, which can be influential in the selection of specific activation functions. As noted in the instant case's specification at para. [0003], "The properties that may cause modeling error or otherwise should be accounted for in deep learning may include skewness, kurtosis, range boundedness, and/or other properties."; the property can and may include the use case.

In regard to the amended limitations of independent claims 1, 9, and 15, the amendments have necessitated a new reference and thus a new ground of rejection, rendering the prior rejections moot in light of the newly added reference.

Information Disclosure Statement

Acknowledgement is made of the information disclosure statement filed on 2025-07-11.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50-57 (January 7, 2019) ("2019 PEG").

Regarding claim 1 (currently amended):

Step 1 – Is the claim directed to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a system.
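The Examiner's response above treats the mapping from data properties (skewness, kurtosis, range-boundedness) to activation functions as a determination a practitioner could make directly. A minimal sketch of such a mapping, with a hypothetical skew threshold, a hypothetical bounded interval, and a hypothetical fallback choice (none of these specifics come from the application or the Office Action):

```python
import numpy as np

def select_activation(x, skew_threshold=1.0):
    """Pick an activation function name from simple properties of the data.

    The property-to-function associations mirror the ones discussed in the
    claims (skew -> ReLU, range-boundedness -> Sigmoid); the threshold and
    the fallback are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    # Range-boundedness: values already confined to a fixed interval, e.g. [0, 1].
    if x.min() >= 0.0 and x.max() <= 1.0:
        return "sigmoid"
    # Sample skewness: third standardized moment.
    std = x.std()
    skewness = ((x - x.mean()) ** 3).mean() / std ** 3 if std > 0 else 0.0
    if abs(skewness) > skew_threshold:
        return "relu"
    return "tanh"  # hypothetical default for roughly symmetric, unbounded data

select_activation([0.1, 0.5, 0.9])   # range-bounded -> "sigmoid"
select_activation([1, 1, 1, 1, 50])  # heavily skewed -> "relu"
```

The point of the sketch is only that each branch is a single observation followed by a lookup, which is the character the rejection ascribes to the claimed selection step.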
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim recites abstract ideas:

identify one or more properties of historical data relating to the input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

select an activation function for a neural network based on the one or more properties, the activation function controlling data that is fed forward in the neural network — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

wherein the activation function is selected to ensure that: outputs of nodes in the neural network are fed forward in a way that preserves the one or more properties of the input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

and outputs of the nodes are constrained within an interval of the input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
generate a prediction for the input data based on the executed neural network with the activation function at the fully connected dense layer of the neural network — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

a system of identifying and using an activation function of a neural network based on input data, the system comprising a processor — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

execute the neural network with the activation function at a fully connected dense layer of the neural network — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data, and (ii) constrain the outputs of the nodes within an interval of the input data — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

the neural network being trained on the historical data — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

transmit for display data indicating the prediction — this limitation is directed to mere data gathering and outputting, which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception. Any additional elements that were determined to be insignificant extra-solution activity in Step 2A, Prong 2 are further evaluated in Step 2B as to whether they are well-understood, routine, and conventional activities. The "transmit for display data indicating the prediction" limitation was found to be insignificant extra-solution activity in claim 1. This limitation is recited at a high level of generality and amounts to transmitting data over a network, which is a well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.).
Mere instructions to apply an exception (MPEP 2106.05(f)) cannot integrate the abstract ideas into a practical application.

Regarding claim 2:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 1, which recited abstract ideas. The claim recites additional abstract ideas:

compare the one or more properties to a threshold value — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

select the activation function based on whether the one or more properties exceeds the threshold value — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

wherein to select the activation function the processor is programmed — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 3 (currently amended):

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 1, which recited abstract ideas.
The claim recites additional abstract ideas:

to identify the one or more properties, the processor is further programmed to identify skewness, kurtosis, and/or range boundedness of the input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 4:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 1, which recited abstract ideas. The claim recites additional abstract ideas:

select a Rectified Linear Unit (ReLU) activation function when the one or more properties include a skew in the input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.
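Claim 1's limitation that node outputs be "constrained within an interval of the input data" can be pictured as a sigmoid rescaled to a target range. This is an assumed illustration of the idea, not the application's actual implementation:

```python
import numpy as np

def interval_sigmoid(z, lo, hi):
    """A sigmoid whose outputs are confined to [lo, hi] rather than (0, 1)."""
    return lo + (hi - lo) / (1.0 + np.exp(-z))

# Constrain node outputs to the interval spanned by some input data.
data = np.array([2.0, 3.1, 4.6, 5.0])
lo, hi = data.min(), data.max()
pre_activations = np.array([-10.0, 0.0, 10.0])
out = interval_sigmoid(pre_activations, lo, hi)
# Every output lands inside [2.0, 5.0]; a pre-activation of 0.0 maps to 3.5.
```

Because the sigmoid saturates at its endpoints, outputs can never leave [lo, hi], which is one way the selected function could enforce the claimed interval constraint.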
Regarding claim 5:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 1, which recited abstract ideas. The claim recites additional abstract ideas:

wherein the processor is further programmed to: select a Sigmoid activation function when the one or more properties includes a range boundedness or quasi-range boundedness of the input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 6:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 1, which recited abstract ideas. The claim recites additional abstract ideas:

wherein the processor is further programmed to: select a second activation function to be executed in a layer of the neural network adjacent to the selected activation function at the fully connected layer — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application.

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 7:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 1, which recited abstract ideas. The claim recites additional abstract ideas:

wherein the selected activation function and the second activation function are different from one another — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 8 (currently amended):

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 1, which recited abstract ideas.

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

wherein the input data comprises a time series of data values — this limitation is directed to the field of use (see MPEP 2106.05(h) VI.) as it merely relates the machine learning model's training to power consumption.

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than a field of use to apply the exception. Generally linking the use of a judicial exception to a particular technological environment or field of use cannot provide an inventive concept. Thus the claim is not patent eligible.

Regarding claim 9 (currently amended):

Step 1 – Is the claim directed to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method.

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim recites abstract ideas:

identifying, by a processor, one or more properties of historical data relating to the input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

selecting, by the processor, a first activation function for a neural network based on the one or more properties, the first activation function controlling data that is fed forward in the neural network at a first layer of the neural network at which the first activation function executes — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
selecting, by the processor, a second activation function for the neural network based on the one or more properties, the second activation function controlling data that is fed forward in the neural network at a second layer of the neural network at which the second activation function executes — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

generating, by the processor, a prediction for the input data based on the executed neural network — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

executing, by the processor, the neural network with the first activation function at the first layer and the second activation function at the second layer — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data, and (ii) constrain the outputs of the nodes within an interval of the input data — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

transmitting, by the processor, for display data indicating the prediction — this limitation is directed to mere data gathering and outputting, which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception. Any additional elements that were determined to be insignificant extra-solution activity in Step 2A, Prong 2 are further evaluated in Step 2B as to whether they are well-understood, routine, and conventional activities. The "transmitting, by the processor, for display data indicating the prediction" limitation was found to be insignificant extra-solution activity in claim 9. This limitation is recited at a high level of generality and amounts to transmitting data over a network, which is a well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.). Mere instructions to apply an exception (MPEP 2106.05(f)) cannot integrate the abstract ideas into a practical application.

Regarding claim 10:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 9, which recited abstract ideas.
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

wherein the first layer and the second layer are adjacent to one another — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 11:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 9, which recited abstract ideas.

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

wherein executing the neural network comprises: executing the neural network with the second activation function at a fully connected dense layer — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.
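Claims 9-11 recite a first and a second activation function executing at different, adjacent layers. A hedged sketch of that arrangement, with a sigmoid at the first layer and ReLU at the second; the weights, shapes, and the particular sigmoid/ReLU pairing here are illustrative assumptions rather than the claimed design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward(x, w1, w2):
    """Two-layer forward pass with a different activation at each layer."""
    h = sigmoid(x @ w1)   # first activation function, first layer
    return relu(h @ w2)   # second activation function, adjacent layer

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # 4 samples, 3 features
w1 = rng.normal(size=(3, 5))
w2 = rng.normal(size=(5, 2))
y = forward(x, w1, w2)         # shape (4, 2); ReLU keeps every entry >= 0
```

Each activation "controls data that is fed forward" in the sense that the sigmoid bounds what the first layer passes on, and the ReLU zeroes out negative pre-activations at the second.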
Regarding claim 12:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 9, which recited abstract ideas.

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

wherein the second activation function comprises a Rectified Linear Unit (ReLU) activation function — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 13:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 9, which recited abstract ideas.

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

wherein executing the neural network comprises: executing the neural network with the second activation function at a layer that is adjacent to the fully connected dense layer — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 14:

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 9, which recited abstract ideas.

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

wherein the first activation function comprises a sigmoid activation function and the second activation function comprises a Rectified Linear Unit (ReLU) activation function — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 15 (currently amended):

Step 1 – Is the claim directed to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a manufacture.

Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim recites abstract ideas:

identify one or more properties of historical data relating to input data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
select an activation function for a neural network based on the one or more properties, the activation function controlling data that is fed forward in the neural network — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

the learned data to be used in the neural network to make a prediction based on the stored data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).

wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data, and (ii) constrain the outputs of the nodes within an interval of the input data — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).

Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

A non-transitory storage medium storing instructions that, when executed by a processor, program the processor to — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
train, based on the historical data, the neural network with the activation function at a fully connected dense layer of the neural network — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)); store learned data, which was learned during training — this limitation is directed to mere data gathering and outputting, which has been recognized by the courts (per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception. Any additional elements that were determined to be insignificant extra-solution activity in Step 2A, Prong 2 are further evaluated in Step 2B as to whether they are well-understood, routine, and conventional activities. The “transmit for display data indicating the prediction” limitation was found to be insignificant extra-solution activity in claim 15. This limitation is recited at a high level of generality and amounts to transmitting data over a network, which is a well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.). Mere instructions to apply an exception (MPEP 2106.05(f)) cannot integrate the abstract ideas into a practical application.
Regarding claim 16:
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 15, which recited abstract ideas.
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim recites additional elements that do not integrate the judicial exception into a practical application: wherein the learned data comprises weights learned at each node of the neural network — the process of classifying and organizing data amounts to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 17:
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 15, which recited abstract ideas. The claim recites additional abstract ideas: select a Rectified Linear Unit (ReLU) activation function when the one or more properties include a skew in the historical data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 18:
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 15, which recited abstract ideas.
The claim recites additional abstract ideas: select a Sigmoid activation function when the one or more properties includes range boundedness or quasi-range boundedness of the historical data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 19:
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 15, which recited abstract ideas. The claim recites additional abstract ideas: select a second activation function to be executed in a layer of the neural network adjacent to the selected activation function at the fully connected layer — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.
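The selection logic recited in claims 17 and 18 (ReLU when the historical data exhibits skew; sigmoid when it exhibits range boundedness or quasi-range boundedness) can be sketched as a small heuristic. This is an illustrative reconstruction, not code from the application: the function name `select_activation`, the thresholds, and the `[0, 1]` reference interval are hypothetical assumptions.

```python
import numpy as np

def select_activation(history: np.ndarray,
                      skew_threshold: float = 1.0,
                      bound_tolerance: float = 0.05) -> str:
    """Hypothetical selector mirroring claims 17-18: sigmoid when the
    historical data is (quasi-)range-bounded, ReLU when it is skewed."""
    # Sample skewness: E[(x - mean)^3] / std^3
    centered = history - history.mean()
    skew = (centered ** 3).mean() / (history.std() ** 3 + 1e-12)
    # Quasi-range boundedness: nearly all mass inside a fixed interval
    # (assumed here to be [0, 1] for illustration).
    inside = np.mean((history >= 0.0) & (history <= 1.0))
    if inside >= 1.0 - bound_tolerance:
        return "sigmoid"
    if abs(skew) >= skew_threshold:
        return "relu"
    return "tanh"  # fallback, not recited in the claims
```

Under this sketch, data confined to the interval maps to sigmoid and heavily skewed data maps to ReLU, matching the claimed property-to-function mapping.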
Regarding claim 20:
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim is dependent on claim 15, which recited abstract ideas. The claim recites additional abstract ideas: wherein the selected activation function and the second activation function are different from one another — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 5, 15, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Teder et al. (US20220383502A1, hereinafter Teder) in view of Jie et al. (“Regularized Flexible Activation Function Combinations for Deep Neural Networks,” hereinafter Jie) and further in view of Patwari et al. (US20230297824A1, hereinafter Patwari).
Regarding claim 1 (currently amended):
Teder teaches a system of identifying and using an activation function of a neural network based on input data, the system comprising (see [0416]: “Another aspect of the present disclosure provides a computing system, comprising one or more processors and memory storing one or more programs to be executed by the one or more processor, the one or more programs comprising instructions for training a neural network to localize an upper eyelid in an image”.
Also see [0167]: “Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions). For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail below.”). a processor programmed to (see [0416]: “Another aspect of the present disclosure provides a computing system, comprising one or more processors and memory storing one or more programs to be executed by the one or more processor”) select an activation function for a neural network based on the one or more properties, the activation function controlling data that is fed forward in the neural network (see [0391]: “In some embodiments, each hidden neuron (e.g., in a respective hidden layer in an auxiliary neural network) is associated with an activation function that performs a function on the input data (e.g., a linear or non-linear function). ”. Also see [0237]: “Generally, training a classifier (e.g., a neural network) comprises updating the plurality of parameters (e.g., weights) for the respective classifier through backpropagation (e.g., gradient descent). 
First, a forward propagation is performed, in which input data (e.g., a corresponding image for each respective training object in a plurality of training objects in the training dataset) is accepted into the neural network, and an output is calculated based on the selected activation function and an initial set of parameters (e.g., weights and/or hyperparameters)”.), execute the neural network with the activation function at a fully connected dense layer of the neural network (see [0165]: “a corresponding first hidden layer comprising a corresponding plurality of hidden neurons, where each hidden neuron in the corresponding plurality of hidden neurons (i) is fully connected to each input in the plurality of inputs, (ii) is associated with a first activation function type, and (iii) is associated with a corresponding parameter (e.g., weight) in a plurality of parameters for the untrained or partially trained neural network, and one or more corresponding neural network outputs, where each respective neural network output in the corresponding one or more neural network outputs (i) directly or indirectly receives, as input, an output of each hidden neuron in the corresponding plurality of hidden neurons, and (ii) is associated with a second activation function type. 
In some such embodiments, the untrained or partially trained neural network is a fully connected network.”.), generate the prediction for the input data based on the executed neural network with the activation function at the fully connected dense layer of the neural network (see [0367]: “As described above, in some embodiments, the trained auxiliary neural network is trained by a procedure comprising inputting an untrained or partially trained auxiliary neural network with each respective pair of images in the plurality of pairs of images and using a difference between the similarity label for the respective pair of images and a similarity prediction generated by the untrained or partially trained auxiliary neural network to update all or a subset of the plurality of parameters for the untrained or partially trained auxiliary neural network.”.) and transmit, for display, data indicating the prediction (see [0414]: “In some embodiments, the method further comprises displaying a graphical representation of an output of the auxiliary neural network for one or more learning instances and/or one or more training epochs during the training process. For instance, in some embodiments, the graphical representation comprises a display of the first image, the second image, the similarity label (e.g., similar/not similar), and the similarity prediction (e.g., similar/not similar).”.).
Teder does not explicitly teach to identify one or more properties of historical data relating to the input data, wherein the activation function is selected to ensure that: outputs of nodes in the neural network are fed forward in a way that preserves the one or more properties of the input data, and outputs of the nodes are constrained within an interval of the input data.
Jie, however, teaches, in an analogous art, to identify one or more properties of historical data relating to the input data (see pg. 5, section III,
“Experiments”: “The datasets being experimented on is a combination of daily stock returns of G7 countries, which is a multi-variate time series [29], [30]. The returns of each day can be considered as an input vector to the corresponding hidden layer, while the output is one-step ahead forecast given a sequence of historical data.”), wherein the activation function is selected to ensure that: outputs of nodes in the neural network are fed forward in a way that preserves the one or more properties of the input data (see pg. 10, subsection B, “Time Complexity”: “For activation functions, each of them will include 2 parameters, and will process one input for either forward or backward path. Therefore, the extra computational complexity will still be proportional to the number of activation parameters.”. Also see pg. 2, section II, “Methodology”: “Thus Eq. (2) defines a form of activation function as a linear combination of a set of basic parameterized non-linear activation functions fk(z, βk ) with the same input x to the neuron. Normally, we require 0 ≤ αi,k ≤ 1 for all k and i to ensure that the output is strictly bounded between 0 and 1.”.), and outputs of the nodes are constrained within an interval of the input data (see pg. 4, Fig. 1, showing output within the interval of the input data, specifically for sigmoid activation functions), and the neural network being trained on the historical data (see pg. 7, subsection B, “Experiment with convolutional autoencoder”: “For each trial, we randomly sample 5,000 examples from the original training datasets of MNIST and FMNIST as the training data, another randomly sampled example from the remaining of the training set as the validation set, and use the original test sets as the test sets in our experiment.”).
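The constraint quoted from Jie above, a combined activation built as a linear combination of bounded basic activations with mixing weights between 0 and 1 so that the output stays strictly within [0, 1], can be sketched as follows. The component functions and names below are illustrative assumptions in the spirit of Jie's Eq. (2), not the exact P-Sig-Ramp formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def flexible_activation(z, alphas):
    """Convex combination of bounded basic activations: with each
    component in [0, 1] and the mixing weights alphas in [0, 1]
    summing to 1, the combined output stays within [0, 1]."""
    alphas = np.clip(alphas, 0.0, 1.0)
    alphas = alphas / alphas.sum()          # normalize so weights sum to 1
    components = np.stack([
        sigmoid(z),                         # smooth, bounded in (0, 1)
        np.clip(z, 0.0, 1.0),               # "ramp": hard-clipped to [0, 1]
    ])
    return np.tensordot(alphas, components, axes=1)
```

Because every component is bounded in [0, 1] and the weights form a convex combination, the output is bounded for any input, which is the interval-constraining behavior the examiner maps to the claims.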
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder and Jie before him or her, to modify the system of claim 1 to include the attributes of having historical data and constraining the outputs within an interval of data in order to improve time series forecasting (see abstract: “It has been shown that LSTM models with proposed flexible activations P-Sig-Ramp provide significant improvements in time series forecasting,”) and to increase flexibility for bounded and unbounded domains (see pg. 2, section I, Introduction: “The limitation of existing studies can be illustrated as follows. First, most of existing work focus on some specific forms of parameterized activation functions rather than a more general form, or consider each component of the combination as a fixed activation function. Second, there is a lack of study on flexible activations with bounded domain such as sigmoid and tanh. Third, existing works rarely discuss the regularization on activations parameters, which have different nature from normal model parameters. In this study, we consider the activation function as a combination of a set of trainable functions following the constraints of several principles. Based on these principles, we develop two flexible activation functions that can be implemented for bounded or unbounded domain. In addition, layer-wise regularization on activation parameters is introduced to reduce the variance caused by activation functions.”).
Neither Teder nor Jie teaches wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data, and (ii) constrain the outputs of the nodes within an interval of the input data.
Patwari, however, teaches wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data (see para [0068]: “The second mode (e.g., mode 2) is a low latency mode. In mode 2, each processing circuit 104 computes the output of “x” for each set of coefficients for the selected non-linear activation function. Thus, each processing circuit 104 calculates an output value for a given input data item for each of the available ranges. Each processing circuit 104 is programmed with the same sets of coefficients—e.g., the set of coefficients for each range of the selected non-linear activation function.”. Also see para [0027]: “This disclosure relates to integrated circuits (ICs) and, more particularly, to a programmable non-linear (PNL) activation engine for neural network acceleration. In accordance with the inventive arrangements described within this disclosure, example circuit architectures for a PNL activation engine are disclosed. The example circuit architectures may be used in the context of neural networks to implement a plurality of different non-linear activation functions. That is, the example circuit architectures implement a PNL activation engine that may be used, at least in part, to implement the non-linear activation function(s) of one or more nodes of one or more layers of a neural network.”) and (ii) constrain the outputs of the nodes within an interval of the input data (see para [0068]: “In mode 2, each processing circuit 104 computes the output of “x” for each set of coefficients for the selected non-linear activation function. Thus, each processing circuit 104 calculates an output value for a given input data item for each of the available ranges.
Each processing circuit 104 is programmed with the same sets of coefficients—e.g., the set of coefficients for each range of the selected non-linear activation function. Concurrently with the processing described, each processing circuit 104 is capable of calculating the range of the value of the received data input item. Each processing circuit 104 may then select the result that was calculated using the set of coefficients corresponding to the determined range of the value of the input data item.”. Also see [0077]: “Using the example ranges of Example 1, were an input data item with a value of −3.2 to be received, the range index would be determined to be 1 such that the set of coefficients associated with range index 1 and the range −4 to 0 would be used for the polynomial approximation performed by the particular processing circuit 104 that received the value of −3.2. Given input data items with values of −3.2, 3, 5, and -7, the respective range indexes (and corresponding sets of coefficients) used to process the respective input data items would be 1, 2, 3, and 0, respectively.”.). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, and Patwari before him or her, to modify the system of claim 1 to include attributes of wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data in order to compute processes with a lower latency (see Patwari para [0068]: “Mode 2 has slightly lower latency than mode 1 in that each processing circuit 104 may be preloaded with all sets of coefficients for a given non-linear activation function”). Regarding claim 3 (currently amended): Teder in view of Jie in further view of Patwari teaches the system of claim 1. 
Teder further teaches wherein to identify the one or more properties, the processor is further programmed to identify skewness, kurtosis and/or range boundedness of the input data (see [0141]: “In some embodiments, the obtaining the training dataset further comprises, for one or more respective training objects in the plurality of training objects, applying a transformation to the corresponding one or more images,”. Also see [0143]: “In some embodiments, the applying a transformation comprises determining a maximum degree and/or a minimum degree of transformation applied to an image. For example, in some embodiments, a degree of transformation can include a percentage used for scaling or zoom, a degree of rotation and/or skew,”. Also see [0237]: “First, a forward propagation is performed, in which input data (e.g., a corresponding image for each respective training object in a plurality of training objects in the training dataset”.). Regarding claim 5: Teder in view of Jie in further view of Patwari teaches the system of claim 3. Teder further teaches wherein the processor is further programmed to: select a Sigmoid activation function when the one or more properties includes range boundedness or quasi-range boundedness of the input data (see [0167]: “Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions). For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail below.”). 
Even though Teder implicitly teaches wherein the processor is further programmed to: select a Sigmoid activation function when the one or more properties includes range boundedness or quasi-range boundedness of the input data, as Teder explains selection of activation functions, including sigmoid functions, on a case-dependent basis, Jie explicitly teaches the case when a sigmoid function should be used (see pg. 3, subsection B, “P-Sig-Ramp: Sigmoid/Tanh Function substitute with bounded domain”: “Sigmoid and Tanh activation functions are widely used in recurrent neural networks, including basic recurrent nets and recurrent nets with cell structure such as LSTMs and GRUs. For the sigmoid function, the output should be in the domain of [0, 1],”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, and Patwari before him or her, to modify the system of claim 5 to include the attributes of selecting a sigmoid function when a range is bounded in order to increase flexibility for bounded and unbounded domains (see pg. 2, section I, Introduction: “The limitation of existing studies can be illustrated as follows. First, most of existing work focus on some specific forms of parameterized activation functions rather than a more general form, or consider each component of the combination as a fixed activation function. Second, there is a lack of study on flexible activations with bounded domain such as sigmoid and tanh. Third, existing works rarely discuss the regularization on activations parameters, which have different nature from normal model parameters. In this study, we consider the activation function as a combination of a set of trainable functions following the constraints of several principles. Based on these principles, we develop two flexible activation functions that can be implemented for bounded or unbounded domain.
In addition, layer-wise regularization on activation parameters is introduced to reduce the variance caused by activation functions.”).
Regarding claim 15:
Teder teaches a non-transitory storage medium storing instructions that, when executed by a processor, programs the processor to (see [0418]: “Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs.”.), select an activation function for a neural network based on the one or more properties, the activation function controlling data that is fed forward in the neural network (see [0391]: “In some embodiments, each hidden neuron (e.g., in a respective hidden layer in an auxiliary neural network) is associated with an activation function that performs a function on the input data (e.g., a linear or non-linear function).”. Also see [0237]: “Generally, training a classifier (e.g., a neural network) comprises updating the plurality of parameters (e.g., weights) for the respective classifier through backpropagation (e.g., gradient descent). First, a forward propagation is performed, in which input data (e.g., a corresponding image for each respective training object in a plurality of training objects in the training dataset) is accepted into the neural network, and an output is calculated based on the selected activation function and an initial set of parameters (e.g., weights and/or hyperparameters)”.); store learned data, which was learned during training, the learned data to be used in the neural network to make a prediction based on the stored data (see [0262]: “In some embodiments, the plurality of parameters for the trained neural network is stored.”.
Also see [0358]: “The procedure further includes training an untrained or partially trained auxiliary neural network comprising a plurality of parameters (e.g., 500 or more parameters) by inputting each corresponding pair of images in the plurality of pairs of images as input to the untrained or partially trained auxiliary neural network thereby obtaining a corresponding similarity prediction indicating a class similarity between the first image and the second image. The procedure further comprises using at least a difference between the corresponding similarity prediction and the corresponding similarity label obtained for each pair of images in the plurality of pairs of images to update all or a subset of the plurality (e.g., 500 or more) of parameters, thereby training the auxiliary neural network to determine a class for an image of an eye corresponding to a training object.”.) train, (see [0358]: “The procedure further includes training an untrained or partially trained auxiliary neural network comprising a plurality of parameters (e.g., 500 or more parameters) by inputting each corresponding pair of images in the plurality of pairs of images as input to the untrained or partially trained auxiliary neural network thereby obtaining a corresponding similarity prediction indicating a class similarity between the first image and the second image.”. 
Also see [0165]: “a corresponding first hidden layer comprising a corresponding plurality of hidden neurons, where each hidden neuron in the corresponding plurality of hidden neurons (i) is fully connected to each input in the plurality of inputs, (ii) is associated with a first activation function type, and (iii) is associated with a corresponding parameter (e.g., weight) in a plurality of parameters for the untrained or partially trained neural network, and one or more corresponding neural network outputs, where each respective neural network output in the corresponding one or more neural network outputs (i) directly or indirectly receives, as input, an output of each hidden neuron in the corresponding plurality of hidden neurons, and (ii) is associated with a second activation function type. In some such embodiments, the untrained or partially trained neural network is a fully connected network.”.)
Teder does not explicitly teach to identify one or more properties of historical data relating to input data.
Jie, however, teaches, in an analogous art, to identify one or more properties of historical data relating to input data (see pg. 5, section III: “The datasets being experimented on is a combination of daily stock returns of G7 countries, which is a multi-variate time series [29], [30]. The returns of each day can be considered as an input vector to the corresponding hidden layer, while the output is one-step ahead forecast given a sequence of historical data.”.)
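The claim 15 mapping above (train a fully connected layer with a selected activation on historical data, store the learned data, then use it for a later prediction) can be sketched minimally as follows. The toy dataset, learning rate, and variable names are assumptions for illustration, not the application's actual training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "historical data": inputs in [0, 1] with a linearly separable label.
X = rng.random((200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

# One fully connected layer with the selected (sigmoid) activation,
# trained by gradient descent on the logistic loss.
w, b = np.zeros(2), 0.0
for _ in range(500):
    pred = sigmoid(X @ w + b)          # forward pass through the activation
    grad = pred - y                    # gradient of the logistic loss
    w -= 0.5 * (X.T @ grad) / len(y)   # backpropagation step
    b -= 0.5 * grad.mean()

# "Store learned data" (the trained weights) for later predictions.
learned = {"weights": w, "bias": b}
new_pred = sigmoid(np.array([0.9, 0.9]) @ learned["weights"] + learned["bias"])
```

The stored `learned` dictionary plays the role of the claim's learned data: the prediction for a new input is computed from the stored weights, not by retraining.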
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder and Jie before him or her, to modify the non-transitory storage medium of claim 15 to include attributes of having historical data and constraining the outputs within an interval of data in order to improve time series forecasting (see abstract: “It has been shown that LSTM models with proposed flexible activations P-Sig-Ramp provide significant improvements in time series forecasting,”). Neither Teder nor Jie explicitly teach wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data, and (ii) constrain the outputs of the nodes within an interval of the input data. Patwari, however, teaches wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data (see para[0068]: “The second mode (e.g., mode 2) is a low latency mode. In mode 2, each processing circuit 104 computes the output of “x” for each set of coefficients for the selected non-linear activation function. Thus, each processing circuit 104 calculates an output value for a given input data item for each of the available ranges. Each processing circuit 104 is programmed with the same sets of coefficients—e.g., the set of coefficients for each range of the selected non-linear activation function.”. Also see para [0027]: “This disclosure relates to integrated circuits (ICs) and, more particularly, to a programmable non-linear (PNL) activation engine for neural network acceleration. In accordance with the inventive arrangements described within this disclosure, example circuit architectures for a PNL activation engine are disclosed. 
The example circuit architectures may be used in the context of neural networks to implement a plurality of different non-linear activation functions. That is, the example circuit architectures implement a PNL activation engine that may be used, at least in part, to implement the non-linear activation function(s) of one or more nodes of one or more layers of a neural network.”) and (ii) constrain the outputs of the nodes within an interval of the input data (see para [0068]: “In mode 2, each processing circuit 104 computes the output of “x” for each set of coefficients for the selected non-linear activation function. Thus, each processing circuit 104 calculates an output value for a given input data item for each of the available ranges. Each processing circuit 104 is programmed with the same sets of coefficients—e.g., the set of coefficients for each range of the selected non-linear activation function. Concurrently with the processing described, each processing circuit 104 is capable of calculating the range of the value of the received data input item. Each processing circuit 104 may then select the result that was calculated using the set of coefficients corresponding to the determined range of the value of the input data item.”. Also see [0077]: “Using the example ranges of Example 1, were an input data item with a value of −3.2 to be received, the range index would be determined to be 1 such that the set of coefficients associated with range index 1 and the range −4 to 0 would be used for the polynomial approximation performed by the particular processing circuit 104 that received the value of −3.2. Given input data items with values of −3.2, 3, 5, and -7, the respective range indexes (and corresponding sets of coefficients) used to process the respective input data items would be 1, 2, 3, and 0, respectively.”.).
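Patwari's quoted range-index mechanism, where each input value selects the coefficient set for its range (Example 1 mapping: −3.2 to index 1, 3 to index 2, 5 to index 3, −7 to index 0), can be sketched as below. The boundary values follow the quoted example; the per-range polynomial coefficients are illustrative placeholders, not values from Patwari.

```python
import bisect

# Range boundaries matching the quoted "Example 1":
# index 0: x < -4; index 1: -4 <= x < 0; index 2: 0 <= x < 4; index 3: x >= 4.
BOUNDARIES = [-4.0, 0.0, 4.0]

def range_index(x: float) -> int:
    """Pick the coefficient-set index for input x, as in the quoted
    example: -3.2 -> 1, 3 -> 2, 5 -> 3, -7 -> 0."""
    return bisect.bisect_right(BOUNDARIES, x)

# Per-range (c0, c1) coefficients of a first-order polynomial approximation;
# these illustrative values approximate a ReLU-like function piecewise.
COEFFS = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (0.0, 1.0)]

def pnl_eval(x: float) -> float:
    """Evaluate the piecewise polynomial using the range-selected coefficients."""
    c0, c1 = COEFFS[range_index(x)]
    return c0 + c1 * x
```

This mirrors mode 2 in spirit: the coefficient set is chosen by the input's range, so the activation output is computed from range-appropriate coefficients.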
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, and Patwari before him or her, to modify the non-transitory storage medium of claim 15 to include attributes of wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data in order to compute outputs with lower latency (see Patwari para [0068]: “Mode 2 has slightly lower latency than mode 1 in that each processing circuit 104 may be preloaded with all sets of coefficients for a given non-linear activation function”). Regarding claim 16: Teder in view of Jie in further view of Patwari teaches the non-transitory storage medium of claim 15. Teder further teaches wherein the learned data comprises weights learned at each node of the neural network (see [0048]: “In instances where transfer learning is used, the untrained classifier described above is provided with additional data over and beyond that of the primary training dataset. That is, in non-limiting examples of transfer learning embodiments, the untrained classifier receives (i) the plurality of images and the measured sets of coordinates for each respective image (“primary training dataset”) and (ii) additional data. Typically, this additional data is in the form of parameters (e.g., coefficients, weights, and/or hyperparameters) that were learned from another, auxiliary training dataset.”). Regarding claim 18: Teder in view of Jie in further view of Patwari teaches the non-transitory storage medium of claim 15.
Teder further teaches wherein the instructions, when executed by the processor, further cause the processor to: select a Sigmoid activation function when the one or more properties includes range boundedness or quasi-range boundedness of the historical data (see [0167]: “Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions). For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail below.”). Even though Teder implicitly teaches wherein the instructions, when executed by the processor, further cause the processor to: select a Sigmoid activation function when the one or more properties includes range boundedness or quasi-range boundedness, as Teder explains using activation functions on a case-dependent basis with sigmoid being among those available, Jie explicitly teaches the case of when to use a sigmoid activation function, i.e., for range boundedness (see pg. 3 subsection B “P-Sig-Ramp: Sigmoid/Tanh Function substitute with bounded domain”: “Sigmoid and Tanh activation functions are widely used in recurrent neural networks, including basic recurrent nets and recurrent nets with cell structure such as LSTMs and GRUs. For the sigmoid function, the output should be in the domain of [0, 1],”). Teder does not explicitly teach the use of historical data. Jie, however, explicitly teaches historical data (see pg. 3 subsection B “P-Sig-Ramp: Sigmoid/Tanh Function substitute with bounded domain”: “Sigmoid and Tanh activation functions are widely used in recurrent neural networks, including basic recurrent nets and recurrent nets with cell structure such as LSTMs and GRUs.
For the sigmoid function, the output should be in the domain of [0, 1],”. Also see pg. 5 section III. “Experiments”: “The datasets being experimented on is a combination of daily stock returns of G7 countries, which is a multi-variate time series [29], [30]. The returns of each day can be considered as an input vector to the corresponding hidden layer, while the output is one-step ahead forecast given a sequence of historical data.”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, and Patwari before him or her, to modify the non-transitory storage medium of claim 18 to include attributes of having historical data in order to improve time series forecasting (see Jie abstract: “It has been shown that LSTM models with proposed flexible activations P-Sig-Ramp provide significant improvements in time series forecasting,”). Claims 9, 11, 12, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Teder et al. (US20220383502A1 hereinafter, Teder) in view of Patwari et al. (US20230297824A1 hereinafter, Patwari). Regarding claim 9: Teder teaches a method of identifying and using an activation function of a neural network based on input data (see [0167]: “Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions).
For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail below.”), identifying, by a processor, one or more properties relating to the input data (see [0013]: “In some embodiments, the method further comprises determining a class for a first training object in the plurality of training objects, by a procedure comprising obtaining a first pair of images comprising a query image of an eye corresponding to the first training object and a reference image of an eye corresponding to a second training object, other than the first training object, in the plurality of training objects, where the reference image is of a corresponding class, and inputting the query image and the reference image into a trained auxiliary neural network comprising a plurality (e.g., at least 500) parameters, thereby determining whether the query image and the reference image are of the same corresponding class.”), selecting, by the processor, a first activation function for a neural network based on the one or more properties, the first activation function controlling data that is fed forward in the neural network at a first layer of the neural network at which the first activation function executes (see [0391]: “In some embodiments, each hidden neuron (e.g., in a respective hidden layer in an auxiliary neural network) is associated with an activation function that performs a function on the input data (e.g., a linear or non-linear function). ”. Also see [0237]: “Generally, training a classifier (e.g., a neural network) comprises updating the plurality of parameters (e.g., weights) for the respective classifier through backpropagation (e.g., gradient descent). 
First, a forward propagation is performed, in which input data (e.g., a corresponding image for each respective training object in a plurality of training objects in the training dataset) is accepted into the neural network, and an output is calculated based on the selected activation function and an initial set of parameters (e.g., weights and/or hyperparameters)”. Also see [0165]: “a corresponding first hidden layer comprising a corresponding plurality of hidden neurons, where each hidden neuron in the corresponding plurality of hidden neurons (i) is fully connected to each input in the plurality of inputs, (ii) is associated with a first activation function type, and (iii) is associated with a corresponding parameter (e.g., weight) in a plurality of parameters for the untrained or partially trained neural network, and one or more corresponding neural network outputs, where each respective neural network output in the corresponding one or more neural network outputs (i) directly or indirectly receives, as input, an output of each hidden neuron in the corresponding plurality of hidden neurons, and (ii) is associated with a second activation function type. 
In some such embodiments, the untrained or partially trained neural network is a fully connected network.”.), executing, by the processor, the neural network with the activation function at a fully connected dense layer of the neural network (see [0165]: “a corresponding first hidden layer comprising a corresponding plurality of hidden neurons, where each hidden neuron in the corresponding plurality of hidden neurons (i) is fully connected to each input in the plurality of inputs, (ii) is associated with a first activation function type, and (iii) is associated with a corresponding parameter (e.g., weight) in a plurality of parameters for the untrained or partially trained neural network, and one or more corresponding neural network outputs, where each respective neural network output in the corresponding one or more neural network outputs (i) directly or indirectly receives, as input, an output of each hidden neuron in the corresponding plurality of hidden neurons, and (ii) is associated with a second activation function type. In some such embodiments, the untrained or partially trained neural network is a fully connected network.”.), generating, by the processor, a prediction for the input data based on the executed neural network with the activation function at the fully connected dense layer of the neural network (see [0367]: “As described above, in some embodiments, the trained auxiliary neural network is trained by a procedure comprising inputting an untrained or partially trained auxiliary neural network with each respective pair of images in the plurality of pairs of images and using a difference between the similarity label for the respective pair of images and a similarity prediction generated by the untrained or partially trained auxiliary neural network to update all or a subset of the plurality of parameters for the untrained or partially trained auxiliary neural network.”. 
Also see [0165]: “In some such embodiments, the untrained or partially trained neural network is a fully connected network.”.) and transmitting, by the processor, for display data indicating the prediction (see [0414]: “In some embodiments, the method further comprises displaying a graphical representation of an output of the auxiliary neural network for one or more learning instances and/or one or more training epochs during the training process. For instance, in some embodiments, the graphical representation comprises a display of the first image, the second image, the similarity label (e.g., similar/not similar), and the similarity prediction (e.g., similar/not similar).”.). selecting, by the processor, a second activation function for the neural network based on the one or more properties, the second activation function controlling data that is fed forward in the neural network at a second layer of the neural network at which the second activation function executes (see [0391]: “In some embodiments, each hidden neuron (e.g., in a respective hidden layer in an auxiliary neural network) is associated with an activation function that performs a function on the input data (e.g., a linear or non-linear function). ”. Also see [0237]: “Generally, training a classifier (e.g., a neural network) comprises updating the plurality of parameters (e.g., weights) for the respective classifier through backpropagation (e.g., gradient descent). First, a forward propagation is performed, in which input data (e.g., a corresponding image for each respective training object in a plurality of training objects in the training dataset) is accepted into the neural network, and an output is calculated based on the selected activation function and an initial set of parameters (e.g., weights and/or hyperparameters)”. 
Also see [0165]: “a corresponding first hidden layer comprising a corresponding plurality of hidden neurons, where each hidden neuron in the corresponding plurality of hidden neurons (i) is fully connected to each input in the plurality of inputs, (ii) is associated with a first activation function type, and (iii) is associated with a corresponding parameter (e.g., weight) in a plurality of parameters for the untrained or partially trained neural network, and one or more corresponding neural network outputs, where each respective neural network output in the corresponding one or more neural network outputs (i) directly or indirectly receives, as input, an output of each hidden neuron in the corresponding plurality of hidden neurons, and (ii) is associated with a second activation function type. In some such embodiments, the untrained or partially trained neural network is a fully connected network.”.). Teder does not explicitly teach wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data, and (ii) constrain the outputs of the nodes within an interval of the input data. Patwari, however, teaches wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data (see para[0068]: “The second mode (e.g., mode 2) is a low latency mode. In mode 2, each processing circuit 104 computes the output of “x” for each set of coefficients for the selected non-linear activation function. Thus, each processing circuit 104 calculates an output value for a given input data item for each of the available ranges. Each processing circuit 104 is programmed with the same sets of coefficients—e.g., the set of coefficients for each range of the selected non-linear activation function.”. 
Also see para [0027]: “This disclosure relates to integrated circuits (ICs) and, more particularly, to a programmable non-linear (PNL) activation engine for neural network acceleration. In accordance with the inventive arrangements described within this disclosure, example circuit architectures for a PNL activation engine are disclosed. The example circuit architectures may be used in the context of neural networks to implement a plurality of different non-linear activation functions. That is, the example circuit architectures implement a PNL activation engine that may be used, at least in part, to implement the non-linear activation function(s) of one or more nodes of one or more layers of a neural network.”) and wherein, when the neural network is executed, the selected activation function is configured to: (ii) constrain the outputs of the nodes within an interval of the input data (see para [0068]: “In mode 2, each processing circuit 104 computes the output of “x” for each set of coefficients for the selected non-linear activation function. Thus, each processing circuit 104 calculates an output value for a given input data item for each of the available ranges. Each processing circuit 104 is programmed with the same sets of coefficients—e.g., the set of coefficients for each range of the selected non-linear activation function. Concurrently with the processing described, each processing circuit 104 is capable of calculating the range of the value of the received data input item. Each processing circuit 104 may then select the result that was calculated using the set of coefficients corresponding to the determined range of the value of the input data item.”.
Also see [0077]: “Using the example ranges of Example 1, were an input data item with a value of −3.2 to be received, the range index would be determined to be 1 such that the set of coefficients associated with range index 1 and the range −4 to 0 would be used for the polynomial approximation performed by the particular processing circuit 104 that received the value of −3.2. Given input data items with values of −3.2, 3, 5, and -7, the respective range indexes (and corresponding sets of coefficients) used to process the respective input data items would be 1, 2, 3, and 0, respectively.”.). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder and Patwari before him or her, to modify the method of claim 9 to include attributes of wherein, when the neural network is executed, the selected activation function is configured to: (i) feed forward outputs of the nodes that preserve the one or more properties of the input data in order to compute outputs with lower latency (see Patwari para [0068]: “Mode 2 has slightly lower latency than mode 1 in that each processing circuit 104 may be preloaded with all sets of coefficients for a given non-linear activation function”). Regarding claim 11: Teder in view of Patwari teaches the method of claim 9. Teder further teaches wherein executing the neural network comprises: executing the neural network with the second activation function at a fully connected dense layer.
(see [0165]: “a corresponding first hidden layer comprising a corresponding plurality of hidden neurons, where each hidden neuron in the corresponding plurality of hidden neurons (i) is fully connected to each input in the plurality of inputs, (ii) is associated with a first activation function type, and (iii) is associated with a corresponding parameter (e.g., weight) in a plurality of parameters for the untrained or partially trained neural network, and one or more corresponding neural network outputs, where each respective neural network output in the corresponding one or more neural network outputs (i) directly or indirectly receives, as input, an output of each hidden neuron in the corresponding plurality of hidden neurons, and (ii) is associated with a second activation function type. In some such embodiments, the untrained or partially trained neural network is a fully connected network.”. Also see [0167]: “Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions). For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail below.”). Regarding claim 12: Teder in view of Patwari teaches the method of claim 9. Teder further teaches wherein the second activation function comprises a Rectified Linear Unit (ReLU) activation function (see [0391]: “ Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the auxiliary neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions). 
For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail above (see, for example, the section entitled, “Parameters,” above).”. Also see [0395]: “In some embodiments, the auxiliary neural network is associated with one or more activation functions. In some embodiments, an activation function in the one or more activation functions is tanh, sigmoid, softmax, Gaussian, Boltzmann-weighted averaging, absolute value, linear, rectified linear unit (ReLU), bounded rectified linear, soft rectified linear, parameterized rectified linear, average, max, min, sign, square, square root, multiquadric, inverse quadratic, inverse multiquadric, polyharmonic spline, swish, mish, Gaussian error linear unit (GeLU), or thin plate spline.”. [Examiner note: i.e., emphasis added to highlight the rectified linear unit.]) Regarding claim 13: Teder in view of Patwari teaches the method of claim 11. 
Teder further teaches executing the neural network comprises: executing the neural network with the second activation function at a layer that is adjacent to the fully connected dense layer (see [0165]: “a corresponding first hidden layer comprising a corresponding plurality of hidden neurons, where each hidden neuron in the corresponding plurality of hidden neurons (i) is fully connected to each input in the plurality of inputs, (ii) is associated with a first activation function type, and (iii) is associated with a corresponding parameter (e.g., weight) in a plurality of parameters for the untrained or partially trained neural network, and one or more corresponding neural network outputs, where each respective neural network output in the corresponding one or more neural network outputs (i) directly or indirectly receives, as input, an output of each hidden neuron in the corresponding plurality of hidden neurons, and (ii) is associated with a second activation function type. In some such embodiments, the untrained or partially trained neural network is a fully connected network.”. Also see [0167]: “Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions). For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail below.”). Regarding claim 14: Teder in view of Patwari teaches the method of claim 13. 
Teder further teaches wherein the first activation function comprises a sigmoid activation function and the second activation function comprises a Rectified Linear Unit (ReLU) activation function (see [0391]: “Selection of activation functions (e.g., a first and/or a second activation function) is dependent on the use case of the auxiliary neural network, as certain activation functions can lead to saturation at the extreme ends of a dataset (e.g., tanh and/or sigmoid functions). For instance, in some embodiments, an activation function (e.g., a first and/or a second activation function) is selected from any of the activation functions disclosed herein and described in greater detail above (see, for example, the section entitled, “Parameters,” above).”. Also see [0395]: “In some embodiments, the auxiliary neural network is associated with one or more activation functions. In some embodiments, an activation function in the one or more activation functions is tanh, sigmoid, softmax, Gaussian, Boltzmann-weighted averaging, absolute value, linear, rectified linear unit (ReLU), bounded rectified linear, soft rectified linear, parameterized rectified linear, average, max, min, sign, square, square root, multiquadric, inverse quadratic, inverse multiquadric, polyharmonic spline, swish, mish, Gaussian error linear unit (GeLU), or thin plate spline.”. [Examiner note: i.e., emphasis added to highlight the rectified linear unit and sigmoid functions.]) Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Teder et al. (US20220383502A1 hereinafter, Teder) in view of Jie et al. (“Regularized Flexible Activation Function Combinations for Deep Neural Networks” hereinafter, Jie) in further view of Patwari et al. (US20230297824A1, hereinafter Patwari) and further in view of Utasi (US20230035615A1 hereinafter, Utasi). Regarding claim 2: Teder in view of Jie in further view of Patwari teaches the system of claim 1.
Neither Teder, Jie, nor Patwari teaches wherein to select the activation function, the processor is programmed to compare the one or more properties to a threshold value or select the activation function based on whether the one or more properties exceeds the threshold value. Utasi, however, in analogous art teaches compare the one or more properties to a threshold value (see [0028]: “According to an example embodiment, the random selection is performed based on a random number generator and a scheduler which provides a changeable decision threshold. The random number generator may provide random values in the range between 0 and 1 and the changeable decision threshold may define the threshold at which value between 0 and 1 the first or second activation function should be selected. As such, the changeable decision threshold may form the tuning factor.”.) and select the activation function based on whether the one or more properties exceeds the threshold value (see [0027]: “According to an example embodiment, the probability for selecting the first or second activation function, respectively the tuning factor, based on which the probability is influenced, is changed linearly or nonlinearly in the transition phase. Depending on the respective neural network to be trained linear or nonlinear change of probability may result in a better performance of the neural network. Therefore, linear or nonlinear change of probability can be chosen depending on the neural network and depending on the task to be performed by the neural network.”. Also see [0028]: “As such, the changeable decision threshold may form the tuning factor.”.)
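[Examiner note: for illustration only, the random selection against a changeable decision threshold Utasi describes in [0027]-[0028] can be sketched as follows; the function labels and the length of the transition phase are hypothetical choices, not values from the reference.]

```python
import random

def select_activation(u: float, threshold: float) -> str:
    """Compare a random draw u in [0, 1] against the changeable decision
    threshold to pick the first or second activation function."""
    return "first" if u < threshold else "second"

def linear_threshold(step: int, total_steps: int) -> float:
    """Linearly change the decision threshold across the transition
    phase; the threshold acts as the tuning factor Utasi describes."""
    return min(1.0, step / total_steps)

# Early in the transition phase the second function dominates; late in
# the phase the first function is almost always selected.
rng = random.Random(0)
for step in (0, 50, 100):
    t = linear_threshold(step, 100)
    print(step, t, select_activation(rng.random(), t))
```

A nonlinear schedule (e.g., quadratic in `step`) could be substituted in `linear_threshold` without changing the selection logic, mirroring Utasi's statement that the probability may be changed linearly or nonlinearly.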
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, Patwari and Utasi before him or her, to modify the system of claim 2 to include attributes of selecting activation functions based on a threshold value in order to increase performance of the neural network (see Utasi at [0027]: “Depending on the respective neural network to be trained linear or nonlinear change of probability may result in a better performance of the neural network. Therefore, linear or nonlinear change of probability can be chosen depending on the neural network and depending on the task to be performed by the neural network.”.). Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Teder et al. (US20220383502A1 hereinafter referred to as Teder) in view of Patwari et al. (US20230297824A1, hereinafter Patwari) and further in view of Lee et al. (US20230252283A1 hereinafter referred to as Lee). Regarding claim 10: Teder in view of Patwari teaches the method of claim 9. Neither Teder nor Patwari explicitly teaches wherein the first layer and the second layer are adjacent to one another. Lee, however, further teaches wherein the first layer and the second layer are adjacent to one another (see abstract: “A processor-implemented method with a neural network includes: generating a first intermediate vector by applying a first activation function to first nodes in a first intermediate layer adjacent to an input layer among intermediate layers of the neural network; transferring the first intermediate vector to second nodes in a second intermediate layer adjacent to an output layer among the intermediate layers; generating a second intermediate vector by applying a second activation function to the second nodes; and applying the second intermediate vector to an output layer of the neural network.”.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Patwari, and Lee before him or her, to modify the method of claim 10 to include attributes of layers being adjacent to one another in order to improve accuracy of the neural network (see Lee at [0130]: “Even for a general neural network, the training device of one or more embodiments may improve the accuracy of the neural network by performing fine-tuning by applying a second activation function that suggests non-linearity for nodes included in a layer adjacent to an output layer without newly training the neural network for a task.”.). Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Teder et al. (US20220383502A1 hereinafter referred to as Teder) in view of Jie et al. (“Regularized Flexible Activation Function Combinations for Deep Neural Networks” hereinafter referred to as Jie) in further view of Patwari et al. (US20230297824A1, hereinafter Patwari) and further in view of Zhao et al. (US20200250527A1 hereinafter referred to as Zhao). Regarding claim 4: Teder in view of Jie in further view of Patwari teaches the system of claim 3. Neither Teder, Jie, nor Patwari explicitly teaches to select a Rectified Linear Unit (ReLU) activation function when the one or more properties include a skew in the input data. Zhao, however, in analogous art teaches to select a Rectified Linear Unit (ReLU) activation function when the one or more properties include a skew in the input data (see [0057]: “Both these cases represent a skewed binary data set where a fraction of datapoints are positively labeled. The scenario where the active learning process starts with zero labeled datapoints can be considered as a special case, where L0 is empty. For this example, a neural network is used for the classifier.
In each evaluation, the same neural network architecture was used: two hidden layers with 20 nodes on each layer. Each layer is fully connected and uses rectified linear units (ReLU).”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, and Zhao before him or her, to modify the system of claim 4 to include attributes of using a rectified linear unit activation function on skewed input data in order to optimize cross-entropy loss (see Zhao [0057]: “In each evaluation, the same neural network architecture was used: two hidden layers with 20 nodes on each layer. Each layer is fully connected and uses rectified linear units (ReLU). The network is set up to optimize the cross-entropy loss.”. Also see [0020]: “The present disclosure provides computing systems and methods directed to active learning and may provide advantages or improvements to active learning applications for skewed data sets.”). Regarding claim 17: Teder in view of Jie in further view of Patwari teaches the non-transitory storage medium of claim 15. Neither Teder nor Jie explicitly teaches to select a Rectified Linear Unit (ReLU) activation function when the one or more properties include a skew in the historical data. Zhao, however, teaches to select a Rectified Linear Unit (ReLU) activation function when the one or more properties include a skew in the historical data (see [0057]: “Both these cases represent a skewed binary data set where a fraction of datapoints are positively labeled. The scenario where the active learning process starts with zero labeled datapoints can be considered as a special case, where L0 is empty. For this example, a neural network is used for the classifier. In each evaluation, the same neural network architecture was used: two hidden layers with 20 nodes on each layer. Each layer is fully connected and uses rectified linear units (ReLU).”).
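[Examiner note: for illustration only, the property-driven selection recited by the claims, i.e., ReLU for skewed data (cf. Zhao) and sigmoid for range-bounded data (cf. Jie), can be sketched as follows; the sample-skewness statistic and the threshold of 1.0 are hypothetical choices, not taken from the references.]

```python
# Sketch of selecting an activation function from a property of the
# (historical) input data: a simple sample-skewness statistic is
# computed, and ReLU is selected for skewed data, sigmoid otherwise.
# The statistic and the 1.0 threshold are illustrative assumptions.

def sample_skewness(data):
    """Population skewness m3 / m2^(3/2) of a sequence of numbers."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5 if m2 else 0.0

def select_activation_function(data, skew_threshold=1.0):
    """Map a data property to an activation function choice."""
    if abs(sample_skewness(data)) > skew_threshold:
        return "relu"     # skewed data (cf. Zhao [0057])
    return "sigmoid"      # bounded outputs in [0, 1] (cf. Jie)

print(select_activation_function([0.0, 0.0, 0.0, 0.0, 10.0]))  # → relu
print(select_activation_function([0.1, 0.4, 0.5, 0.6, 0.9]))   # → sigmoid
```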
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, and Zhao before him or her, to modify the non-transitory storage medium of claim 17 to include attributes of using a rectified linear unit activation function on skewed input data in order to optimize cross-entropy loss (see Zhao [0057]: “In each evaluation, the same neural network architecture was used: two hidden layers with 20 nodes on each layer. Each layer is fully connected and uses rectified linear units (ReLU). The network is set up to optimize the cross-entropy loss.”. Also see [0020]: “The present disclosure provides computing systems and methods directed to active learning and may provide advantages or improvements to active learning applications for skewed data sets.”). Neither Teder, Patwari, nor Zhao explicitly mentions historical data. Jie, however, does explicitly teach historical data (see pg. 5 section III. “Experiments”: “The datasets being experimented on is a combination of daily stock returns of G7 countries, which is a multi-variate time series [29], [30]. The returns of each day can be considered as an input vector to the corresponding hidden layer, while the output is one-step ahead forecast given a sequence of historical data.”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, Patwari and Zhao before him or her, to modify the non-transitory storage medium of claim 17 to include attributes of having historical data in order to improve time series forecasting (see Jie at abstract: “It has been shown that LSTM models with proposed flexible activations P-Sig-Ramp provide significant improvements in time series forecasting,”.). Claims 6-8, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Teder et al.
(US20220383502A1 hereinafter referred to as Teder) in view of Jie et al. (“Regularized Flexible Activation Function Combinations for Deep Neural Networks” hereinafter referred to as Jie) in further view of Lee et al. (US20230252283A1 hereinafter referred to as Lee).

Regarding claim 6: Teder in view of Jie in view of Patwari teaches the system of claim 3. Neither Teder, Jie, nor Patwari explicitly teaches wherein the processor is further programmed to select a second activation function to be executed in a layer of the neural network adjacent to the selected activation function at the fully connected layer. Lee, however, teaches in analogous art wherein the processor is further programmed to select a second activation function to be executed in a layer of the neural network adjacent to the selected activation function at the fully connected layer (see abstract: “A processor-implemented method with a neural network includes: generating a first intermediate vector by applying a first activation function to first nodes in a first intermediate layer adjacent to an input layer among intermediate layers of the neural network; transferring the first intermediate vector to second nodes in a second intermediate layer adjacent to an output layer among the intermediate layers; generating a second intermediate vector by applying a second activation function to the second nodes; and applying the second intermediate vector to an output layer of the neural network.”.
Also see claim 12: “the additional nodes and the intermediate nodes are fully connected.”) Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Patwari, Jie, and Lee before him or her, to modify the system of claim 6 to include attributes of adjacent layers having different activation functions in order to improve accuracy of the neural network (see Lee [0130]: “Even for a general neural network, the training device of one or more embodiments may improve the accuracy of the neural network by performing fine-tuning by applying a second activation function that suggests non-linearity for nodes included in a layer adjacent to an output layer without newly training the neural network for a task.”.)

Regarding claim 19: Claim 19 recites analogous limitations to claim 6 and is therefore rejected on the same grounds as claim 6.

Regarding claim 7: Teder in view of Jie in further view of Lee teaches the system of claim 6. Lee further teaches wherein the selected activation function and the second activation function are different from one another (see claim 10: “extracting a second result value by applying a second activation function different from the first activation function to additional nodes connected to intermediate nodes in one or more of the intermediate layers”).
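The arrangement Lee describes (a first activation applied at the hidden layer adjacent to the input, and a different second activation at the hidden layer adjacent to the output) can be sketched as a small forward pass. The tiny weight matrices and the specific ReLU/tanh pairing are illustrative assumptions, not values from Lee.

```python
# Hypothetical sketch: two fully connected hidden layers, each with a
# different activation function, per Lee's abstract.
import math

def relu(x):
    return max(0.0, x)

def forward(x, w1, w2, act1=relu, act2=math.tanh):
    """Apply act1 at the layer adjacent to the input and a different
    act2 at the layer adjacent to the output."""
    h1 = [act1(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    h2 = [act2(sum(wi * hi for wi, hi in zip(row, h1))) for row in w2]
    return h2

x = [1.0, -2.0]                  # illustrative input vector
w1 = [[0.5, 0.1], [-0.3, 0.4]]   # first hidden layer weights
w2 = [[1.0, 1.0], [0.5, -0.5]]   # second (adjacent) hidden layer weights
print(forward(x, w1, w2))
```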
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, and Lee before him or her, to modify the system of claim 7 to include attributes of different activation functions in order to improve accuracy of the neural network (see Lee [0130]: “Even for a general neural network, the training device of one or more embodiments may improve the accuracy of the neural network by performing fine-tuning by applying a second activation function that suggests non-linearity for nodes included in a layer adjacent to an output layer without newly training the neural network for a task.”.).

Regarding claim 20: Claim 20 recites analogous limitations to claim 7 and is therefore rejected on the same grounds as claim 7.

Regarding claim 8: Teder in view of Jie in further view of Patwari in further view of Lee teaches the system of claim 6. Teder does not explicitly teach wherein the input data comprises time series data values. Jie further teaches wherein the input data comprises time series data values (see pg. 5 section III.: “The datasets being experimented on is a combination of daily stock returns of G7 countries, which is a multi-variate time series [29], [30]. The returns of each day can be considered as an input vector to the corresponding hidden layer, while the output is one-step ahead forecast given a sequence of historical data.”.) Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Teder, Jie, Lee, and Patwari before him or her, to modify the system of claim 8 to include attributes of having historical data in order to improve time series forecasting (see Jie at abstract: “It has been shown that LSTM models with proposed flexible activations P-Sig-Ramp provide significant improvements in time series forecasting,”.).
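The one-step-ahead framing Jie quotes (each window of past daily returns is an input vector; the next return is the forecast target) can be sketched as a simple windowing step. The function name, window size, and sample return values are illustrative assumptions.

```python
# Hypothetical sketch: frame a historical return series as
# (input window, next value) pairs for one-step-ahead forecasting.
def make_one_step_pairs(returns, window=3):
    """Slide a fixed window over the series; the target is the value
    immediately following each window."""
    pairs = []
    for i in range(len(returns) - window):
        pairs.append((returns[i:i + window], returns[i + window]))
    return pairs

daily_returns = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
pairs = make_one_step_pairs(daily_returns, window=3)
print(pairs[0])  # ([0.01, -0.02, 0.03], 0.0)
```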
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew A Bracero whose telephone number is (571)270-0592. The examiner can normally be reached Monday - Thursday 7:30 a.m. - 5:00 p.m. ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Yi, can be reached Monday - Thursday 7:30 a.m. - 5:00 p.m. ET at (571)270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANDREW BRACERO/
Examiner, Art Unit 2126

/VAN C MANG/
Primary Examiner, Art Unit 2126

Prosecution Timeline

Sep 08, 2022
Application Filed
Jun 27, 2025
Non-Final Rejection — §101, §102, §103
Sep 22, 2025
Examiner Interview Summary
Sep 22, 2025
Applicant Interview (Telephonic)
Oct 01, 2025
Response Filed
Jan 10, 2026
Final Rejection — §101, §102, §103 (current)


Prosecution Projections

3-4
Expected OA Rounds
100%
Grant Probability
99%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 5 resolved cases by this examiner. Grant probability derived from career allow rate.
