Prosecution Insights
Last updated: April 19, 2026
Application No. 17/959,232

NEURAL NETWORKS WITH TRANSFORMED ACTIVATION FUNCTION LAYERS

Status: Non-Final OA (§101, §103)
Filed: Oct 03, 2022
Examiner: HADDAD, MAJD MAHER
Art Unit: 2125
Tech Center: 2100 — Computer Architecture & Software
Assignee: DeepMind Technologies Limited
Current OA Round: 1 (Non-Final)
Grant Probability: Favorable
Estimated OA Rounds: 1-2
Estimated Time to Grant: 3y 3m
Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC average)
Interview Lift: +0.0% (minimal; based on resolved cases with interview)
Typical Timeline: 3y 3m average prosecution
Career History: 21 total applications across all art units; 21 currently pending

Statute-Specific Performance

§101
36.1%
-3.9% vs TC avg
§103
44.6%
+4.6% vs TC avg
§102
1.2%
-38.8% vs TC avg
§112
10.8%
-29.2% vs TC avg
Black line = Tech Center average estimate • Based on career data from 0 resolved cases

Office Action

§101 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-25 are presented for examination.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on April 2, 2024 and May 23, 2024 were filed. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1

Step 1: The claim recites a method; therefore, it is directed to the statutory category of processes.

Step 2A Prong One: The claim recites, inter alia:

[A]nd processing the… input… to generate a… output for the… input…: This limitation is viewed as a mental process, since processing an input and generating an output is mentally performable.

[G]enerating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer: This limitation encompasses a mathematical concept since it deals with using a math function to derive the activation input. See the specification of the instant case, which states: “In some implementations, the layer applies an affine operation, e.g., a convolution or a matrix multiplication optionally followed by a summation with a bias, to the layer input to generate the activation input.”

[T]ransforming the activation input using one or more input transformation constants to generate a transformed activation input: This limitation recites a mathematical concept dealing with using a math equation to generate a transformed activation input that will be used for subsequent processing. See the specification of the instant case, which states: “The system then generates the transformed activation output by multiplying the shifted initial activation output by an output scale constant γ.”

[A]pplying the element-wise activation function to the transformed activation input to generate an initial activation output: This limitation recites a mathematical concept, as it uses an activation function to derive the initial activation output. Additionally, the process generates a shifted initial activation output using addition, where the specification of the instant case states that “…the system generates a shifted initial activation output by adding an output shift constant δ to the activation output.”

[T]ransforming the initial activation output using one or more output transformation constants to generate a transformed activation output, wherein the one or more input transformation constants and the one or more output transformation constants are based on properties of the neural network when the neural network is initialized prior to training the neural network: This limitation recites a mathematical concept dealing with using mathematical equations to derive the transformed activation output. See the specification of the instant case, which recites that “The system then generates the transformed activation output by multiplying the shifted initial activation output by an output scale constant γ.”
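Read together, the claim 1 limitations describe a fixed pipeline around the element-wise activation: form an activation input, transform it with input constants, apply the activation, then shift and scale the result with output constants. The following is a minimal runnable sketch of that pipeline, not the applicant's implementation; the affine form, the tanh activation, and all constant values are illustrative assumptions.

```python
import numpy as np

def transformed_activation_layer(layer_input, W, b, phi,
                                 in_scale, in_shift, out_shift, out_scale):
    """Sketch of the claimed pipeline: affine op -> input transform ->
    element-wise activation -> output transform. The constant names are
    illustrative; the claims only require 'one or more' constants on each
    side, fixed from properties of the network at initialization."""
    a = layer_input @ W + b                # activation input (affine op, per the spec quote)
    a_t = in_scale * a + in_shift          # transformed activation input
    y0 = phi(a_t)                          # initial activation output
    return out_scale * (y0 + out_shift)    # shift by delta, then scale by gamma

# Example with a tanh activation and arbitrary constants:
x = np.random.randn(4, 8)
W, b = np.random.randn(8, 8), np.zeros(8)
y = transformed_activation_layer(x, W, b, np.tanh, 1.1, 0.0, 0.1, 0.9)
```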
Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows:

[R]eceiving a network input… receiving a layer input for the transformed activation function layer: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

…using a neural network that comprises a plurality of neural network layers arranged as a directed graph… the plurality of neural network layers comprising a plurality of transformed activation function layers, and wherein processing the network input comprises, for each transformed activation function layer…: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

[A]nd providing the transformed activation output as a layer output for the transformed activation function layer: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

input network… output network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

[R]eceiving a network input… receiving a layer input for the transformed activation function layer: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the receiving steps amount to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

…using a neural network that comprises a plurality of neural network layers arranged as a directed graph… the plurality of neural network layers comprising a plurality of transformed activation function layers, and wherein processing the network input comprises, for each transformed activation function layer…: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

[A]nd providing the transformed activation output as a layer output for the transformed activation function layer: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).
input network… output network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts for transforming neural network activation inputs and outputs). The claim merely describes a process of applying known mathematical operations (transformations, scaling, shifting, and activation functions) to input data within a neural network and performing standard data processing steps (receiving network inputs, processing layer inputs, and outputting layer results). The recitation of a neural network comprising multiple layers arranged as a directed graph merely indicates a technological environment in which the abstract ideas are applied, without improving the functioning of a computer or neural network itself. Therefore, the claim as a whole remains focused on the abstract idea and fails Step 2B of the eligibility analysis.

Claim 2

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[G]enerating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer comprises: applying an affine operation to the layer input: This limitation is seen as a mathematical concept because it involves applying a math operation to the input to generate an activation input. See the specification of the instant case, which states that “the layer applies an affine operation, e.g., a convolution or a matrix multiplication optionally followed by a summation with a bias, to the layer input to generate the activation input.”

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 3

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[G]enerating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer comprises: using the layer input as the activation input: This limitation encompasses a mathematical concept since it deals with using a math function to derive the activation input. See the specification of the instant case, which states: “In some implementations, the layer applies an affine operation, e.g., a convolution or a matrix multiplication optionally followed by a summation with a bias, to the layer input to generate the activation input.”

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 4

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[T]ransforming the activation input using one or more input transformation constants to generate a transformed activation input comprises: generating an initial transformed activation input by multiplying the activation input by an input scale constant: This limitation is a mathematical concept dealing with the multiplication of two numbers.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 5

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[T]ransforming the activation input using one or more input transformation constants to generate a transformed activation input comprises: generating the transformed activation input by adding an input shift constant to the initial transformed activation input: This limitation is a mathematical concept because it deals with adding a constant to the initial transformed activation input in order to shift it.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 6

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

(i) is an identity operation when the given element is greater than or equal to zero: This limitation deals with identifying when an element is greater than or equal to a value, which recites a mental process.

the activation function is a leaky ReLU activation function that, for a given element… and (ii) multiplies the given element by a slope value…: This limitation encompasses a mathematical concept because the activation function deals with multiplying two numbers together.

when the given element is less than zero, and wherein an output scale constant is defined by the slope value: This limitation is a mental process that deals with identifying when an element is less than a value. Additionally, defining an output scale based on the value of the slope can be performed mentally.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 7

Step 1: A process, as above.
Step 2A Prong One: The claim recites, inter alia:

the output scale constant is equal to a square root of a ratio between (i) 2 and (ii) a sum of one and a square of the slope value: This limitation encompasses a mathematical concept because it defines the constant by a mathematical formula, namely the square root of 2 divided by the sum of one and the square of the slope value.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 8

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[T]ransforming the initial activation output using one or more output transformation constants to generate a transformed activation output comprises: generating a shifted initial activation output by adding an output shift constant to the activation output: This limitation recites a mathematical concept dealing with the mathematical operation of adding two values to derive the transformed activation output.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 9

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[T]ransforming the initial activation output using one or more output transformation constants to generate a transformed activation output comprises: generating the transformed activation output by multiplying the shifted initial activation output by an output scale constant: This limitation recites a mathematical concept dealing with multiplying two values to derive the transformed activation output.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
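Claims 6-9 together pin down a concrete set of constants for the leaky ReLU case, and the claim 7 expression gives the specific output scale, sqrt(2 / (1 + slope^2)). A worked numeric sketch follows; the slope and shift values chosen here are arbitrary illustrations, not values from the application.

```python
import numpy as np

def leaky_relu(x, slope):
    # Identity for x >= 0, multiply by the slope value for x < 0 (claim 6).
    return np.where(x >= 0, x, slope * x)

slope = 0.2
# Claim 7: output scale constant = sqrt(2 / (1 + slope**2)).
gamma = np.sqrt(2.0 / (1.0 + slope**2))
delta = 0.1   # output shift constant (claim 8); value here is arbitrary

a = np.array([-1.5, -0.3, 0.0, 0.8])   # activation input
y0 = leaky_relu(a, slope)              # initial activation output
y = gamma * (y0 + delta)               # shift (claim 8), then scale (claim 9)
print(gamma)                           # ~1.3868 for slope = 0.2
```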
Claim 10

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the activation function is a smooth activation function and wherein the output scale constant is based on a value of the shifted initial activation output for an element that has a value sampled from a noise distribution: This limitation encompasses a mathematical concept because it deals with smoothing an activation function based on a mathematical function, calculating derivatives, and sampling from a distribution.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 11

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[A]pplying a respective normalized weight to each of the respective layer outputs to generate a respective weighted layer output, wherein a sum of the squares of the respective normalized weights is equal to one: This limitation is a mathematical concept because it involves applying arithmetic operations to generate a weighted layer output.

and generating a layer output for the normalized summation layer by summing the respective weighted layer outputs: This limitation is a mathematical concept because it involves using the addition operation to generate a layer output.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows:

the plurality of neural network layers comprise one or more normalized summation layers, and wherein processing the network input comprises, for each normalized summation layer: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

[R]eceiving for each of a plurality of neural network layers that are connected to the normalized summation layer by an incoming edge in the directed graph…: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

a respective layer output generated by the neural network layer during the processing of the network input: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

the plurality of neural network layers comprise one or more normalized summation layers, and wherein processing the network input comprises, for each normalized summation layer: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

[R]eceiving for each of a plurality of neural network layers that are connected to the normalized summation layer by an incoming edge in the directed graph…: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the receiving step amounts to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

a respective layer output generated by the neural network layer during the processing of the network input: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).

Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
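Claim 11's normalized summation layer combines the outputs of incoming layers using weights whose squares sum to one. Below is a minimal sketch, assuming the constraint is enforced by dividing raw weights by their Euclidean norm; that normalization scheme is one possible choice, since the claim does not specify how the normalized weights are obtained.

```python
import numpy as np

def normalized_summation_layer(incoming_outputs, raw_weights):
    """Combine layer outputs arriving on incoming edges (claim 11). The
    weights are normalized so the sum of their squares equals one; dividing
    by the L2 norm is one assumed way to satisfy that constraint."""
    w = np.asarray(raw_weights, dtype=float)
    w = w / np.linalg.norm(w)                 # now np.sum(w**2) == 1.0
    weighted = [wi * out for wi, out in zip(w, incoming_outputs)]
    return sum(weighted)                      # layer output: sum of weighted outputs

outs = [np.ones(3), 2 * np.ones(3), -np.ones(3)]
y = normalized_summation_layer(outs, [1.0, 1.0, 2.0])
```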
Claim 12

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

[G]enerating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer comprises: computing a convolution between a filter bank tensor for the layer and the layer input: This limitation is a mathematical concept dealing with feature extraction through convolution and linear transformations within tensor spaces.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows:

for one or more of the plurality of transformed activation function layers…: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

for one or more of the plurality of transformed activation function layers…: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 13

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

initializing the filter bank tensor for the layer using Delta initialization: This limitation is a mathematical concept because Delta initialization is the application of the Dirac delta function’s sifting property to define a specific initial state.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows:

for each of the one or more transformed activation function layers, prior to training the neural network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

for each of the one or more transformed activation function layers, prior to training the neural network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
Claim 14

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the Delta initialization uses an entry-wise Gaussian distribution: This limitation is a mathematical concept because it applies the Gaussian distribution, which is a mathematical function describing a bell-shaped probability distribution.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 15

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the Delta initialization uses a scale-corrected uniform orthogonal (SUO) distribution: This limitation is a mathematical concept because it deals with rescaling matrices to preserve the signal’s variance.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
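Claims 13-15 concern Delta initialization of a convolutional filter bank tensor, with the nonzero entries drawn entry-wise from a Gaussian (claim 14) or from a scale-corrected uniform orthogonal distribution (claim 15). A hedged sketch of the Gaussian variant follows, assuming the common construction in which only the spatial center of each filter is nonzero; the filter shape, center-tap placement, and 1/sqrt(fan-in) scaling are illustrative assumptions rather than details recited in the claims.

```python
import numpy as np

def delta_init_gaussian(k, c_in, c_out, rng=None):
    """Delta initialization sketch for a k x k conv filter bank with shape
    (k, k, c_in, c_out). Only the spatial center is nonzero, so at
    initialization the convolution acts like a 1x1 (pointwise) map; the
    entry-wise Gaussian at the center echoes claim 14. The 1/sqrt(c_in)
    standard deviation is an illustrative assumption, not from the claims."""
    rng = rng or np.random.default_rng(0)
    w = np.zeros((k, k, c_in, c_out))
    center = k // 2
    w[center, center] = rng.normal(0.0, 1.0 / np.sqrt(c_in), size=(c_in, c_out))
    return w

w = delta_init_gaussian(3, 16, 32)
```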
Claim 16

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the one or more input transformation constants and the one or more output transformation constants are also based on a hyperparameter that represents a degree of nonlinearity of the operations performed by the neural network at initialization: This limitation is a mathematical concept because it uses the hyperparameter to adjust the transformation constants based on the network’s nonlinear functions.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 17

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the one or more input transformation constants and the one or more output transformation constants are based on the hyperparameter and an estimate of a maximal slope function of the neural network at initialization: This limitation recites a mathematical concept because it applies a specific derivative-based metric (the maximal slope) to determine the constant values.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 18

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the one or more input transformation constants and the one or more output transformation constants for each of the layers are selected such that one or more constraints that are based on values of local C maps, local Q maps, or both for the plurality of layers in the neural network are satisfied: This limitation recites a mathematical concept because it deals with the Q and C maps, which are mathematical tools used to track how data changes as it moves through the layers.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 19

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the one or more constraints are based on values of local C maps, wherein a local C map is a function that characterizes how well a cosine similarity function is preserved between the input and the output of a neural network layer, wherein a global C map comprises a composition of local C maps for the plurality of neural network layers… based on a constraint on the global C map that represents preservation of the cosine similarity function by the neural network: This limitation encompasses a mathematical concept because a C map is a mathematical function used to measure orthogonality and separation, and cosine similarity is a mathematical function used to measure the degree of similarity between two vectors.

and wherein the one or more input transformation constants and the one or more output transformation constants are selected: This limitation is seen as a mental process dealing with selecting constant values.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 20

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

the one or more constraints are based on values of local Q maps, wherein a local Q map is a function that characterizes a change in a squared magnitude of the element-wise activation function between the input and the output of a neural network layer… based on a constraint on the local Q map: This limitation is a mathematical concept, since the Q map is a mathematical function that tracks the variance of the data distribution as it passes through a layer.
Additionally, it calculates the change between two values to ensure that the energy of the data remains constant across the entire network.

and wherein the one or more input transformation constants and the one or more output transformation constants are selected: This limitation is seen as a mental process dealing with selecting a number of constants.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 21

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

at least one of the constraints is based on the hyperparameter and the estimate of the maximal slope function: This limitation is a mathematical concept because it uses the maximal slope function (calculating derivatives) to guarantee that the constants will scale the data properly.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 22

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

applying normalization to the network input to generate a normalized input; and providing the normalized input as a layer input for one or more initial neural network layers of the neural network: This limitation describes a mathematical concept because it applies a normalization function to the data to rescale the values into standardized numbers.

processing the network input using the neural network comprises…: This limitation describes a mental process because it deals with examining the network input within the neural network.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
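Claims 18-21 turn on local C maps and local Q maps, characterized above as functions that track, respectively, how well cosine similarity is preserved across a layer and how the squared magnitude of activations changes across it. The sketch below estimates both quantities empirically for a single element-wise layer; the correlated-Gaussian sampling scheme and the Monte Carlo estimators are an illustrative construction of mine, not a method taken from the application.

```python
import numpy as np

def estimate_local_maps(layer_fn, dim=64, c_in=0.5, q_in=1.0, n=20_000,
                        rng=None):
    """Monte Carlo estimate of a layer's local C map value (cosine-similarity
    preservation) and local Q map value (change in squared magnitude per
    unit), following the characterizations in claims 19-20. Illustrative."""
    rng = rng or np.random.default_rng(0)
    # Draw Gaussian input pairs with per-unit variance q_in and correlation c_in.
    x = rng.normal(size=(n, dim))
    z = rng.normal(size=(n, dim))
    u = np.sqrt(q_in) * x
    v = np.sqrt(q_in) * (c_in * x + np.sqrt(1.0 - c_in**2) * z)
    fu, fv = layer_fn(u), layer_fn(v)
    q_out = float(np.mean(np.sum(fu**2, axis=1)) / dim)   # local Q map value
    cos = np.sum(fu * fv, axis=1) / (
        np.linalg.norm(fu, axis=1) * np.linalg.norm(fv, axis=1))
    return float(np.mean(cos)), q_out                     # (C map value, Q map value)

c_out, q_out = estimate_local_maps(np.tanh)
```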
Claim 23

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia:

determining a gradient… and determining an update…: This limitation is seen as a mental process, as determining an update and a gradient can be performed in the human mind.

…with respect to a set of parameters of the neural network of a loss function for the training of the neural network that measures a quality of the network output relative to the target network output: This limitation is viewed as a mathematical concept because it deals with using the loss function to minimize the error of the prediction model.

…to the parameters of the neural network based at least on the gradient: This limitation is viewed as a mathematical concept because it deals with gradient descent, which uses a loss function and calculates derivatives.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows:

the network input is received during training of the neural network and wherein the method further comprises: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

obtaining a target network output for the network input: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

the network input is received during training of the neural network and wherein the method further comprises: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the receiving step amounts to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

obtaining a target network output for the network input: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).

Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.
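Claim 23 adds a conventional supervised training step: compute the gradient, with respect to the network parameters, of a loss that measures output quality against a target, then update the parameters based at least on that gradient. A minimal sketch follows; the linear model, squared-error loss, and plain gradient-descent rule are illustrative assumptions, since the claim only requires an update based at least on the gradient.

```python
import numpy as np

def training_step(W, x, target, lr=1e-2):
    """One update per claim 23: determine the gradient of a loss measuring
    the quality of the network output relative to the target, then update
    the parameters based on that gradient."""
    y = x @ W                      # network output for the network input
    err = y - target               # loss L = 0.5 * ||y - target||^2
    grad_W = np.outer(x, err)      # gradient of L with respect to W
    return W - lr * grad_W         # parameter update based on the gradient

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
W = training_step(W, rng.normal(size=8), np.zeros(4))
```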
Claim 24

Step 1: The claim recites an apparatus; therefore, it is directed to the statutory category of apparatus.

Step 2A Prong One: The claim recites, inter alia:

[A]nd processing the… input… to generate a… output for the… input…: This limitation is viewed as a mental process, since processing an input and generating an output based on that input is mentally performable.

[G]enerating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer: This limitation encompasses a mathematical concept since it deals with using a math function to derive the activation input. See the specification of the instant case, which states: “In some implementations, the layer applies an affine operation, e.g., a convolution or a matrix multiplication optionally followed by a summation with a bias, to the layer input to generate the activation input.”

[T]ransforming the activation input using one or more input transformation constants to generate a transformed activation input: This limitation recites a mathematical concept dealing with using a math equation to generate a transformed activation input that will be used for subsequent processing. See the specification of the instant case, which states: “The system then generates the transformed activation output by multiplying the shifted initial activation output by an output scale constant γ.”

[A]pplying the element-wise activation function to the transformed activation input to generate an initial activation output: This limitation recites a mathematical concept, as it uses an activation function to derive the initial activation output. Additionally, the process generates a shifted initial activation output using addition, where the specification of the instant case states that “…the system generates a shifted initial activation output by adding an output shift constant δ to the activation output.”

[T]ransforming the initial activation output using one or more output transformation constants to generate a transformed activation output, wherein the one or more input transformation constants and the one or more output transformation constants are based on properties of the neural network when the neural network is initialized prior to training the neural network: This limitation recites a mathematical concept dealing with using mathematical equations to derive the transformed activation output. See the specification of the instant case, which recites that “The system then generates the transformed activation output by multiplying the shifted initial activation output by an output scale constant γ.”

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows:

[R]eceiving a network input… receiving a layer input for the transformed activation function layer: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

…using a neural network that comprises a plurality of neural network layers arranged as a directed graph… the plurality of neural network layers comprising a plurality of transformed activation function layers, and wherein processing the network input comprises, for each transformed activation function layer…: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).

[A]nd providing the transformed activation output as a layer output for the transformed activation function layer: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

input network… output network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

[R]eceiving a network input… receiving a layer input for the transformed activation function layer: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the receiving steps amount to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

…using a neural network that comprises a plurality of neural network layers arranged as a directed graph… the plurality of neural network layers comprising a plurality of transformed activation function layers, and wherein processing the network input comprises, for each transformed activation function layer…: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

[A]nd providing the transformed activation output as a layer output for the transformed activation function layer: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).

one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

input network… output network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts for transforming neural network activation inputs and outputs). The claim merely describes a process of applying known mathematical operations (transformations, scaling, shifting, and activation functions) to input data within a neural network and performing standard data processing steps (receiving network inputs, processing layer inputs, and outputting layer results) using a generic computer. The recitation of a neural network comprising multiple layers arranged as a directed graph merely indicates a technological environment in which the abstract ideas are applied, without improving the functioning of a computer or neural network itself. Therefore, the claim as a whole remains focused on the abstract idea and fails Step 2B of the eligibility analysis.
Claim 25

Step 1: The claim recites a non-transitory computer-readable medium; therefore, it is directed to the statutory category of manufacture.

Step 2A Prong One: The claim recites, inter alia:

[A]nd processing the… input… to generate a… output for the… input…: This limitation is viewed as a mental process, since processing an input and generating an output based on that input is mentally performable.

[G]enerating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer: This limitation encompasses a mathematical concept since it deals with using a math function to derive the activation input. See the specification of the instant case, which states: “In some implementations, the layer applies an affine operation, e.g., a convolution or a matrix multiplication optionally followed by a summation with a bias, to the layer input to generate the activation input.”

[T]ransforming the activation input using one or more input transformation constants to generate a transformed activation input: This limitation recites a mathematical concept dealing with using a math equation to generate a transformed activation input that will be used for subsequent processing. See the specification of the instant case, which states: “The system then generates the transformed activation output by multiplying the shifted initial activation output by an output scale constant γ.”

[A]pplying the element-wise activation function to the transformed activation input to generate an initial activation output: This limitation recites a mathematical concept, as it uses an activation function to derive the initial activation output. Additionally, the process generates a shifted initial activation output using addition, where the specification of the instant case states that “…the system generates a shifted initial activation output by adding an output shift constant δ to the activation output.”

[T]ransforming the initial activation output using one or more output transformation constants to generate a transformed activation output, wherein the one or more input transformation constants and the one or more output transformation constants are based on properties of the neural network when the neural network is initialized prior to training the neural network: This limitation recites a mathematical concept dealing with using mathematical equations to derive the transformed activation output. See the specification of the instant case, which recites that “The system then generates the transformed activation output by multiplying the shifted initial activation output by an output scale constant γ.”

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows:

[R]eceiving a network input… receiving a layer input for the transformed activation function layer: Mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).

…using a neural network that comprises a plurality of neural network layers arranged as a directed graph… the plurality of neural network layers comprising a plurality of transformed activation function layers, and wherein processing the network input comprises, for each transformed activation function layer…: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).
[A]nd providing the transformed activation output as a layer output for the transformed activation function layer: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

input network… output network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows:

[R]eceiving a network input… receiving a layer input for the transformed activation function layer: The additional element of “receiving” does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the receiving steps amount to no more than mere data gathering. This element amounts to receiving data over a network and is well-understood, routine, conventional activity. See MPEP 2106.05(d), subsection II(i). This cannot provide an inventive concept.

…using a neural network that comprises a plurality of neural network layers arranged as a directed graph… the plurality of neural network layers comprising a plurality of transformed activation function layers, and wherein processing the network input comprises, for each transformed activation function layer…: The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself and cannot provide an inventive concept (MPEP 2106.05(h)).

[A]nd providing the transformed activation output as a layer output for the transformed activation function layer: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity; see MPEP 2106.05(d)(II)(vi).

One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).

input network… output network: Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, which cannot provide an inventive concept (MPEP 2106.05(f)).
The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts for transforming neural network activation inputs and outputs). The claim merely describes a process of applying known mathematical operations (transformations, scaling, shifting, and activation functions) to input data within a neural network and performing standard data processing steps (receiving network inputs, processing layer inputs, and outputting layer results) using a generic computer. The recitation of a neural network comprising multiple layers arranged as a directed graph merely indicates a technological environment in which the abstract ideas are applied, without improving the functioning of a computer or neural network itself. Therefore, the claim as a whole remains focused on the abstract idea and fails Step 2B of the eligibility analysis.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 8-10, 16-17, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Trentin (“Networks with trainable amplitude of activation functions”, 2001) in view of Parhi (“The Role of Neural Network Activation Functions”, 2020).
Regarding claim 1, Trentin teaches [a] method performed by one or more computers, the method comprising (“This paper introduces novel algorithms to learn the amplitudes of nonlinear activation functions in layered networks.”; Page 5, under Equation 21, “…a recursive function x() can be readily implemented in any modern computer language…” The reference describes algorithms designed to learn activation amplitudes within layered networks, implemented on computers so the algorithms can run. These algorithms represent a sequence of computational steps (a method) performed by one or more processors to process neural network data.):

receiving a network input; and processing the network input using a neural network that comprises a plurality of neural network layers arranged as a directed graph to generate a network output for the network input (Page 4 Section 2.1, “Let us consider the training set T = (x, y), which specifies the relationship between training input vectors and the corresponding desired (target) output vectors.”; Page 4 Section 2, “A feedforward network with L layers is considered. Individual layers are denoted by L0, L1, …, LL, where L0 is the input layer (which is not counted, since its units act only as placeholders) and LL is the output layer.” The reference describes a feedforward network where data moves from an input layer through a sequence of layers to produce a target output. This structure defines a directed graph of layers that receives and processes input vectors to generate specific network outputs.),

the plurality of neural network layers comprising a plurality of transformed activation function layers, and wherein processing the network input comprises (Page 4 Section 2, “An assumption is made that the activation function associated with the i-th unit in layer L_l can be either in the form [Equation 3, rendered as an image in the original; per the surrounding discussion, f_i,l(a) = λ_i,l · ~f_i,l(a) + offset], which could be the case of sigmoids or Gaussians with learnable amplitude λ_i,l and an offset (shift).” Trentin discloses a neural network architecture where multiple layers utilize a “transformed” activation function defined in Equation 3, which modifies standard sigmoids or Gaussians. This explicitly establishes that the plurality of layers process input using an activation function controlled by a scale (amplitude) and a shift (offset). Each layer performing Equation 3 becomes a “transformed activation function layer” because Equation 3 transforms the activation function ~f(a) by multiplying it by the amplitude and adding the offset.),

for each transformed activation function layer: receiving a layer input for the transformed activation function layer (“Individual layers are denoted by L0, L1, …, LL, where L0 is the input layer… and LL is the output layer... The actual input to the latter unit is denoted by a_i,l, and the corresponding output is o_i,l. With this notation we can write [Equation 1, rendered as an image in the original; per the surrounding discussion, a weighted sum of the previous layer’s outputs, a_i,l = Σ_j w_ij,l · o_j,l−1].” Trentin’s disclosure specifies that each unit within a layer receives an “actual input” (a_i,l), calculated as a sum over the outputs of the preceding layer L_(l−1). Equation 1 defines the specific mechanism by which data is passed into a transformed activation unit from the previous layer. This process repeats for every layer in the directed graph, ensuring that each transformed activation function layer receives the necessary input to perform the transformation.);
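The Trentin mapping above chains Equation 1 (a weighted sum over the previous layer's outputs) into Equation 3 (an activation scaled by a learnable amplitude λ and shifted by an offset). A minimal sketch of one such unit follows; the tanh core and the variable names are illustrative assumptions standing in for the paper's sigmoid/Gaussian options.

```python
import numpy as np

def trentin_unit(prev_outputs, weights, amplitude, offset, f_core=np.tanh):
    """One unit as characterized in the Trentin mapping: Equation 1 forms the
    actual input a_{i,l} as a weighted sum of the previous layer's outputs
    o_{j,l-1}; Equation 3 then scales the core activation ~f by a learnable
    amplitude (lambda) and adds an offset (shift)."""
    a = np.dot(weights, prev_outputs)       # Equation 1: a_{i,l} = sum_j w_{ij,l} * o_{j,l-1}
    return amplitude * f_core(a) + offset   # Equation 3: f(a) = lambda * ~f(a) + offset

o_prev = np.array([0.2, -0.4, 0.9])
y = trentin_unit(o_prev, np.array([0.5, -1.0, 0.3]), amplitude=1.7, offset=0.1)
```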
generating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer (See Equation 1 on Page 4, a_{i,l} = Σ_{j ∈ L_{l-1}} w_{i,j,l} · o_{j,l-1}; Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54: f(a_{i,2}) = 1 / (1 + e^(-(a_{i,2} - b_{i,2})/θ))], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid." Equation 1 describes the generation of the initial unit input (a_{i,l}) through a weighted sum of outputs from the previous layer. Layer L1 finishes its work and hands the data to layer L2 as a weighted sum (a_{i,2}), which is used as the input for Equation 54. Equation 54 further refines this by showing how the input is processed (using the bias and smoothness) to generate the final value that enters the sigmoidal activation function. Equation 54 thus takes a_{i,2} as the "activation input" and forms the quantity (a_{i,2} - b_{i,2})/θ that the sigmoid (the element-wise activation function) will process.);

transforming the activation input using one or more input transformation constants to generate a transformed activation input (Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54, reproduced above], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid." Equation 54 shows the activation input (a_{i,2}) being modified by two specific constants: a bias (b_{i,2}), which shifts the input, and a smoothness (θ), which scales it. This pre-processing of the input variable before it reaches the core sigmoid function directly corresponds to generating a transformed activation input using input transformation constants. Equation 1 demonstrates that layer L1 hands the data to layer L2 as a weighted sum (a_{i,2}), the "activation input" to Equation 54. The term a_{i,2} is used to calculate (a_{i,2} - b_{i,2})/θ in Equation 54, which generates the "transformed activation input" using the input transformation constants b_{i,2} and θ.);

transforming the initial activation output using one or more output transformation constants to generate a transformed activation output (Page 4 Section 2, "An assumption is made that the activation function associated with the i-th unit in layer L_l can be either in the form [Equation 3: f_{i,l}(a) = λ_{i,l} · f̃_{i,l}(a) + σ_{i,l}], which could be the case of sigmoids or Gaussians with learnable amplitude λ_{i,l} and an offset (shift)". The disclosure defines the final layer output f_{i,l}(a_{i,l}) as the result of taking the initial activation f̃_{i,l}(a_{i,l}) and transforming it using an amplitude λ (output scale) and an offset (output shift). This sequence demonstrates the use of output transformation constants to generate the final "transformed activation output" from the initial function result.),
wherein the one or more input transformation constants and the one or more output transformation constants are based on properties of the neural network when the neural network is initialized prior to training the neural network (Page 9 Section 3.1, "Experiments with the student net were then repeated starting from five different random initializations of the weights… for a total of 25 experiments for each of the training techniques"; Page 1 Introduction, "In practice… weights are randomly initialized in a uniform manner over a small interval (as is common practice)"; Page 12 Paragraph 1 Under Figure 3, "To reach such a result, random initialization of individual amplitudes was necessary, as commonly done for the weights". Trentin establishes that transformation constants, such as individual amplitudes and weights, are assigned values via "random initialization" before the training process begins. The paper further teaches that this initialization is "necessary" for the network to function.); and

providing the transformed activation output as a layer output for the transformed activation function layer (Page 4 Section 2, "The actual input to the latter unit is denoted by a_{i,l}, and the corresponding output is o_{i,l}." Trentin defines the variable o_{i,l} as the "corresponding output" of the unit, which represents the final result of the transformed activation function f_{i,l} and is passed forward to the subsequent layers in the network.).

Trentin does not teach applying the element-wise activation function to the transformed activation input to generate an initial activation output. Parhi, in the same field of endeavor, teaches applying the element-wise activation function to the transformed activation input to generate an initial activation output (Page 1 Introduction, "In this paper we show how regularization in the finite dimensional space of neural network parameters is actually the same as regularization in the infinite-dimensional space of functions… [f(x) = Σ_{k=1}^{K} v_k ρ(w_k x - b_k) + c(x)], where ρ : R → R is a fixed activation function, K is the width of the network, for k = 1, …, K, v_k, w_k ∈ R, w_k ≠ 0 are the weights and b_k ∈ R are the first layer biases, and c(·) is a 'generalized bias' term in the last layer." In Parhi, the term w_k·x - b_k corresponds to the transformed activation input, the function ρ(·) corresponds to the element-wise activation function, and the resulting value ρ(w_k·x - b_k) corresponds to the initial activation output for each neuron in the network.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Trentin's method of learning layer-specific activation amplitudes with Parhi's use of element-wise activation functions applied to transformed activation inputs in order to enhance the network's learning and prediction capability (Introduction of Parhi).
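Putting the limitations mapped above together, the per-unit computation attributed to Trentin (Equations 54 and 3, as reconstructed) can be sketched as a single function. This is an illustration of the reading set out above, not code from any reference; variable names are illustrative:

    import math

    def transformed_activation_unit(a, b, theta, lam, sigma):
        # Input transform (Trentin Eq. 54, as reconstructed): shift the
        # activation input by the bias b and scale by 1/theta.
        z = (a - b) / theta
        # Element-wise activation applied to the transformed activation input.
        psi = 1.0 / (1.0 + math.exp(-z))   # logistic sigmoid
        # Output transform (Trentin Eq. 3): scale by the amplitude lam,
        # then shift by the offset sigma.
        return lam * psi + sigma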
Regarding claim 2, Trentin teaches generating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer comprises: applying an affine operation to the layer input (See Equation 1 on Page 4, a_{i,l} = Σ_{j ∈ L_{l-1}} w_{i,j,l} · o_{j,l-1}; Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid." Equation 1 describes the generation of the initial unit input (a_{i,l}) through a weighted sum of outputs from the previous layer; this weighted sum is the affine operation. Layer L1 hands the data to layer L2 as a weighted sum (a_{i,2}), which is used as the input for Equation 54. Equation 54 further refines this by showing how the input is processed (using the bias and smoothness) to generate the final value, (a_{i,2} - b_{i,2})/θ, that the sigmoid (the element-wise activation function) will process.).

Regarding claim 3, Trentin teaches generating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer comprises: using the layer input as the activation input (Page 4 Section 2, "The actual input to the latter unit is denoted by a_{i,l}, and the corresponding output is o_{i,l}. With this notation we can write [o_{i,l} = f_{i,l}(a_{i,l})]." The term a_{i,l} is the input to the unit, i.e., the weighted sum of outputs from the previous layer, used as the direct input to the function f_{i,l}(a_{i,l}).).

Regarding claim 4, Trentin teaches transforming the activation input using one or more input transformation constants to generate a transformed activation input comprises: generating an initial transformed activation input by multiplying the activation input by an input scale constant (Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid." Equation 54 shows the activation input (a_{i,2}) being modified by two specific constants: a bias (b_{i,2}), which shifts the input, and a smoothness (θ), which scales it. This pre-processing of the input variable before it reaches the core sigmoid function directly corresponds to generating a transformed activation input using input transformation constants. The term a_{i,2} is used to calculate (a_{i,2} - b_{i,2}) · (1/θ) in Equation 54, which generates the "transformed activation input" using the input transformation constants b_{i,2} and θ.).
Regarding claim 5, Trentin teaches transforming the activation input using one or more input transformation constants to generate a transformed activation input comprises: generating the transformed activation input by adding an input shift constant to the initial transformed activation input (Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid." Mathematically, (a - b)/θ is equivalent to a·(1/θ) - b/θ. This means that a·(1/θ) is the initial transformed activation input, and adding the input shift constant corresponds to adding -b/θ.).

Regarding claim 8, Trentin teaches transforming the initial activation output using one or more output transformation constants to generate a transformed activation output comprises: generating a shifted initial activation output by adding an output shift constant to the activation output (Page 4 Section 2, "An assumption is made that the activation function associated with the i-th unit in layer L_l can be either in the form [Equation 3: f = λ·ψ + σ], which could be the case of sigmoids or Gaussians with learnable amplitude λ_{i,l} and an offset (shift)". Trentin's Equation 3 (f = λ·ψ + σ) discloses the transformation of an initial activation output using both an output scale constant (λ) and an output shift constant (σ). While the equation is written in a "multiply then add" form, it is algebraically equivalent to the sequence in which a shift is applied first and the result is then scaled: λ·(ψ + σ/λ).).

Regarding claim 9, Trentin teaches transforming the initial activation output using one or more output transformation constants to generate a transformed activation output comprises: generating the transformed activation output by multiplying the shifted initial activation output by an output scale constant (Page 4 Section 2, the same passage and Equation 3 cited for claim 8. Written as λ·(ψ + σ/λ), the shifted initial activation output (ψ + σ/λ) is multiplied by the output scale constant λ to produce the transformed activation output.).
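The two algebraic identities that the claim 5, 8, and 9 mappings rely on can be written out explicitly (elementary rearrangements, shown here only for clarity):

    (a - b)/θ = (1/θ)·a + (-b/θ)        [scale by 1/θ, then shift by -b/θ; claims 4-5]
    λ·ψ + σ = λ·(ψ + σ/λ)               [shift by σ/λ, then scale by λ, valid for λ ≠ 0; claims 8-9]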
Regarding claim 10, Trentin teaches the activation function is a smooth activation function and wherein the output scale constant is based on a value of the shifted initial activation output for an element that has a value sampled from a noise distribution (See Equation 3 on Page 4 Section 2, f = λ·ψ + σ; Page 15 Section 3.2.1, "Eq. (53) defines a Gaussian kernel K_i(x; m) over the transformed, variable metrics given by the weights w_{i,j,1} of the first layer… Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid."; Page 1 Introduction, "…weights are randomly initialized in a uniform manner over a small interval (as is common practice) and a common learning rate is used for a certain number of training steps"; Page 12 First Paragraph, "To reach such a result, random initialization of individual amplitudes was necessary, as commonly done for the weights…" Trentin discloses a "smooth activation function" in Equation 54, explicitly identifying the parameter θ as the "smoothness" of the sigmoid. The "output scale constant" (the amplitude λ in Equation 3) is based on a value "sampled from a noise distribution" because Trentin teaches that individual amplitudes must undergo "random initialization… in a uniform manner over a small interval" prior to training. Furthermore, Equation 3 (f = λ·ψ + σ) is algebraically equivalent to λ·(ψ + σ/λ), wherein the scale constant λ is applied to a shifted initial activation output (the sum of the initial output ψ and the normalized shift σ/λ).).

Regarding claim 16, Trentin teaches the one or more input transformation constants and the one or more output transformation constants are also based on a hyperparameter that represents a degree of nonlinearity of the operations performed by the neural network at initialization (Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid."; Page 1 Introduction, "…weights are randomly initialized in a uniform manner… and a common learning rate is used for a certain number of training steps". Trentin teaches this by defining the transformation constant θ as the "variable smoothness" that determines the slope and shape of the activation function. This smoothness parameter functions as a degree of nonlinearity, as it dictates how the function transitions between its linear and nonlinear regions at initialization. Additionally, this parameter is applied to a sigmoid, a nonlinear activation function, making the result of the function nonlinear.).

Regarding claim 17, Trentin teaches the one or more input transformation constants and the one or more output transformation constants are based on the hyperparameter and an estimate of a maximal slope function of the neural network at initialization (Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid."; Page 12 First Paragraph, "To reach such a result, random initialization of individual amplitudes was necessary, as commonly done for the weights…"; Page 5, Equations 14 and 15 [gradient expressions; images omitted]. Trentin teaches this because the smoothness θ and the amplitude λ are both established at initialization and collectively determine the function's slope. These constants define the maximal slope function because the maximum derivative of the activation function is dictated by these initialized values at the start of training.).
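To make the slope relationship concrete, the maximal derivative of the scaled sigmoid form at issue follows directly from the chain rule. This is a standard calculation added here for clarity, not a quotation from Trentin:

    d/da [ λ · σ((a - b)/θ) ] = (λ/θ) · σ'(z),  where z = (a - b)/θ
    σ'(z) = σ(z)·(1 - σ(z)) ≤ 1/4  for the logistic sigmoid
    ⇒ maximal slope = λ/(4θ)

The initialized amplitude λ and smoothness θ therefore jointly fix the layer's maximal slope at the start of training, which is the relationship the claim 17 mapping invokes.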
Regarding claim 23, Trentin teaches the network input is received during training of the neural network and wherein the method further comprises: obtaining a target network output for the network input (Page 4 Section 2 of Trentin, "Individual layers are denoted by L_0, L_1, …, L_L, where L_0 is the input… and L_L is the output layer… The actual input to the latter unit is denoted by a_{i,l}, and the corresponding output is o_{i,l}.");

determining a gradient with respect to a set of parameters of the neural network of a loss function for the training of the neural network that measures a quality of the network output relative to the target network output (See Page 5, Equation 8 of Trentin [the partial derivative of the criterion with respect to the amplitude; image omitted]. Equation 8 takes the partial derivative (gradient) of the loss function C defined in Equation 5 with respect to the amplitude.); and

determining an update to the parameters of the neural network based at least on the gradient (See Equations 6 and 7 of Trentin on Page 4 [the amplitude update rule; images omitted]. Equation 6 defines the update to the new value of the amplitude, and Equation 7 shows that this update is calculated by multiplying the gradient by the learning rate.).

Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Trentin ("Networks with trainable amplitude of activation functions", 2001) in view of Hayou ("On the Impact of the Activation Function on Deep Neural Networks Training", 2019).

Regarding claim 6, Trentin does not teach the activation function is a leaky RELU activation function that, for a given element, (i) is an identity operation when the given element is greater than or equal to zero and (ii) multiplies the given element by a slope value when the given element is less than zero, and wherein an output scale constant is defined by the slope value. Hayou, in the same field of endeavor, teaches this limitation (See Definition 3 on Page 4 of Hayou [defining ReLU-like activation functions; image omitted]. Hayou teaches ReLU-like activation functions in which the activation magnitude is controlled by slope parameters (leaky ReLU). These slope parameters modulate the amplitude of neuron outputs and directly affect signal propagation and training stability.). Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's teachings with Hayou's ReLU-like activation functions in order to extend Trentin's amplitude-control framework to a leaky ReLU and thereby yield more predictable results in training stability and gradient flow.
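A minimal sketch of the leaky-ReLU behavior recited in claim 6, under the standard definition (hypothetical code, not taken from Hayou):

    def leaky_relu(x, slope):
        # Identity for non-negative elements; multiply by the slope value
        # for negative elements, per the claim 6 recitation.
        return x if x >= 0.0 else slope * x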
Regarding claim 7, Trentin does not teach the output scale constant is equal to a square root of a ratio between (i) 2 and (ii) a sum of one and a square of the slope value. Hayou, in the same field of endeavor, teaches this limitation ("λ, β defined as above. Then f'_l does not depend on l, and f'_l(1) = 1 and q_l bounded holds if and only if (σ_b, σ_w) = [(0, √(2/(1 + β²))); image omitted]." Hayou teaches that for ReLU-like activation functions with slope β, a normalization constant equal to √(2/(1 + β²)) is required to preserve signal variance and maintain stable propagation across layers. This normalization constant functionally scales the activation output and corresponds to the output scale constant.). Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's teachings with Hayou's ReLU-like activation functions and their associated normalization scaling, extending Trentin's amplitude-control framework to leaky ReLU activations with analytically derived output scale constants, in order to yield more predictable results in training stability and variance preservation across layers during network training.
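As a worked instance of the claim 7 constant (a quick check of the formula as characterized above; the slope values are arbitrary examples):

    import math

    def output_scale(slope):
        # sqrt(2 / (1 + slope^2)): the normalization constant Hayou is cited for.
        return math.sqrt(2.0 / (1.0 + slope ** 2))

    print(output_scale(0.0))   # 1.4142... (the familiar sqrt(2) ReLU gain)
    print(output_scale(0.2))   # 1.3867...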
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Trentin ("Networks with trainable amplitude of activation functions", 2001) in view of Hoffer ("Norm matters: efficient and accurate normalization schemes in deep networks", 2018).

Regarding claim 11, Trentin teaches the plurality of neural network layers comprise one or more… summation layers, and wherein processing the network input comprises, for each… summation layer: receiving for each of a plurality of neural network layers that are connected to the… summation layer by an incoming edge in the directed graph, a respective layer output generated by the neural network layer during the processing of the network input; and generating a layer output for the… summation layer by summing the respective weighted layer outputs (Page 4 Section 2, "A feedforward network with L layers is considered. Individual layers are denoted by L_0, L_1, …, L_L, where L_0 is the input layer (which is not counted, since its units act only as placeholders) and L_L is the output layer… The actual input to the latter unit is denoted by a_{i,l}, and the corresponding output is o_{i,l}. With this notation we can write [Equation 1: a_{i,l} = Σ_{j ∈ L_{l-1}} w_{i,j,l} · o_{j,l-1}] …where, for notational convenience, the sum over j ∈ L_l is meant to be extended to all the indexes of units belonging to layer L_l." Trentin teaches neural network layers that receive outputs from a plurality of upstream layers and generate a layer output by summing those inputs, as shown by the formulation in which each unit's input is computed as a sum over outputs of units in the preceding layer. This summation over incoming edges corresponds to the summation layer that receives respective layer outputs from connected neural network layers and generates a layer output.).

Trentin does not teach a normalized summation layer… applying a respective normalized weight to each of the respective layer outputs to generate a respective weighted layer output, wherein a sum of the squares of the respective normalized weights is equal to one. Hoffer, in the same field of endeavor, teaches a normalized summation layer (Page 7 Section 5.2, "Weight-norm successfully normalized each output channel's weights to reside on the L2 sphere." This reference teaches a normalized summation layer because each output channel of the neural network layer applies weights that are explicitly L2-normalized, such that the layer's output is computed using normalized weights.)

…applying a respective normalized weight to each of the respective layer outputs to generate a respective weighted layer output (Page 7 Section 5.2, "We return to the original parametrization suggested for weight norm, for a given initialized weight matrix V with N output channels: [w_i = g_i · v_i / ||v_i||_2], where w_i is a parameterized weight for the i-th output channel, composed from an L2-normalized vector v_i and scalar g_i. Weight-norm successfully normalized each output channel's weights…" This reference teaches applying a normalized weight to each respective layer output because each output channel uses a weight vector that is L2-normalized prior to generating the layer output.),

wherein a sum of the squares of the respective normalized weights is equal to one (Page 7 Section 5.2, "composed from an L2 normalized vector v_i… weights to reside on the L2 sphere." For an L2-normalized vector, the sum of the squares of its entries equals one by definition.).

Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's summation-based neural network layers with Hoffer's normalized weight formulation in order to improve training stability and numerical behavior of the summation operation for more predictable performance improvements (Introduction of Hoffer).
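A minimal sketch of the weight-normalization property relied on above, assuming the w_i = g_i · v_i/||v_i||_2 parameterization as quoted (illustrative code, not Hoffer's):

    import numpy as np

    def normalized_weights(v):
        # L2-normalize an incoming-weight vector so it lies on the L2 sphere;
        # the squares of the normalized entries then sum to one.
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)

    w = normalized_weights([3.0, 4.0])
    print(w)                    # [0.6 0.8]
    print(np.sum(w ** 2))       # 1.0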
Claims 12-15 are rejected under 35 U.S.C. 103 as being unpatentable over Trentin ("Networks with trainable amplitude of activation functions", 2001) in view of Zagoruyko ("DIRACNETS: TRAINING VERY DEEP NEURAL NET", 2018).

Regarding claim 12, Trentin teaches for one or more of the plurality of transformed activation function layers, generating, from the layer input, an activation input to an element-wise activation function for the transformed activation layer comprises: (See Equation 1 on Page 4, a_{i,l} = Σ_{j ∈ L_{l-1}} w_{i,j,l} · o_{j,l-1}; Page 15 Section 3.2.1, "Sigmoidal activation functions are used in the second hidden layer (L2): [Equation 54], where b_{i,2} denotes a unit-specific adaptive bias and θ is the variable smoothness determining the shape (slope) of the sigmoid." Equation 1 describes the generation of the initial unit input (a_{i,l}) through a weighted sum of outputs from the previous layer. Layer L1 hands the data to layer L2 as a weighted sum (a_{i,2}), which is used as the input for Equation 54. Equation 54 further refines this by showing how the input is processed (using the bias and smoothness) to generate the final value, (a_{i,2} - b_{i,2})/θ, that the sigmoid (the element-wise activation function) will process.).

Trentin does not teach computing a convolution between a filter bank tensor for the layer and the layer input. Zagoruyko, in the same field of endeavor, teaches computing a convolution between a filter bank tensor for the layer and the layer input (Page 2, Dirac Parameterization, of Zagoruyko, "Inspired from ResNet, we parameterize weights as a residual of Dirac function, instead of adding explicit skip connection. Because convolving any input with Dirac results in the same input, this helps propagate information deeper in the network… We generalize this operator to the case of a convolutional layer… convolved with weight Ŵ ∈ R^{M,M,K1,K2,…,KL} (combining M filters) to produce an output y of M channels… [Equation 3: y = Ŵ ∗ x]". Zagoruyko teaches this by defining a weight tensor Ŵ that combines M individual filters, which corresponds to the filter bank tensor. Zagoruyko explicitly discloses computing a convolution between this filter bank and the layer input x to generate the output y, as seen in Equation 3.) Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's summation-based neural network layers with Zagoruyko's teaching of computing the activation input using a convolution between a filter bank tensor and the layer input in order to enable efficient training of deeper networks and improved propagation (Introduction of Zagoruyko).

Regarding claim 13, Trentin teaches for each of the one or more transformed activation function layers (Page 4, Section 2, "…with the i-th unit in layer L_l can be either in the form [Equation 3: f = λ·ψ + σ]"), but Trentin does not teach prior to training the neural network, initializing the filter bank tensor for the layer using Delta initialization. Zagoruyko, in the same field of endeavor, teaches prior to training the neural network, initializing the filter bank tensor for the layer using Delta initialization (Page 2, Dirac Parameterization, "Inspired from ResNet, we parameterize weights as a residual of Dirac function, instead of adding explicit skip connection… [Ŵ = diag(a)·I + W, where I denotes the Dirac delta operator; images omitted]". Zagoruyko teaches this limitation by defining the weight Ŵ as a combination of a learned weight W and a Dirac delta operator I, which corresponds to the Delta initialization. By setting the scaling vector a to 1 at the start of training, the filter bank is initialized so that the Dirac component dominates and the identity of the input is preserved as it propagates through the layer.) Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's summation-based neural network layers with Zagoruyko's teaching of using Delta initialization in transformed activation functions in order to enable stable training of deeper networks (Introduction of Zagoruyko).

Regarding claim 14, Trentin does not teach the Delta initialization uses an entry-wise Gaussian distribution. Zagoruyko, in the same field of endeavor, teaches the Delta initialization uses an entry-wise Gaussian distribution (Pages 2 and 3, Dirac Parameterization, "We initialize W from normal distribution N(0, 1)." Zagoruyko teaches this by explicitly stating that the weight component W is initialized from a normal distribution, which is synonymous with an entry-wise Gaussian distribution. Equation 4 shows that the resulting filter bank combines the Dirac (Delta) component with these Gaussian-initialized values.). Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's summation-based neural network layers with Zagoruyko's teaching of Delta initialization in transformed activation functions in order to improve signal propagation and enable stable training of deeper neural networks (Introduction of Zagoruyko).
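A minimal sketch of a Dirac-style ("Delta") initialization of a filter bank, assuming a 1-D convolution with "same" padding and the diag(a)·I + W form as characterized above; names and shapes are illustrative and this is not Zagoruyko's code:

    import numpy as np

    def dirac_delta_init(channels, kernel_size, a=1.0):
        # Dirac component: a center-tap identity kernel, so that convolving
        # any input with it (with 'same' padding) reproduces that input.
        delta = np.zeros((channels, channels, kernel_size))
        delta[np.arange(channels), np.arange(channels), kernel_size // 2] = 1.0
        # Residual component: entry-wise Gaussian initialization, W ~ N(0, 1).
        W = np.random.normal(0.0, 1.0, size=delta.shape)
        # Filter bank at initialization: scaled identity plus learned residual.
        return a * delta + W

With a = 1 the Dirac term supplies an identity-preserving path through the layer at the start of training, which mirrors the behavior the claim 13 mapping relies on.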
Regarding claim 15, Trentin does not teach the Delta initialization uses a scaled-corrected uniform orthogonal (SUO) distribution. Zagoruyko, in the same field of endeavor, teaches the Delta initialization uses a scaled-corrected uniform orthogonal (SUO) distribution (Page 6, Under Figure 3 Caption, "Additionally, we tried to use the same orthogonal initialization as for DiracNet and vary it's scaling, in which case the range of the scaling gain is even wider."; See Equation 4 on Page 2 [image omitted]; See Equation 5 on Page 3 [Ŵ = diag(a)·I + diag(b)·W_norm; image omitted]. Zagoruyko teaches the orthogonal distribution by explicitly disclosing the use of "orthogonal initialization" for the weights of the DiracNet. These weights are then scale-corrected via the learned vectors a and b in Equation 5 to balance the Dirac and normalized components.). Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's summation-based neural network layers with Zagoruyko's teaching of Delta initialization using a scaled-corrected uniform orthogonal distribution in order to control weight magnitude and enable stable training of neural networks (Introduction of Zagoruyko).

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Trentin ("Networks with trainable amplitude of activation functions", 2001) in view of Inoue (US 5402519 A).

Regarding claim 22, Trentin does not teach processing the network input using the neural network comprises: applying normalization to the network input to generate a normalized input; and providing the normalized input as a layer input for one or more initial neural networks of the neural network. Inoue, in the same field of endeavor, teaches these limitations (Paragraph 34 of Inoue, "As a result, a plurality of input values x_i (i = 1 to n) to neurons of an input layer are normalized in a predetermined range, so that the computation results and synapse weights in neurons from input to output layers are set to a predetermined range."). Therefore, it would have been obvious to one of ordinary skill in the art to combine Trentin's neural network with Inoue's teaching of normalizing the network input prior to processing in order to stabilize computation and improve the convergence of learning (Paragraph 16 of Inoue).
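A minimal sketch of the input normalization Inoue is cited for, assuming simple min-max scaling to a predetermined range (Inoue's exact scheme is not reproduced in the quotation, so this is only one plausible instance):

    import numpy as np

    def normalize_input(x, lo=0.0, hi=1.0):
        # Map raw network inputs into a predetermined range [lo, hi]
        # before providing them as the layer input to the initial layer.
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        if span == 0.0:
            return np.full_like(x, (lo + hi) / 2.0)
        return lo + (hi - lo) * (x - x.min()) / span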
Claims 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over Trentin ("Networks with trainable amplitude of activation functions", 2001) in view of Inoue (US 5402519 A) and Parhi ("The Role of Neural Network Activation Functions", 2020).

Regarding claim 24, Trentin does not teach a system comprising: one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations. Inoue, in the same field of endeavor, teaches such a system (Paragraph 109 of Inoue, "This system includes a neural network section 5 having a control section 8 for computation or control operations in accordance with a program and a memory 10' for storing a program and data, a man-machine unit 27 for displaying the result of processing and a designation input by an operator, an input unit 25 for inputting input data or a teacher data for learning, and an output unit 26 for outputting recalled data."). Therefore, it would have been obvious to one of ordinary skill in the art to modify Trentin's computer-executable neural network training and activation-function learning methods so that these methods run within the computer system architecture taught by Inoue, in order to perform the neural network operations in a practical and deployable system (Paragraph 4 of Inoue).

Trentin and Inoue do not teach applying the element-wise activation function to the transformed activation input to generate an initial activation output. Parhi, in the same field of endeavor, teaches this limitation (Page 1 Introduction, "In this paper we show how regularization in the finite dimensional space of neural network parameters is actually the same as regularization in the infinite-dimensional space of functions… [f(x) = Σ_{k=1}^{K} v_k ρ(w_k x - b_k) + c(x)], where ρ : R → R is a fixed activation function, K is the width of the network, for k = 1, …, K, v_k, w_k ∈ R, w_k ≠ 0 are the weights and b_k ∈ R are the first layer biases, and c(·) is a 'generalized bias' term in the last layer." In Parhi, the term w_k·x - b_k corresponds to the transformed activation input, the function ρ(·) corresponds to the element-wise activation function, and the resulting value ρ(w_k·x - b_k) corresponds to the initial activation output for each neuron in the network.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Trentin and Inoue's teachings with Parhi's use of element-wise activation functions applied to transformed activation inputs in order to enhance the network's learning and prediction capability (Introduction of Parhi).

The remainder of claim 24 is an apparatus claim that recites limitations identical to those of claim 1. Therefore, claim 24 is rejected using the same rationale as claim 1.

Regarding claim 25, Trentin does not teach one or more non-transitory computer-readable storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations.
Inoue, in the same field of endeavor, teaches one or more non-transitory computer-readable storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations (Paragraph 109 of Inoue, quoted above with respect to claim 24, which discloses a memory 10' for storing a program and data and a control section 8 that performs computation or control operations in accordance with the program.). Therefore, it would have been obvious to one of ordinary skill in the art to modify Trentin's computer-executable neural network training and activation-function learning methods so that these methods run within the computer system architecture, storing computer-readable instructions, taught by Inoue, in order to perform the neural network operations in a practical and deployable system (Paragraph 4 of Inoue).

Trentin and Inoue do not teach applying the element-wise activation function to the transformed activation input to generate an initial activation output. Parhi, in the same field of endeavor, teaches this limitation (Page 1 Introduction of Parhi, quoted above with respect to claims 1 and 24: the term w_k·x - b_k corresponds to the transformed activation input, the function ρ(·) corresponds to the element-wise activation function, and the resulting value ρ(w_k·x - b_k) corresponds to the initial activation output for each neuron in the network.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Trentin and Inoue's teachings with Parhi's use of element-wise activation functions applied to transformed activation inputs in order to enhance the network's learning and prediction capability (Introduction of Parhi).

The remainder of claim 25 is a non-transitory computer-readable medium claim that recites limitations identical to those of claim 1. Therefore, claim 25 is rejected using the same rationale as claim 1.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAJD MAHER HADDAD, whose telephone number is (571) 272-2265. The examiner can normally be reached Monday-Friday, 8:00 am-5:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.M.H./
Examiner, Art Unit 2125

/KAMRAN AFSHAR/
Supervisory Patent Examiner, Art Unit 2125

Prosecution Timeline

Oct 03, 2022
Application Filed
Jan 22, 2026
Non-Final Rejection — §101, §103 (current)


Prosecution Projections

1-2
Expected OA Rounds
Grant Probability
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
