DETAILED ACTION
This action is in response to the original filing on April 20, 2023. Claims 1-25 are pending and have been considered below. Claims 1, 10, 18, 21, and 25 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on April 20, 2023 and September 14, 2023 are being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities:
On page 3, paragraph 12, “plurality of an NPLs” should read “plurality of NPLs”
On page 8, paragraph 37, “RRLthe loss function” should read “the loss function”
On page 8, paragraph 38, “expression (5)” is missing from the specification
On page 13, paragraph 53, “thereby effect a computer-implemented method” should read “thereby affect a computer-implemented method”
On page 18, paragraph 69, “information associated each” should read “information associated with each”
On page 24, paragraph 98, “range that the” should read “range so that the”
Appropriate correction is required.
Drawings
The drawings are objected to because of the following minor informalities:
In Fig. 1, “COMMUNICATING FABRIC” should read “COMMUNICATION FABRIC”
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 1, 3, 7, 8, 14, and 24 are objected to because of the following informalities:
In claim 1, “generating a revised neural network by replacing the trained neural” should read “generating a revised neural network by replacing the trained neural network with the replacement layer.”
In claim 3, “the estimating of the range of input values to the NPL comprises” should read “the estimating of the range of input values to the NPL further comprises”
In claim 7, “each of the plurality of an NPLs” should read “each of the plurality of NPLs”
In claim 8, “the performing of the loss processing comprises” should read “the performing of the loss processing further comprises”
In claim 8, “each of the plurality of an NPLs” should read “each of the plurality of NPLs”
In claim 14, “the estimating of the range of input values to the NPL comprises” should read “the estimating of the range of input values to the NPL further comprises”
In claim 24, “the estimating of the range of input values to the NPL comprises” should read “the estimating of the range of input values to the NPL further comprises”
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 3, 5-6, 14, 16-17, and 24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The term “the training input range” in claim 3 lacks sufficient antecedent basis, as there is no prior recitation of a training input range in the claims. For examination purposes, this term is interpreted to mean “the range of input values.”
The term “the group” in claim 5 lacks sufficient antecedent basis, as there is no prior recitation of a group in the claims. For examination purposes, this term is interpreted to mean “[a] group.”
The term “the activation function” in claim 6 lacks sufficient antecedent basis, as there is no prior recitation of an activation function in the claims. For examination purposes, this term is interpreted to mean “[an] activation function.”
Claims 14 and 16-17 recite computer program product limitations that parallel method claims 3 and 5-6, respectively. Therefore, the analysis discussed above with respect to claims 3 and 5-6 also applies to claims 14 and 16-17, respectively. Accordingly, claims 14 and 16-17 are rejected based on substantially the same rationale as set forth above with respect to claims 3 and 5-6, respectively.
Claim 24 recites a method that parallels method claim 3. Therefore, the analysis discussed above with respect to claim 3 also applies to claim 24. Accordingly, claim 24 is rejected based on substantially the same rationale as set forth above with respect to claim 3.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1:
Step 1 – Claim 1 is directed to a method: A computer-implemented method…
Step 2A, Prong 1 – A judicial exception is recited in this claim as it recites mathematical concepts (see MPEP 2106.04(a)(2)(I)):
performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network… To “perform” loss processing that “minimizes a loss of the neural network” involves calculating the loss of a neural network using a loss function, which is a mathematical concept. Furthermore, to “perform” loss processing that “reduces a range of input values” is to calculate a new range, or minimum and maximum input values to a non-polynomial layer, that is smaller than the previous range of input values. Hence, “performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network…” is a mathematical concept.
estimating a range of input values to the NPL of the trained neural network. To “estimate” a range is to calculate minimum and maximum input values to the NPL. Hence, “estimating a range of input values to the NPL of the trained neural network” is a mathematical concept.
forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL… To “form” a replacement layer that comprises “a polynomial approximation of an operation” is to calculate a polynomial function that approximates similar outputs when compared to “an operation performed by the NPL” given a certain range of inputs. Hence, “forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL” is a mathematical concept.
Step 2A, Prong 2 – The following limitations are additional elements without significantly more than the abstract idea:
performing a training process on a neural network resulting in a trained neural network, the training process comprising: performing a plurality of training iterations on the neural network… “performing a training process on a neural network resulting in a trained neural network” is an attempt to use the trained neural network by merely applying the abstract idea (i.e., performing the loss processing using math between training iterations) without placing any limits on how the training is performed. Further, the limitation omits any details as to how “performing a training process on a neural network” solves a technical problem and instead recites only the idea of a solution or outcome (see MPEP 2106.05(f)). Thus, the limitation represents no more than mere instructions to implement the abstract idea which is equivalent to adding the words “apply it” to the recited judicial exception.
generating a revised neural network by replacing the NPL of the trained neural… “generating” a revised neural network amounts to insignificant extra-solution activity of data outputting that does not add a meaningful limitation to the “computer-implemented method” (see MPEP 2106.05(g)).
Step 2B – These elements are recited at such a high level of generality that they fail to integrate the abstract idea into a practical application, since they provide nothing more than mere instructions to implement an abstract idea on a generic computer (MPEP 2106.05(f)) or only amount to data gathering or outputting without significantly more (MPEP 2106.05(g)). These limitations, taken either alone or in combination, fail to provide an inventive concept. Thus, the claim is not patent eligible.
Claims 2-9 recite limitations which further narrow the abstract ideas of claim 1 by specifying more details of the mathematical concepts that occur:
Regarding claim 2, this claim further limits the abstract idea of claim 1 to be based on a mathematical concept: and determining corresponding training input values to the NPL. Determining training input values to the NPL involves calculating values that fall within a given input range for the NPL, which is a mathematical concept. Furthermore, specifying wherein the performing of the training process on the neural network comprises training the neural network using a first training dataset, and wherein the estimating of the range of input values to the NPL comprises inputting instances of a second training dataset to the trained neural network is still insignificant extra-solution activity of necessary data gathering (see MPEP 2106.05(g)).
Regarding claim 3, this claim further limits the abstract idea of claim 2 to be based on a mathematical concept: wherein the estimating of the range of input values to the NPL comprises performing a statistical analysis using the training input range to the NPL. Performing a statistical analysis using the training input range is a mathematical concept.
Regarding claim 4, this claim further limits the abstract idea of claim 3 to be based on a mathematical concept: determining the polynomial approximation of the operation performed by the NPL, wherein the determining of the polynomial approximation comprises determining a degree of the polynomial approximation based at least in part on the estimated range of input values to the NPL. Determining the “polynomial approximation” is to calculate a polynomial function that approximates similar outputs when compared to “the operation performed by the NPL” given a certain range of inputs, which is a mathematical concept. Furthermore, determining a “degree of the polynomial approximation” involves calculating a degree for the polynomial function that approximates the outputs of “the operation,” which is a mathematical concept.
Regarding claim 5, specifying wherein the NPL is selected from the group consisting an activation layer, an instance normalization layer, a maximum pooling layer, and a softmax layer in this manner does not overcome the rejection of claim 1, because further specifying the NPL does not make the abstract ideas of claim 1 anything other than mathematical concepts.
Regarding claim 6, specifying wherein the polynomial approximation is a polynomial approximation of the activation function in this manner does not overcome the rejection of claim 5, because modifying the polynomial approximation does not make “forming a replacement layer” anything other than a mathematical concept.
Regarding claim 7, specifying wherein the neural network comprises a plurality of NPLs including said NPL in this manner does not overcome the rejection of claim 1, because modifying the neural network does not make the abstract ideas of claim 1 anything other than mathematical concepts. Furthermore, this claim further limits the abstract idea of claim 1 to be based on a mathematical concept: and wherein the performing of the loss processing comprises minimizing NPL input values to each of the plurality of an NPLs. Minimizing the “NPL input values” is to calculate, for each NPL, a range of minimum and maximum input values that is smaller than the previous range of input values, which is a mathematical concept.
Regarding claim 8, specifying wherein the performing of the loss processing comprises minimizing NPL input values to each of the plurality of an NPLs during respective separate training iterations in this manner does not overcome the rejection of claim 7, because modifying the minimizing of the NPL input values does not make “performing of the loss processing” anything other than a mathematical concept.
Regarding claim 9, this claim further limits the abstract idea of claim 1 to be based on a mathematical concept: wherein the performing of the loss processing comprises minimizing the loss of the neural network using a loss function that includes a standard loss term and a regularization range loss term. Using “a loss function” is to use a mathematical formula, which is a mathematical concept.
Regarding claim 10:
Step 1 – Claim 10 is directed to a product: A computer program product…
Step 2A, Prong 1 – A judicial exception is recited in this claim as it recites mathematical concepts (see MPEP 2106.04(a)(2)(I)):
performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network… To “perform” loss processing that “minimizes a loss of the neural network” involves calculating the loss of a neural network using a loss function, which is a mathematical concept. Furthermore, to “perform” loss processing that “reduces a range of input values” is to calculate a new range, or minimum and maximum input values to a non-polynomial layer, that is smaller than the previous range of input values. Hence, “performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network…” is a mathematical concept.
estimating a range of input values to the NPL of the trained neural network. To “estimate” a range is to calculate minimum and maximum input values to the NPL. Hence, “estimating a range of input values to the NPL of the trained neural network” is a mathematical concept.
forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL… To “form” a replacement layer that comprises “a polynomial approximation of an operation” is to calculate a polynomial function that approximates similar outputs when compared to “an operation performed by the NPL” given a certain range of inputs. Hence, “forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL” is a mathematical concept.
Step 2A, Prong 2 – The following limitations are additional elements without significantly more than the abstract idea:
one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by a processor to cause the processor to perform operations comprising… one or more computer readable storage media, program instructions, and a processor used as mere tools to apply an exception are generic elements for performing or applying the abstract idea using a generic computing environment (see MPEP 2106.05(f)).
performing a training process on a neural network resulting in a trained neural network, the training process comprising: performing a plurality of training iterations on the neural network… “performing a training process on a neural network resulting in a trained neural network” is an attempt to use the trained neural network by merely applying the abstract idea (i.e., performing the loss processing using math between training iterations) without placing any limits on how the training is performed. Further, the limitation omits any details as to how “performing a training process on a neural network” solves a technical problem and instead recites only the idea of a solution or outcome (see MPEP 2106.05(f)). Thus, the limitation represents no more than mere instructions to implement the abstract idea which is equivalent to adding the words “apply it” to the recited judicial exception.
generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer… “generating” a revised neural network amounts to insignificant extra-solution activity of data outputting that does not add a meaningful limitation to the “computer program product” (see MPEP 2106.05(g)).
Step 2B – These elements are recited at such a high level of generality that they fail to integrate the abstract idea into a practical application, since they provide nothing more than mere instructions to implement an abstract idea on a generic computer (MPEP 2106.05(f)) or only amount to data gathering or outputting without significantly more (MPEP 2106.05(g)). These limitations, taken either alone or in combination, fail to provide an inventive concept. Thus, the claim is not patent eligible.
Claims 11 and 12 recite limitations which further narrow the abstract ideas of claim 10 by specifying more details of the mathematical concepts that occur:
Regarding claim 11, specifying wherein the stored program instructions are stored in a computer readable storage device in a data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system in this manner does not overcome the rejection of claim 10, because modifying the stored program instructions does not make the abstract ideas of claim 10 anything other than mathematical concepts.
Regarding claim 12, specifying wherein the stored program instructions are stored in a computer readable storage device in a server data processing system, and wherein the stored program instructions are downloaded in response to a request over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system, further comprising: program instructions to meter use of the program instructions associated with the request; and program instructions to generate an invoice based on the metered use in this manner does not overcome the rejection of claim 10, because modifying the stored program instructions does not make the abstract ideas of claim 10 anything other than mathematical concepts.
Claims 13-17 recite computer program product limitations that parallel method claims 2-6, respectively. Therefore, the analysis discussed above with respect to claims 2-6 also applies to claims 13-17, respectively. Accordingly, claims 13-17 are rejected based on substantially the same rationale as set forth above with respect to claims 2-6, respectively.
Regarding claim 18:
Step 1 – Claim 18 is directed to a system: A computer system…
Step 2A, Prong 1 – A judicial exception is recited in this claim as it recites mathematical concepts (see MPEP 2106.04(a)(2)(I)):
performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network… To “perform” loss processing that “minimizes a loss of the neural network” involves calculating the loss of a neural network using a loss function, which is a mathematical concept. Furthermore, to “perform” loss processing that “reduces a range of input values” is to calculate a new range, or minimum and maximum input values to a non-polynomial layer, that is smaller than the previous range of input values. Hence, “performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network…” is a mathematical concept.
estimating a range of input values to the NPL of the trained neural network. To “estimate” a range is to calculate minimum and maximum input values to the NPL. Hence, “estimating a range of input values to the NPL of the trained neural network” is a mathematical concept.
forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL… To “form” a replacement layer that comprises “a polynomial approximation of an operation” is to calculate a polynomial function that approximates similar outputs when compared to “an operation performed by the NPL” given a certain range of inputs. Hence, “forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL” is a mathematical concept.
Step 2A, Prong 2 – The following limitations are additional elements without significantly more than the abstract idea:
a processor and one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by the processor to cause the processor to perform operations comprising… a processor, one or more computer readable storage media, and program instructions used as mere tools to apply an exception are generic elements for performing or applying the abstract idea using a generic computing environment (see MPEP 2106.05(f)).
performing a training process on a neural network resulting in a trained neural network, the training process comprising: performing a plurality of training iterations on the neural network… “performing a training process on a neural network resulting in a trained neural network” is an attempt to use the trained neural network by merely applying the abstract idea (i.e., performing the loss processing using math between training iterations) without placing any limits on how the training is performed. Further, the limitation omits any details as to how “performing a training process on a neural network” solves a technical problem and instead recites only the idea of a solution or outcome (see MPEP 2106.05(f)). Thus, the limitation represents no more than mere instructions to implement the abstract idea which is equivalent to adding the words “apply it” to the recited judicial exception.
generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer… “generating” a revised neural network amounts to insignificant extra-solution activity of data outputting that does not add a meaningful limitation to the “computer system” (see MPEP 2106.05(g)).
Step 2B – These elements are recited at such a high level of generality that they fail to integrate the abstract idea into a practical application, since they provide nothing more than mere instructions to implement an abstract idea on a generic computer (MPEP 2106.05(f)) or only amount to data gathering or outputting without significantly more (MPEP 2106.05(g)). These limitations, taken either alone or in combination, fail to provide an inventive concept. Thus, the claim is not patent eligible.
Claims 19 and 20 recite limitations which further narrow the abstract ideas of claim 18 by specifying more details of the mathematical concepts that occur:
Regarding claim 19, specifying wherein the neural network comprises a first layer and a second layer, wherein the second layer is the NPL in this manner does not overcome the rejection of claim 18, because modifying the neural network does not make the abstract ideas of claim 18 anything other than mathematical concepts. Furthermore, this claim further limits the abstract idea of claim 18 to be based on a mathematical concept: by applying a first weight value to an output of the first layer. To “apply” a weight is to multiply an output by the first weight value, which is a mathematical concept. Finally, and wherein, for an iteration in the plurality of training iterations: the neural network generates a weighted output value… and the NPL receives the weighted output value as an input value in the range of input values is still insignificant extra-solution activity of necessary data gathering and outputting (see MPEP 2106.05(g)).
Regarding claim 20, this claim further limits the abstract idea of claim 19 to be based on a mathematical concept: wherein the loss processing comprises adjusting the first weight value. To “adjust” a weight value involves performing an optimization algorithm (e.g., gradient descent), which is a mathematical concept.
Regarding claim 21:
Step 1 – Claim 21 is directed to a method: A computer-implemented method comprising…
Step 2A, Prong 1: A judicial exception is recited in this claim as it recites mathematical concepts (see MPEP 2106.04(a)(2)(I)):
by using a first weight value to generate an output of the first layer… To “use” a weight is to multiply an input of the first layer by the first weight value to generate an output, which is a mathematical concept.
performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to the NPL of the neural network… To “perform” loss processing that “minimizes a loss of the neural network” involves calculating the loss of a neural network using a loss function, which is a mathematical concept. Furthermore, to “perform” loss processing that “reduces a range of input values” is to calculate a new range, or minimum and maximum input values to a non-polynomial layer, that is smaller than the previous range of input values. Hence, “performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to the NPL of the neural network…” is a mathematical concept.
estimating a range of input values to the NPL of the trained neural network. To “estimate” a range is to calculate minimum and maximum input values to the NPL. Hence, “estimating a range of input values to the NPL of the trained neural network” is a mathematical concept.
forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL… To “form” a replacement layer that comprises “a polynomial approximation of an operation” is to calculate a polynomial function that approximates similar outputs when compared to “an operation performed by the NPL” given a certain range of inputs. Hence, “forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL” is a mathematical concept.
Step 2A, Prong 2 – The following limitations are additional elements without significantly more than the abstract idea:
performing, on a neural network comprising at least a first layer and a second layer, wherein the second layer is a non-polynomial layer (NPL), a training process on a neural network resulting in a trained neural network, the training process comprising: performing a plurality of training iterations on the neural network… “performing… a training process on a neural network resulting in a trained neural network” is an attempt to use the trained neural network by merely applying the abstract idea (i.e., performing the loss processing using math between training iterations) without placing any limits on how the training is performed. Further, the limitation omits any details as to how “performing a training process on a neural network” solves a technical problem and instead recites only the idea of a solution or outcome (see MPEP 2106.05(f)). Thus, the limitation represents no more than mere instructions to implement the abstract idea which is equivalent to adding the words “apply it” to the recited judicial exception.
wherein in a certain training iteration a weighted output value is generated… and the weighted output value is sent as an input value in the NPL… generating “a weighted output value” and sending it as an input value amounts to insignificant extra-solution activity of necessary data gathering and outputting (see MPEP 2106.05(g)).
generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer… “generating” a revised neural network amounts to insignificant extra-solution activity of data outputting that does not add a meaningful limitation to the “computer-implemented method” (see MPEP 2106.05(g)).
Step 2B – These elements are recited at such a high level of generality that they fail to integrate the abstract idea into a practical application, since they provide nothing more than mere instructions to implement an abstract idea on a generic computer (MPEP 2106.05(f)) or only amount to data gathering or outputting without significantly more (MPEP 2106.05(g)). These limitations, taken either alone or in combination, fail to provide an inventive concept. Thus, the claim is not patent eligible.
Claims 22-24 recite computer-implemented method limitations that parallel system claim 20 and method claims 2 and 3, respectively. Therefore, the analysis discussed above with respect to claims 20, 2, and 3 also applies to claims 22-24, respectively. Accordingly, claims 22-24 are rejected based on substantially the same rationale as set forth above with respect to claims 20, 2, and 3, respectively.
Regarding claim 25:
Step 1 – Claim 25 is directed to a product: A computer program product…
Step 2A, Prong 1: A judicial exception is recited in this claim as it recites mathematical concepts (see MPEP 2106.04(a)(2)(I)):
by using a first weight value to generate an output of the first layer… To “use” a weight is to multiply an input of the first layer by the first weight value to generate an output, which is a mathematical concept.
performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to the NPL of the neural network… To “perform” loss processing that “minimizes a loss of the neural network” involves calculating the loss of a neural network using a loss function, which is a mathematical concept. Furthermore, to “perform” loss processing that “reduces a range of input values” is to calculate a new range, or minimum and maximum input values to a non-polynomial layer, that is smaller than the previous range of input values. Hence, “performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to the NPL of the neural network…” is a mathematical concept.
estimating a range of input values to the NPL of the trained neural network. To “estimate” a range is to calculate minimum and maximum input values to the NPL. Hence, “estimating a range of input values to the NPL of the trained neural network” is a mathematical concept.
forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL… To “form” a replacement layer that comprises “a polynomial approximation of an operation” is to calculate a polynomial function that approximates similar outputs when compared to “an operation performed by the NPL” given a certain range of inputs. Hence, “forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL” is a mathematical concept.
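For illustration only, the mathematical concepts identified above (estimating a min/max input range to an NPL and forming a polynomial approximation of a non-polynomial operation such as ReLU) can be sketched in a few lines. The values, the degree, and the fitting method below are hypothetical assumptions, not the claimed method or the cited art:

```python
import numpy as np

# Hypothetical input values observed at a non-polynomial layer (NPL).
inputs = np.array([-2.5, -1.0, 0.0, 0.7, 1.9, 3.2])

# "Estimating a range of input values to the NPL": a min/max calculation.
lo, hi = inputs.min(), inputs.max()

# "Forming a replacement layer": fit a low-degree polynomial to the NPL
# operation (here ReLU) over the estimated range [lo, hi].
xs = np.linspace(lo, hi, 200)
relu = np.maximum(xs, 0.0)
poly = np.polynomial.Polynomial.fit(xs, relu, deg=4)

# Within the approximation range, the polynomial tracks ReLU closely.
max_err = float(np.max(np.abs(poly(xs) - relu)))
```

Each step in the sketch is a calculation on numeric values, consistent with the characterization of these limitations as mathematical concepts.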
Step 2A, Prong 2 – The following limitations are additional elements without significantly more than the abstract idea:
one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by a processor to cause the processor to perform operations comprising… one or more computer readable storage media, program instructions, and a processor used as mere tools to apply an exception are generic elements for performing or applying the abstract idea using a generic computing environment (see MPEP 2106.05(f)).
performing, on a neural network comprising at least a first layer and a second layer, wherein the second layer is a non-polynomial layer (NPL), a training process on a neural network resulting in a trained neural network, the training process comprising: performing a plurality of training iterations on the neural network… “performing… a training process on a neural network resulting in a trained neural network” is an attempt to use the trained neural network by merely applying the abstract idea (i.e., performing the loss processing using math between training iterations) without placing any limits on how the training is performed. Further, the limitation omits any details as to how “performing a training process on a neural network” solves a technical problem and instead recites only the idea of a solution or outcome (see MPEP 2106.05(f)). Thus, the limitation represents no more than mere instructions to implement the abstract idea, which is equivalent to adding the words “apply it” to the recited judicial exception.
wherein in a certain training iteration a weighted output value is generated… and the weighted output value is sent as an input value in the NPL… generating “a weighted output value” and sending it as an input value amounts to insignificant extra-solution activity of necessary data gathering and outputting (see MPEP 2106.05(g)).
generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer… “generating” a revised neural network amounts to insignificant extra-solution activity of data outputting that does not add a meaningful limitation to the “computer-implemented method” (see MPEP 2106.05(g)).
Step 2B – These elements are recited at such a high level of generality that they fail to integrate the abstract idea into a practical application, since they provide nothing more than mere instructions to implement an abstract idea on a generic computer (MPEP 2106.05(f)) or only amount to data gathering or outputting without significantly more (MPEP 2106.05(g)). These limitations, taken either alone or in combination, fail to provide an inventive concept. Thus, the claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7, 10, 13-15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over No et al. (US 20240211738 A1, hereinafter No) in view of Obla (“Effective Activation Functions for Homomorphic Evaluation of Deep Neural Networks,” hereinafter Obla).
Regarding claim 1, No teaches a computer-implemented method (¶14 “a processor-implemented method”) comprising: performing a training process on a neural network resulting in a trained neural network (Fig. 1 – 10, ¶43 “computing apparatus 10 may be configured to perform a neural network operation of a neural network model. In an example, the computing apparatus 10 may perform training… of a machine learning model,” wherein resulting in a trained neural network is implicit), the training process comprising: performing a plurality of training iterations on the neural network (Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88 “if the high accuracy threshold is not achieved through the set approximation region, the processor 200 may repeat the operation of newly setting the approximation region using the maximum and minimum values in the obtained approximation region. When the predetermined accuracy threshold is achieved, the processor 200 may generate/output the set approximation region for use with the approximate polynomial that replaces a ReLU unit/function of a ReLU layer of the neural network,” wherein to “repeat the operation” comprises performing a plurality of training iterations).
Regarding the limitation, and performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network, No further teaches and performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network (¶46 “The neural network model may be trained to infer a result from an input by incrementally adjusting weights of the nodes through training… Each of such nodes of the plural layers may also include respective biases that may be determined or set during training,” Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88 “if the high accuracy threshold is not achieved… the processor 200 may repeat the operation of newly setting the approximation region,” wherein to “repeat the operation” in order to achieve “high accuracy” of the neural network encompasses performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network) and a range of values to a non-polynomial layer (NPL) of the neural network (Fig. 3, ¶59 “a maximum value and a minimum value of respective input data to one or more layers (e.g., ReLU layers),” wherein a “maximum value” and a “minimum value” encompasses a range of values). However, No fails to teach and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network.
Obla, in the same field of endeavor, teaches and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network (Page 24 Figure 3.2 depicts polynomial approximations of ReLU, a non-polynomial activation function, Page 24 Table 3.2 depicts distribution ranges for input values to an activation function, Page 24 ¶1 “we analysed the… inputs to the activation function,” Page 25 ¶1 “Depending on the complexity of the dataset, we observe that at least 98% of the data lies between the range [-3, 3], but the ranges can be as large as [-30, 21],” Page 25 ¶4 “our first observation was the significant increase in accuracy for the polynomial approximated between [-7, 7]… the activation layer is able to process more inputs accurately,” Page 26 Table 3.3 depicts the accuracy of using smaller ranges of input values, Page 33 ¶1 “since the polynomial approximation will be accurate only within their approximation range, it is necessary to restrict the range of inputs,” Page 34 Table 4.1 depicts the maximum input ranges recorded for every activation layer; “to restrict the range of inputs” to smaller ranges such as “[-3, 3]” or “[-7, 7]” as opposed to the larger ranges depicted in Tables 3.2 and 4.1 encompasses reduces a range of values to a non-polynomial layer (NPL)).
No further teaches estimating a range of input values to the NPL of the trained neural network (Fig. 1 – 200, Fig. 3, ¶59 “The processor 200 may calculate a maximum value and a minimum value of respective input data to one or more layers (e.g., ReLU layers) … based on data input to the neural network”).
No further teaches forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL (Fig. 1 – 200, ¶66 “The processor 200 may be configured to implement the neural network using the target approximate polynomial instead of an original neural network operation… the neural network may have a ReLU layer that uses a typical ReLU unit or function for an insertion of non-linearity… the typical ReLU of the neural network may be replaced (updated) with the target approximate polynomial, and act on input data from another layer of the neural network,” wherein replacing “the typical ReLU of the neural network” used by a “ReLU layer” with an “approximate polynomial” encompasses forming a replacement layer for the NPL).
No further teaches and generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer (Fig. 2 – 220, Fig. 3 – 310, ¶76 “the generated neural network 310 may correspond to the trained neural network 220 except that the ReLU layers now apply the respective target approximate polynomials instead of the ReLU units/functions of the ReLU layers in the trained neural network 220”).
No and Obla are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the iterative training and reducing a range of values of Obla with the methodology of No. The motivation to do so is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52 ¶1).
Regarding claim 2, No in view of Obla teaches the computer-implemented method of claim 1 (and thus the rejection of claim 1 is incorporated).
Regarding the limitation wherein the performing of the training process on the neural network comprises training the neural network using a first training dataset, and wherein the estimating of the range of input values to the NPL comprises inputting instances of a second training dataset to the trained neural network and determining corresponding training input values to the NPL, No teaches and wherein the estimating of the range of input values to the NPL comprises inputting instances of a second training dataset to the trained neural network and determining corresponding training input values to the NPL (Fig. 1 – 200, Fig. 2 – 230, 250, Fig. 4 – 410, ¶81 “The processor 200 may calculate maximum and minimum values 250 of the input values of the ReLU function based on the pre-trained deep learning model 220 and the sample 230 of the trained data set”). However, No fails to teach wherein the performing of the training process on the neural network comprises training the neural network using a first training dataset.
Obla teaches this limitation (Page 42 ¶3 “The model is initially trained using a traditional activation function like Softplus… The training on MNIST using the original activation is conducted,” wherein MNIST, as understood in the art, is a popular dataset of handwritten digits).
No and Obla are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the training process and first training dataset of Obla with the second training dataset and estimating of the range of input values of No. The motivation to do so, as stated by Obla, is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52 ¶1).
Regarding claim 3, No in view of Obla teaches the computer-implemented method of claim 2 (and thus the rejection of claim 2 is incorporated).
Regarding the limitation wherein the estimating of the range of input values to the NPL comprises performing a statistical analysis using the training input range to the NPL, No teaches the estimating of the range of input values to the NPL (Fig. 3, ¶59, as discussed above with respect to claim 1). However, No fails to teach wherein the estimating of the range… comprises performing a statistical analysis using the training input range to the NPL.
Obla teaches this limitation (Page 24 ¶1 “we analysed the… inputs to the activation function,” Page 24 Table 3.2 depicts standard deviation, or statistical analysis, Page 25 ¶1 “we observe that at least 98% of the data lies between the range [-3, 3],” Page 25 ¶3 “we construct two polynomials of degree 4 between the bounds [-7, 7] and [-25, 25]. In this experiment we also compare the performance with a degree 4 polynomial approximated between [-3, 3],” Page 26 ¶2 “From our experiments in this section, we can conclude that it is necessary to strike a balance between maintaining an acceptable error of approximation between [-3, 3], while covering a range larger than [-3, 3]. This is true, especially for complex datasets that have large input distributions to the activation layer,” Page 26 Table 3.3 depicts the results of polynomial approximations using different input ranges such as “[-3, 3]” and “[-7, 7]” including their respective accuracies and approximation errors, which are calculated through performing a statistical analysis using the training input range to the NPL).
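For illustration only, the kind of statistical range analysis described above (e.g., Obla's observation that roughly 98% of inputs fell within [-3, 3]) can be sketched with hypothetical data; the distribution, sample size, and percentile cutoffs below are illustrative assumptions:

```python
import numpy as np

# Hypothetical pre-activation values collected at one activation layer.
rng = np.random.default_rng(0)
acts = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Statistical analysis of the observed NPL inputs: choose a trimmed
# range covering the central 98% of the data, rather than the raw
# min/max, which a few outliers can inflate.
lo, hi = np.percentile(acts, [1.0, 99.0])
raw_lo, raw_hi = acts.min(), acts.max()
```

The trimmed range is strictly narrower than the raw observed range, which is the sense in which such analysis supports restricting inputs to a smaller approximation region.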
No and Obla are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the statistical analysis of Obla with the estimating of the range of input values of No. The motivation to do so, as stated by Obla, is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52 ¶1).
Regarding claim 4, No in view of Obla teaches the computer-implemented method of claim 3 (and thus the rejection of claim 3 is incorporated).
No teaches determining the polynomial approximation of the operation performed by the NPL (Fig. 6, ¶110 “A solid line represents an accurate typical ReLU and a dotted line represents an approximate polynomial approximated in an approximation region”), wherein the determining of the polynomial approximation comprises determining a degree of the polynomial approximation based at least in part on the estimated range of input values to the NPL (Fig. 3, ¶59, as explained above with respect to claim 1, Fig. 1 – 200, Fig. 2 – 220, Fig. 3 – 310, ¶75 “The processor 200 may adjust the approximation region using the obtained minimum and maximum values. Through this operation, the processor 200 may effectively set a target approximation region for an interim approximate polynomial with the same degree (e.g., without having to change a degree of the interim approximate polynomial),” ¶76 “The processor 200 may generate the neural network 310… while using a low polynomial degree, by effectively setting the approximation region of the approximate polynomial based on the values of the input data… the generated neural network 310 may correspond to the trained neural network 220 except that the ReLU layers now apply the respective target approximate polynomials instead of the ReLU units/functions of the ReLU layers in the trained neural network 220”).
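The relationship between the estimated input range and the degree of the approximation can be illustrated with a hypothetical selection criterion (the tolerance, degree cap, and least-squares fit below are illustrative assumptions, not No's actual procedure):

```python
import numpy as np

def min_degree_for_tolerance(lo, hi, tol, max_deg=12):
    """Smallest degree whose least-squares ReLU fit over [lo, hi]
    stays within `tol` maximum error (a hypothetical criterion)."""
    xs = np.linspace(lo, hi, 400)
    relu = np.maximum(xs, 0.0)
    for deg in range(1, max_deg + 1):
        poly = np.polynomial.Polynomial.fit(xs, relu, deg)
        if np.max(np.abs(poly(xs) - relu)) <= tol:
            return deg
    return max_deg

# A wider estimated input range generally demands a higher degree to
# meet the same error tolerance, which is why effectively setting the
# approximation region can avoid raising the degree.
narrow = min_degree_for_tolerance(-3.0, 3.0, tol=0.1)
wide = min_degree_for_tolerance(-25.0, 25.0, tol=0.1)
```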
Regarding claim 7, No in view of Obla teaches the computer-implemented method of claim 1 (and thus the rejection of claim 1 is incorporated).
Regarding the limitation wherein the neural network comprises a plurality of NPLs including said NPL, and wherein the performing of the loss processing comprises minimizing NPL input values to each of the plurality of an NPLs, No teaches wherein the neural network comprises a plurality of NPLs including said NPL, and wherein the performing of the loss processing comprises adjusting NPL input values to each of the plurality of an NPLs (Fig. 1 – 200, Fig. 3 depicts a plurality of NPLs including said NPL, ¶62 “The processor 200 may set the approximation region for respective approximate polynomials for each of the ReLU layers of the neural network based on the respective maximum value and the respective minimum value for each input to each ReLU layer. In one example, the processor 200 may set the approximation region for respective ReLU layers based on the number of ReLU layers included in the neural network model and the total number of respective input data to the neural network model”). However, No fails to teach minimizing NPL input values to each of the plurality of an NPLs.
Obla teaches this limitation (Page 34 Table 4.1 depicts a plurality of NPLs, each with differing ranges of inputs, Page 34 ¶1 “the range of inputs differs from layer to layer,” Page 34 ¶2 “We can take advantage of this phenomenon by approximating a polynomial for every layer… the approximation for every layer would be able to accept most of the inputs while having the lowest error of approximation… layers with a smaller input range will not be constrained by a single polynomial catering to the largest range observed,” wherein approximating a polynomial for every layer that “would be able to accept most of the inputs,” but not all inputs within the range of inputs, while having “the lowest error of approximation,” implies minimizing NPL input values to each of the plurality of an NPLs).
No and Obla are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the minimizing NPL input values of Obla with the plurality of NPLs of No. The motivation to do so, as stated by Obla, is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52 ¶1).
Regarding claim 10, No teaches a computer program product (¶124 “Instructions or software to control computing hardware… to… perform the methods as described above may be written as computer programs”) comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by a processor to cause the processor to perform operations (Fig. 1 – 200, 300, ¶125 “The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data… may be… stored… in or on one or more non-transitory computer-readable storage media”) comprising: performing a training process on a neural network resulting in a trained neural network (Fig. 1 – 10, ¶43 as described above with respect to claim 1), the training process comprising: performing a plurality of training iterations on the neural network (Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88, as explained above with respect to claim 1).
Regarding the limitation, and performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network, No further teaches and performing, between the training iterations, loss processing that (i) minimizes a loss of the neural network (¶46, Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88, as described above with respect to claim 1) and a range of values to a non-polynomial layer (NPL) of the neural network (Fig. 3, ¶59 as described above with respect to claim 1). However, No fails to teach and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network.
Obla teaches and (ii) reduces a range of values to a non-polynomial layer (NPL) of the neural network (Page 24 Figure 3.2, Page 24 Table 3.2, Page 24 ¶1, Page 25 ¶1, Page 25 ¶4, Page 26 Table 3.3, Page 33 ¶1, Page 34 Table 4.1 all as explained above with respect to claim 1).
No further teaches estimating a range of input values to the NPL of the trained neural network (Fig. 1 – 200, Fig. 3, ¶59).
No further teaches forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL (Fig. 1 – 200, ¶66 as explained above with respect to claim 1).
No further teaches and generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer (Fig. 2 – 220, Fig. 3 – 310, ¶76).
No and Obla are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the iterative training and reducing a range of values of Obla with the computer program product of No. The motivation to do so is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52 ¶1).
Claims 13-15 recite a computer program product that parallels method claims 2-4, respectively. Therefore, the analysis discussed above with respect to claims 2-4 also applies to claims 13-15, respectively. Accordingly, claims 13-15 are rejected based on substantially the same rationale as set forth above with respect to claims 2-4, respectively.
Claim 18 recites a system that parallels product claim 10. Therefore, the analysis discussed above with respect to claim 10 applies to claim 18. Accordingly, claim 18 is rejected based on substantially the same rationale as set forth above with respect to claim 10.
Claims 5-6 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over No in view of Obla and further in view of Zhu et al. (US 20210365710 A1, hereinafter Zhu).
Regarding claim 5, No in view of Obla teaches the computer-implemented method of claim 1 (and thus the rejection of claim 1 is incorporated).
Regarding the limitation wherein the NPL is selected from the group consisting an activation layer, an instance normalization layer, a maximum pooling layer, and a softmax layer, No teaches wherein the NPL is selected from the group consisting an activation layer (Fig. 1 – 200, Fig. 2 – 220, Fig. 3 depicts a plurality of ReLU, or the group consisting an activation layer, Fig. 7 – 730, ¶114 “information is generated and processed through (with) the forward pass of the neural network, when each ReLU layer is reached, the processor 200 may be configured to respectively calculate a maximum value and a minimum value of the corresponding input data generated up to that point of the corresponding ReLU layer… of a neural network”). However, No fails to teach the group consisting an activation layer, an instance normalization layer, a maximum pooling layer, and a softmax layer.
Zhu, in the same field of endeavor, teaches this limitation (Fig. 3, ¶63 “The first layer of each ladder from left to right in FIG. 3 (except the last ladder) is the max pooling layer, the first three layers from left to right in the last ladder in FIG. 3 are full connection layer (fully connected+Relu), the last layer from left to right in the last ladder in FIG. 3 is the activation layer (softmax), and the remaining layers in FIG. 3 are convolution layers (convolution+Relu),” ¶47 “Instance Normalization layer is used to normalize the feature image output from the convolutional layer,” ¶65 “Optionally, the normalizing network includes an Adaptive Instance Normalization (AdaIN) processing layer, and the coding full connection layer is connected to the AdaIN processing layer”).
No and Zhu are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the layers of Zhu with the NPL and group consisting an activation layer of No. The motivation to do so, as stated by Zhu, is to design an iterative training method that “can train a style transfer network model adapted to multiple types of images” (Zhu, ¶105).
Regarding claim 6, No in view of Obla and further in view of Zhu teaches the computer-implemented method of claim 5 (and thus the rejection of claim 5 is incorporated).
No teaches wherein the polynomial approximation is a polynomial approximation of the activation function (Fig. 2 – 220, Fig. 3 – 310, ¶76 “the generated neural network 310 may correspond to the trained neural network 220 except that the ReLU layers now apply the respective target approximate polynomials instead of the ReLU units/functions of the ReLU layers in the trained neural network 220”).
Claims 16-17 recite a computer program product that parallels method claims 5-6, respectively. Therefore, the analysis discussed above with respect to claims 5-6 also applies to claims 16-17, respectively. Accordingly, claims 16-17 are rejected based on substantially the same rationale as set forth above with respect to claims 5-6, respectively.
Claims 8-9 and 21-25 are rejected under 35 U.S.C. 103 as being unpatentable over No in view of Obla and further in view of Teig et al. (US 12112254 B1, hereinafter Teig).
Regarding claim 8, No in view of Obla teaches the computer-implemented method of claim 7 (and thus the rejection of claim 7 is incorporated).
Regarding the limitation wherein the performing of the loss processing comprises minimizing NPL input values to each of the plurality of an NPLs during respective separate training iterations, No teaches the performing of the loss processing (¶46, as explained above with respect to claim 1). However, No fails to teach wherein the performing of the loss processing comprises minimizing NPL input values to each of the plurality of an NPLs during respective separate training iterations.
Obla teaches wherein the performing of the loss processing comprises minimizing NPL input values to each of the plurality of an NPLs (Page 34 Table 4.1, Page 34 ¶1, Page 34 ¶2, as explained above with respect to claim 7). However, Obla fails to teach during respective separate training iterations.
Teig, in the same field of endeavor, teaches performing loss processing during respective separate training iterations (Col. 7 Lines 43-52, “some embodiments… iteratively trains the MT network by progressively adding data to the inputs used to train the network at each iteration. Between iterations, the hyperparameters are optimized by determining the error of the network as trained from the prior iteration when using a set of validation inputs, and modifying the hyperparameters to decrease this error. The set of validation inputs… are then added to the training inputs for the next iteration”).
No, Obla, and Teig are analogous to the claimed invention as all are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the minimizing of NPL input values of Obla and the respective separate training iterations of Teig with the loss processing and each of the plurality of NPLs of No. The motivation to do so is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52 ¶1) and to produce a training method that “reduces overfitting by preventing the network from learning the noise in the training set” (Teig, Col. 16 Lines 1-3).
Regarding claim 9, No in view of Obla teaches the computer-implemented method of claim 1 (and thus the rejection of claim 1 is incorporated).
Regarding the limitation wherein the performing of the loss processing comprises minimizing the loss of the neural network using a loss function that includes a standard loss term and a regularization range loss term, No teaches wherein the performing of the loss processing comprises minimizing the loss of the neural network (¶46, as explained above with respect to claim 1). However, No fails to teach using a loss function that includes a standard loss term and a regularization range loss term.
Teig teaches this limitation (Col. 7 Lines 17-25 “the training process typically (1) forward propagates the input value set through the network's nodes to produce a computed output value set and then (2) backpropagates a gradient (rate of change) of a loss function (output error) that quantifies in a particular way the difference between the input set's known output value set and the input set's computed output value set, in order to adjust the network's configurable parameters (e.g., the weight values),” Col. 20 Lines 53-54 “One option is to use the standard cross-entropy loss function to train the decoder network,” wherein “standard cross-entropy loss function” encompasses a standard loss term, Col. 3 Line 66-Col. 4 Line 5 “optimize the training of the parameters of a machine-trained (MT) network by optimizing the tuning of a set of hyperparameters that define how the training of the MT network is performed. These hyperparameters, in various embodiments, may include coefficients in the loss function used to train the network (e.g., L1 and L2 regularization parameters),” wherein “L1 and L2 regularization parameters” encompasses a regularization range loss term).
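A loss function of the kind recited, combining a standard loss term with a range regularization term, can be sketched as follows. This is a hypothetical formulation; the penalty shape, coefficient, and target range are illustrative assumptions, not Teig's or the applicant's actual functions:

```python
import numpy as np

def total_loss(logits, labels, npl_inputs, lam=0.01, r=3.0):
    """Hypothetical combined loss: a standard cross-entropy term plus
    a range regularization term penalizing NPL inputs outside [-r, r]."""
    # Standard loss term: cross-entropy over softmax probabilities.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    # Regularization range term: quadratic penalty on the magnitude by
    # which each NPL input exceeds the target range.
    range_penalty = np.mean(np.maximum(np.abs(npl_inputs) - r, 0.0) ** 2)
    return ce + lam * range_penalty
```

Minimizing such a combined objective simultaneously reduces prediction error (the standard term) and shrinks the range of values reaching the NPL (the regularization term).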
No and Teig are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the loss function including standard and regularization range loss terms of Teig with the loss processing of No. The motivation to do so is to produce a training method that “reduces overfitting by preventing the network from learning the noise in the training set” (Teig, Col. 16 Lines 1-3).
Regarding claim 21, No teaches a computer-implemented method (¶14) comprising: performing, on a neural network comprising at least a first layer and a second layer, wherein the second layer is a non-polynomial layer (NPL) (Fig. 2 – 220, Fig. 3, ¶74 “The ReLU functions may be functions of respective ReLU layers, which may each follow another layer of the neural network 220 that may perform a different neural network operation (e.g., a convolution layer…), as non-limiting examples”), a training process on a neural network resulting in a trained neural network (Fig. 1 – 10, ¶43, as explained above with respect to claim 1).
Regarding the limitation the training process comprising: performing a plurality of training iterations on the neural network, wherein in a certain training iteration a weighted output value is generated by using a first weight value to generate an output of the first layer and the weighted output value is sent as an input value in the NPL, No further teaches the training process comprising: performing a plurality of training iterations on the neural network (Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88, as explained above with respect to claim 1) and the first layer (¶74 “(e.g., a convolution layer…)”) and the NPL (¶74 “respective ReLU layers”). However, No fails to teach wherein in a certain training iteration a weighted output value is generated by using a first weight value to generate an output of the first layer and the weighted output value is sent as an input value in the NPL.
Teig teaches wherein in a certain training iteration a weighted output value is generated by using a first weight value to generate an output of a first layer and the weighted output value is sent as an input value in a second layer (Col. 7 Lines 53-56 “for a particular iteration, a first set of training inputs are used to train the parameters of the MT network (e.g., the weight values for a neural network) using a first set of hyperparameters,” Col. 5 Lines 36-39 “The weight coefficients W(l) are parameters that are adjusted during the network's training in order to configure the network to solve a particular problem,” Fig. 2 – 200-210, Col. 6 Lines 19-21 “FIG. 2 conceptually illustrates a representation of a convolutional layer of a convolutional neural network,” Col. 6 Lines 38-42 “the layer includes six filters 205… Each value in one of the filters is a weight value that is trained… each filter includes 27 trainable weight values,” wherein “a weight value that is trained,” or a first weight value, is implied to be trained after “a particular iteration” or in a certain training iteration in which “inputs are used to train the parameters of the… network (e.g., the weight values for a neural network),” Col. 6 Lines 54-60 “To generate the output activations, each of the filters 205 is applied to numerous subsets of the input activation values… and the dot product between the 27 activations in the current subset and the 27 weight values in the filter is computed,” Fig. 2 – 210, Col. 7 Lines 8-10 “These output activation values 210 are then the input activation values for the next layer of the neural network”).
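For illustration only — a hedged sketch, not code from Teig — the quoted dot-product computation (27 trainable weight values applied to a 27-activation subset, with the resulting weighted output serving as an input value to a following non-polynomial layer such as a ReLU) can be rendered as:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3x3x3 filter holds 27 trainable weight values, as in Teig's example.
filt = rng.normal(size=(3, 3, 3))
patch = rng.normal(size=(3, 3, 3))  # one 27-activation subset of the input

# The weighted output value: the dot product of the 27 activations in the
# current subset and the 27 weight values in the filter.
weighted_output = float(np.sum(filt * patch))

# That output activation then serves as an input value to the next layer,
# here sketched as a ReLU (a non-polynomial layer).
relu_input = weighted_output
relu_output = max(0.0, relu_input)
```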
Regarding the limitation, and performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to the NPL of the neural network, No further teaches loss processing that (i) minimizes a loss of the neural network (¶46, Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88, as described above with respect to claim 1) and a range of values to the NPL of the neural network (Fig. 3, ¶59 as described above with respect to claim 1). However, No fails to teach and performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that… (ii) reduces a range of values to the NPL of the neural network.
Obla teaches and performing, between a selected training iteration and another training iteration after the selected training iteration (Page 42 ¶3 “the first phase of training is performed for 450 epochs… After replacing the activation function with an approximation, the training continues… for 300 epochs,” wherein an “epoch” encompasses training iteration), loss processing that… (ii) reduces a range of values to an NPL of the neural network (Page 24 Figure 3.2, Page 24 Table 3.2, Page 24 ¶1, Page 25 ¶1, Page 25 ¶4, Page 26 Table 3.3, Page 33 ¶1, Page 34 Table 4.1 all as explained above with respect to claim 1).
No further teaches estimating a range of input values to the NPL of the trained neural network (Fig. 1 – 200, Fig. 3, ¶59).
No further teaches forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL (Fig. 1 – 200, ¶66 as explained above with respect to claim 1).
No further teaches and generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer (Fig. 2 – 220, Fig. 3 – 310, ¶76).
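Purely as an illustrative sketch of the sequence No is cited for above — estimating the range of input values to the NPL, forming a replacement layer comprising a polynomial approximation of the NPL's operation over that range, and substituting it for the NPL — the following uses NumPy's least-squares `polyfit`; the variable names, sample data, and polynomial degree are hypothetical, not taken from No:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Estimate the range of input values seen by the NPL (ReLU) on sample data.
npl_inputs = np.random.default_rng(1).normal(size=1000)
lo, hi = npl_inputs.min(), npl_inputs.max()

# Form the replacement layer: a least-squares polynomial approximation of
# the ReLU over the estimated [lo, hi] input range.
xs = np.linspace(lo, hi, 200)
coeffs = np.polyfit(xs, relu(xs), deg=4)
replacement_layer = np.poly1d(coeffs)

# The revised network would evaluate the polynomial in place of the ReLU;
# the approximation is only trusted inside the estimated input range.
approx_error = np.max(np.abs(replacement_layer(xs) - relu(xs)))
```

A narrower estimated input range generally yields a tighter polynomial fit, which is consistent with the role the range-reducing loss processing plays in the claims.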
No, Obla, and Teig are analogous to the claimed invention as all are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the iterative training, weighted output, and the first weight value of Teig and the reducing a range of values and selected training iteration of Obla with the methodology of No. The motivation to do so is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52, ¶1) while producing a training method that “reduces overfitting by preventing the network from learning the noise in the training set” (Teig, Col. 16 Lines 1-3).
Regarding claim 22, No in view of Obla and further in view of Teig teaches the computer-implemented method of claim 21 (and thus the rejection of claim 21 is incorporated).
Regarding the limitation wherein the loss processing comprises adjusting the first weight value, No teaches wherein the loss processing comprises adjusting weight values (¶46, as explained above with respect to claim 1). However, No fails to teach wherein the loss processing comprises adjusting the first weight value.
Teig teaches this limitation (Col. 7 Lines 13-25 “the network is put through a supervised training process that adjusts the network's configurable parameters (e.g., the weight coefficients of its linear components)… the training process typically… backpropagates a gradient (rate of change) of a loss function (output error) that quantifies in a particular way the difference between the input set's known output value set and the input set's computed output value set, in order to adjust the network's configurable parameters (e.g., the weight values),” Col. 7 Lines 53-56, Col. 5 Lines 36-39, Fig. 2 – 200-210, Col. 6 Lines 39-42 “Each value in one of the filters is a weight value that is trained,” all as explained above with respect to claim 21).
No and Teig are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the first weight value of Teig with the loss processing of No. The motivation to do so is to produce a training method that “reduces overfitting by preventing the network from learning the noise in the training set” (Teig, Col. 16 Lines 1-3).
Claims 23-24 recite computer-implemented methods that parallel claims 2-3, respectively. Therefore, the analysis discussed above with respect to claims 2-3 also applies to claims 23-24, respectively. Accordingly, claims 23-24 are rejected based on substantially the same rationale as set forth above with respect to claims 2-3, respectively.
Regarding claim 25, No teaches a computer program product (¶124) comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by a processor to cause the processor to perform operations (Fig. 1 – 200, 300, ¶125) comprising: performing, on a neural network comprising at least a first layer and a second layer, wherein the second layer is a non-polynomial layer (NPL) (Fig. 2 – 220, Fig. 3, ¶74), a training process on a neural network resulting in a trained neural network (Fig. 1 – 10, ¶43 as described above with respect to claim 1).
Regarding the limitation the training process comprising: performing a plurality of training iterations on the neural network, wherein in a certain training iteration a weighted output value is generated by using a first weight value to generate an output of the first layer and the weighted output value is sent as an input value in the NPL, No teaches the training process comprising: performing a plurality of training iterations on the neural network (Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88, as described above with respect to claim 1) and the first layer (¶74 “(e.g., a convolution layer…)”) and the NPL (¶74 “respective ReLU layers”). However, No fails to teach wherein in a certain training iteration a weighted output value is generated by using a first weight value to generate an output of the first layer and the weighted output value is sent as an input value in the NPL.
Teig teaches wherein in a certain training iteration a weighted output value is generated by using a first weight value to generate an output of a first layer and the weighted output value is sent as an input value in a second layer (Col. 7 Lines 53-56, Col. 5 Lines 36-39, Fig. 2 – 200-210, Col. 6 Lines 19-21, Col. 6 Lines 38-42, Col. 6 Lines 54-60, Col. 7 Lines 8-10, all as explained above with respect to claim 21).
Regarding the limitation, and performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that (i) minimizes a loss of the neural network and (ii) reduces a range of values to the NPL of the neural network, No further teaches loss processing that (i) minimizes a loss of the neural network (¶46, Fig. 1 – 200, Fig. 4 – 430-470, 490, ¶88, as described above with respect to claim 1) and a range of values to the NPL of the neural network (Fig. 3, ¶59 as described above with respect to claim 1). However, No fails to teach and performing, between a selected training iteration and another training iteration after the selected training iteration, loss processing that… (ii) reduces a range of values to the NPL of the neural network.
Obla teaches and performing, between a selected training iteration and another training iteration after the selected training iteration (Page 42 ¶3, as explained above with respect to claim 21), loss processing that… (ii) reduces a range of values to an NPL of the neural network (Page 24 Figure 3.2, Page 24 Table 3.2, Page 24 ¶1, Page 25 ¶1, Page 25 ¶4, Page 26 Table 3.3, Page 33 ¶1, Page 34 Table 4.1 all as explained above with respect to claim 1).
No further teaches estimating a range of input values to the NPL of the trained neural network (Fig. 1 – 200, Fig. 3, ¶59).
No further teaches forming a replacement layer for the NPL, wherein the replacement layer comprises a polynomial approximation of an operation performed by the NPL (Fig. 1 – 200, ¶66 as explained above with respect to claim 1).
No further teaches and generating a revised neural network by replacing the NPL of the trained neural network with the replacement layer (Fig. 2 – 220, Fig. 3 – 310, ¶76).
No, Obla, and Teig are analogous to the claimed invention as all are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the iterative training, weighted output, and the first weight value of Teig and the reducing a range of values and selected training iteration of Obla with the computer program product of No. The motivation to do so is to introduce a method of approximating activation functions using polynomial approximations that “outperforms other methods and is robust regardless of the approximation method, degree, dataset, or activation function approximated” (Obla, Page 52 ¶1) while producing a training method that “reduces overfitting by preventing the network from learning the noise in the training set” (Teig, Col. 16 Lines 1-3).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over No in view of Obla and further in view of Beaty et al. (US 20140136707 A1, hereinafter Beaty).
Regarding claim 11, No in view of Obla teaches the computer-implemented method of claim 10 (and thus the rejection of claim 10 is incorporated).
Regarding the limitation wherein the stored program instructions are stored in a computer readable storage device in a data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system, No teaches wherein the stored program instructions are stored in a computer readable storage device in a data processing system (¶125 “Examples of a non-transitory computer-readable storage medium include… magneto-optical data storage devices, optical data storage devices… and any other device that is configured to store the instructions or software,” Fig. 1 – 100-300, ¶57 “The processor 200 may include one or more data processing devices… the execution of the instructions by the one or more data processing devices may configure the processor 200 to perform any one or any combinations of the operations/methods described herein,” wherein a configuration of devices including “data processing devices” encompasses a data processing system, ¶52 “The receiver 100 may include a receiving interface, through which various data are received by the receiver 100. The receiver 100 may receive data from an external device”). However, No fails to teach and wherein the stored program instructions are transferred over a network from a remote data processing system.
Beaty, in the same field of endeavor, teaches this limitation (¶23 “The program code may be run… entirely on the remote computer… the remote computer may be connected to the user's computer through any type of network,” Fig. 10 – 1000, 1004, 1018, 1020, ¶112 “Program code 1018 is located in a functional form on computer readable media 1020 that… may be loaded onto or transferred to data processing system 1000 for processing by processor unit 1004,” ¶116 “The data processing system providing program code 1018 may be… a remote data processing system”).
No and Beaty are analogous to the claimed invention as both are from the same field of endeavor of receiving data for a data processing system. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the transferring of instructions over a network from a remote data processing system of Beaty with the data processing system of No. The motivation to do so is to design “a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources… that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service” (Beaty, ¶29).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over No in view of Obla and further in view of Beaty, and further in view of Cohen et al. (US 20140040446 A1, hereinafter Cohen).
Regarding claim 12, No in view of Obla teaches the computer-implemented method of claim 10 (and thus the rejection of claim 10 is incorporated).
Regarding the limitation wherein the stored program instructions are stored in a computer readable storage device in a server data processing system, and wherein the stored program instructions are downloaded in response to a request over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system, further comprising: program instructions to meter use of the program instructions associated with the request, No teaches wherein the stored program instructions are stored in a computer readable storage device in a server data processing system (¶125 “Examples of a non-transitory computer-readable storage medium include… magneto-optical data storage devices, optical data storage devices… and any other device that is configured to store the instructions or software”, Fig. 1 – 1, 10, 200, 300, ¶48 “The computing apparatus 10 may be… a data server… or the electronic device 1 may be… the data server”). However, No fails to teach and wherein the stored program instructions are downloaded in response to a request over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system, further comprising: program instructions to meter use of the program instructions associated with the request.
Beaty teaches and wherein the stored program instructions are downloaded (Fig. 10 – 1000, 1008, 1018, 1026, ¶116 “program code 1018 may be downloaded over a network to persistent storage 1008 from another device or data processing system through computer readable signal media 1026 for use within data processing system 1000. The data processing system providing program code 1018 may be… a remote data processing system”)… for use in a computer readable storage device associated with the remote data processing system (Fig. 4 – 400, 404, 410, ¶23 “The program code may be run… entirely on the remote computer… the remote computer may be connected to the user's computer through any type of network,” ¶75 “set of resources 410 can be software… set of resources 410 may be services on computer system 404 or on another computer system in data processing environment 400,” Fig. 10 – 1018-1024, ¶112 “Program code 1018 and computer readable media 1020 form computer program product 1022 in these examples. In one example, computer readable media 1020 may be computer readable storage media 1024,” ¶113 “computer readable storage media 1024 is a physical or tangible storage device used to store program code 1018”), further comprising: program instructions to meter use of the program instructions associated with a request (Fig. 4 – 404-413, 432-436, 446-448, ¶82 “user 412 may use client 413 to make request 446 to verify information about the use of set of resources 410 by service 408… Responsive to receiving request 446, computer system 404 generates timeline graph 448 using signed information 432 such as timestamps 436 in metering data 434,” Fig. 4 –414, 418, 422, Fig. 7, ¶95 “FIG. 7 may be implemented in software… the steps may be implemented by metering components, such as metering 414 in provenance service 416, metering 418 in hypervisor management 420, and metering 422 in service 408”). 
However, Beaty fails to teach wherein the program instructions are downloaded in response to a request over a network to a remote data processing system.
Cohen, in the same field of endeavor, teaches this limitation (Fig. 1 – 110, 120, 124, 140, 150, Fig. 2 – S210-230, ¶24 “in response to receiving a request from computing machine 110, for accessing or manipulating target data, service 150 dynamically invokes one or more instances of remote software 124 on remote host 120 (S210). If it is determined that the location of the target data is remote to the remote host 120 (e.g., if the target data is stored on local computing machine 110 or on remote storage 140), then service 150 seamlessly transfers the data to remote host 120 where the remote software 124 is hosted”).
Regarding the limitation and program instructions to generate an invoice based on the metered use, Beaty further teaches this limitation (Fig. 4 – 408, 410, 412, 430, 444, ¶81 “Resource usage report 444 may also include one more summaries… The summaries may describe the use of set of resources 410 by service 408 over… time periods of a particular event of interest. Events of interest… may include times of high use of set of resources 410 by service 408, times of low use… and times specified by metering policy 430… metering policy 430 may include instructions to generate a summary in resource usage report 444 describing the use of set of resources 410 by service 408… resource usage report 444 may also be used in the form of an invoice or bill that is sent to user 412”).
No, Beaty, and Cohen are analogous to the claimed invention as all are from the same field of endeavor of receiving data for a data processing system. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the downloaded program instructions and remote data processing system of Beaty and the response to the request of Cohen with the server data processing system of No. The motivation to do so, as stated by both Beaty and Cohen, is to design “a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources… that [may/can] be rapidly provisioned and released with minimal management effort or interaction with a provider of the service” (Beaty, ¶29 and Cohen, ¶60).
Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over No in view of Obla and further in view of Teig, and further in view of Teig et al. (US 20220405591 A1, hereinafter Teig (2022)).
Regarding claim 19, No in view of Obla teaches the computer system of claim 18 (and thus the rejection of claim 18 is incorporated).
Regarding the limitation wherein the neural network comprises a first layer and a second layer, wherein the second layer is the NPL, and wherein, for an iteration in the plurality of training iterations: the neural network generates a weighted output value by applying a first weight value to an output of the first layer, No teaches wherein the neural network comprises a first layer and a second layer, wherein the second layer is the NPL (Fig. 2 – 220, Fig. 3, ¶74). However, No fails to teach and wherein, for an iteration in the plurality of training iterations: the neural network generates a weighted output value by applying a first weight value to an output of the first layer.
Teig teaches and wherein, for an iteration in the plurality of training iterations: the neural network generates a weighted output value by applying a first weight value to an input of the first layer (Col. 7 Lines 53-56, Col. 5 Lines 36-39, Fig. 2 – 200-210, Col. 6 Lines 19-21, Col. 6 Lines 38-42, Col. 6 Lines 54-60, all as explained above with respect to claim 21). However, Teig fails to teach applying a first weight value to an output of the first layer.
Teig (2022), in the same field of endeavor, teaches applying weight values to an output of a layer (Fig. 2 – 210, ¶30 “the linear component 210 of each input neuron of some embodiments computes a dot product of a vector of weight coefficients and a vector of input values,” Fig. 5, ¶57 “FIG. 5 illustrates a simple feed-forward neural network 500 with one hidden layer having two nodes, and a single output layer with one output node… The output layer node C receives its inputs from the outputs of nodes A and B, and uses weight values wCA and wCB respectively for its linear component,” wherein the “linear component” of “output layer node C” which “computes a dot product of a vector of weight coefficients” and “a vector of input values,” or the “outputs of nodes A and B,” encompasses applying weight values to an output of a layer).
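For illustration only — not code from Teig (2022) — the quoted linear component of output node C, which applies weight values wCA and wCB to the outputs of hidden-layer nodes A and B, reduces to a dot product; the numeric values here are hypothetical:

```python
import numpy as np

# Outputs of hidden-layer nodes A and B (after their activation functions).
out_a, out_b = 0.7, -0.2

# Output node C applies weight values wCA and wCB to those outputs: its
# linear component is the dot product of the weight vector and the vector
# of prior-layer outputs.
w_ca, w_cb = 0.5, 1.5
linear_c = np.dot([w_ca, w_cb], [out_a, out_b])  # 0.5*0.7 + 1.5*(-0.2)
```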
Regarding the limitation and the NPL receives the weighted output value as an input value in the range of input values, No teaches input data in the range of input values (Fig. 1 – 200, Fig. 2 – 220, 230, 250, ¶66 “the typical ReLU of the neural network may… act on input data from another layer of the neural network,” ¶81 “The processor 200 may calculate maximum and minimum values 250 of the input values of the ReLU function based on the pre-trained deep learning model 220 and the sample 230 of the trained data set”). However, No fails to teach and the NPL receives the weighted output value as an input value in the range of input values.
Teig teaches and the NPL receives the weighted output value as an input value (Fig. 2 – 210, Col. 7 Lines 8-10 “These output activation values 210 are then the input activation values for the next layer of the neural network”).
No, Teig, and Teig (2022) are analogous to the claimed invention as all are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the plurality of training iterations, weighted output value, and first weight value of Teig and the applying of a weight value to an output of Teig (2022) with the first and second layers of the neural network of No. The motivation to do so is to produce a training method that “reduces overfitting by preventing the network from learning the noise in the training set” (Teig, Col. 16 Lines 1-3) while designing “Techniques that allow for lower power consumption without a reduction in the effectiveness of a neural network” (Teig (2022), ¶3).
Regarding claim 20, No in view of Obla and further in view of Teig and further in view of Teig (2022) teaches the computer system of claim 19 (and thus the rejection of claim 19 is incorporated).
Regarding the limitation wherein the loss processing comprises adjusting the first weight value, No teaches wherein the loss processing comprises adjusting weight values (¶46, as explained above with respect to claim 1). However, No fails to teach wherein the loss processing comprises adjusting the first weight value.
Teig teaches this limitation (Col. 7 Lines 13-25, Col. 7 Lines 53-56, Col. 5 Lines 36-39, Fig. 2 – 200-210, Col. 6 Lines 39-42, all as explained above with respect to claim 22).
No and Teig are analogous to the claimed invention as both are from the same field of endeavor of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the first weight value of Teig with the loss processing of No. The motivation to do so is to produce a training method that “reduces overfitting by preventing the network from learning the noise in the training set” (Teig, Col. 16 Lines 1-3).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM MICHAEL LEE whose telephone number is (571)272-4761. The examiner can normally be reached Monday-Thursday: 8am-5pm, every other Friday 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula, can be reached at (571)272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/W.M.L./
Examiner, Art Unit 2145
/CESAR B PAULA/ Supervisory Patent Examiner, Art Unit 2145