Last updated: May 29, 2026
Application No. 17/454,213
ENERGY- AND MEMORY-EFFICIENT TRAINING OF NEURAL NETWORKS

Non-Final OA §101 §102 §103
Filed: Nov 09, 2021
Priority: Nov 26, 2020, DE 10 2020 214 850.3
Examiner: BREEN, JAKE TIMOTHY
Art Unit: 2143
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Robert Bosch GmbH
OA Round: 2 (Non-Final)
This examiner grants 62% of cases after interview (+71.4% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 13 resolved cases, 2023–2026
Examiner Intelligence

BREEN, JAKE TIMOTHY
Career Allowance Rate: grants 62% of resolved cases (8 granted / 13 resolved), +6.5% vs TC avg
Interview Lift: +71.4% (resolved cases with interview)
Avg Prosecution: 3y 11m (typical timeline)
8 currently pending
Statute-Specific Performance

§101: 7.9% (-32.1% vs TC avg)
§103: 85.7% (+45.7% vs TC avg)
§102: 4.8% (-35.2% vs TC avg)
§112: 1.6% (-38.4% vs TC avg)
Based on career data from 13 resolved cases
Office Action

DETAILED ACTION
This action is in response to the filing on 08/07/2025. Claims 1-15 are pending and have been considered below.

	Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


	Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Independent Claims 1, 13, 14, and 15
Step 1: 
Claims 1 and 13 recite methods, 14 recites a manufacture, and 15 recites a system; therefore, they are directed to one of the four categories of statutory subject matter (process/method, machine/product/apparatus, manufacture, or composition of matter).

Step 2A Prong 1: 
	Claims 1 and 14-15 recite a method, manufacture, and system comprising:
initializing the parameters — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)).
and mapping, by the ANN, the training data onto outputs — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
assessing a matching of the outputs with the target outputs according to a predefined cost function — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
based on a predefined criterion, selecting, from the set of parameters, at least one first subset of parameters to be trained and one second subset of parameters to be retained — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
optimizing the parameters to be trained with an objective that a further processing of the training data by the ANN prospectively results in a better assessment by the cost function — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
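
The steps recited above (initializing, mapping, assessing with a cost function, selecting a subset, and optimizing only that subset) can be pictured with a minimal Python sketch. It assumes a linear model, a mean-squared-error cost, and plain gradient descent, none of which is specified in the claim; the names `cost` and `train_subset` are hypothetical and are not taken from the application.

```python
import random


def cost(params, xs, ys):
    """Assumed cost function: mean squared error of a linear model."""
    total = 0.0
    for x, y in zip(xs, ys):
        out = sum(p * xi for p, xi in zip(params, x))  # map input onto output
        total += (out - y) ** 2
    return total / len(xs)


def train_subset(xs, ys, n_params, trainable, seed=0, lr=0.05, steps=200):
    """Update only the parameter indices in `trainable`; retain the rest."""
    rng = random.Random(seed)
    # Initialize all parameters.
    params = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    initial = list(params)
    for _ in range(steps):
        # Gradient of the assumed cost with respect to every parameter.
        grads = [0.0] * n_params
        for x, y in zip(xs, ys):
            err = sum(p * xi for p, xi in zip(params, x)) - y
            for i, xi in enumerate(x):
                grads[i] += 2.0 * err * xi / len(xs)
        # Optimize the parameters to be trained; retained ones are untouched.
        for i in trainable:
            params[i] -= lr * grads[i]
    return initial, params
```

In this sketch the retained parameters never enter the update loop, so they stay at their initialized values, which is the behavior the "leaving the parameters to be retained" limitation describes.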

Claim 13 recites a method comprising:
initializing the parameters — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)).
and mapping, by the ANN, the training data onto outputs — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
assessing a matching of the outputs with the target outputs according to a predefined cost function — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
based on a predefined criterion, selecting, from the set of parameters, at least one first subset of parameters to be trained and one second subset of parameters to be retained — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
optimizing the parameters to be trained with an objective that a further processing of the training data by the ANN prospectively results in a better assessment by the cost function — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
mapping, by the ANN, the measured data onto second outputs — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
generating an activation signal from the second outputs — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.

Step 2A Prong 2: 
	This judicial exception is not integrated into a practical application.

	Claim 1 recites the additional elements of: 
a method for training an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the method comprising the following steps — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to an ANN.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).

Claim 13 recites the additional elements of: 
a method, comprising the following steps: training an artificial neural network ANN whose behavior is characterized by a set of trainable parameters, the training including — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to an ANN.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
supplying the ANN with measured data that have been recorded via at least one sensor — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
activating, via the activation signal, a vehicle and/or an object recognition system and/or a system for quality control of products and/or a system for medical imaging — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to a vehicle, an object recognition system, a system for quality control of products, and/or a system for medical imaging.

Claim 14 recites the additional elements of: 
a non-transitory machine-readable data medium on which is stored a computer program for training an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the computer program, when executed by one or more computers, causing the one or more computers to perform the following steps — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to a generic computer component.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).

Claim 15 recites the additional elements of: 
a computer configured to train an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the computer configured to — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to a generic computer.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).

Step 2B: 
	The claims do not contain significantly more than the judicial exception.

		Claim 1 recites the additional elements of: 
a method for training an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the method comprising the following steps — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to an ANN.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).

Claim 13 recites the additional elements of: 
a method, comprising the following steps: training an artificial neural network ANN whose behavior is characterized by a set of trainable parameters, the training including — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to an ANN.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
supplying the ANN with measured data that have been recorded via at least one sensor — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
activating, via the activation signal, a vehicle and/or an object recognition system and/or a system for quality control of products and/or a system for medical imaging — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to a vehicle, an object recognition system, a system for quality control of products, and/or a system for medical imaging.

Claim 14 recites the additional elements of: 
a non-transitory machine-readable data medium on which is stored a computer program for training an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the computer program, when executed by one or more computers, causing the one or more computers to perform the following steps — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to a generic computer component.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).

Claim 15 recites the additional elements of: 
a computer configured to train an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the computer configured to — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to a generic computer.
providing training data which is labeled with target outputs onto which the ANN is to map the training data — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(II), receiving or transmitting data over a network).
supplying the training data to the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).

As such, claims 1 and 13-15 are not patent eligible.

Dependent Claims 2-12
Step 1: 
Claims 2-12 recite a method; therefore, they are directed to one of the four categories of statutory subject matter (process/method, machine/product/apparatus, manufacture, or composition of matter).

Step 2A Prong 1:
Claims 2-12 merely narrow the previously cited abstract idea limitations. For the reasons described above with respect to independent claim 1, this judicial exception is not meaningfully integrated into a practical application, and the claims do not amount to significantly more than the abstract idea. The dependent claims recite limitations similar to those described for the independent claim above and do not provide anything more than the abstract idea.

	Claim 2 recites a method comprising:
wherein the predefined criterion involves a relevance assessment of the parameters — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)).

Claim 3 recites a method comprising:
wherein the relevance assessment of at least one of the parameters includes a partial derivative of the cost function after an activation of the at least one of the parameters at at least one location that is predefined by training data — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)).

Claim 4 recites a method comprising:
wherein the predefined criterion includes selecting a predefined number of most relevant parameters, and/or parameters whose relevance assessment is better than a predefined threshold value, as the parameters to be trained — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
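
The limitations of claims 2-4 (a relevance assessment based on a partial derivative of the cost function, followed by a top-k and/or threshold selection) might look as follows in Python. This is a sketch under assumptions: the absolute value of a central finite-difference derivative is used as the relevance score, "better than a threshold" is read as "greater than", and the names `relevance_scores` and `select_trainable` are hypothetical.

```python
def relevance_scores(cost_fn, params, eps=1e-6):
    """Assumed relevance: |dC/dp_i|, estimated by central finite differences."""
    scores = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        scores.append(abs((cost_fn(up) - cost_fn(down)) / (2.0 * eps)))
    return scores


def select_trainable(relevance, k=None, threshold=None):
    """Return (to_train, to_retain) index lists.

    Picks the k most relevant parameters and/or every parameter whose
    relevance exceeds `threshold`; when both are given, the union is used.
    """
    indices = range(len(relevance))
    chosen = set()
    if k is not None:
        ranked = sorted(indices, key=lambda i: relevance[i], reverse=True)
        chosen.update(ranked[:k])
    if threshold is not None:
        chosen.update(i for i in indices if relevance[i] > threshold)
    to_train = sorted(chosen)
    to_retain = [i for i in indices if i not in chosen]
    return to_train, to_retain
```

A framework with automatic differentiation would normally supply the derivatives directly; finite differences are used here only to keep the example self-contained.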

Claim 6 recites a method comprising:
wherein the predefined criterion involves selecting a number of parameters, ascertained based on a predefined budget for time and/or hardware resources, as the parameters to be trained — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
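
As a rough illustration of the budget-based criterion of claim 6, the number of parameters to be trained could be derived from a memory budget by simple integer arithmetic. The per-parameter cost model (for example, bytes for the value, its gradient, and optimizer state) and the name `params_within_budget` are assumptions of this sketch, not details from the application.

```python
def params_within_budget(n_params, memory_budget_bytes, bytes_per_trained_param):
    """Largest number of trainable parameters that fits the assumed budget."""
    if bytes_per_trained_param <= 0:
        raise ValueError("bytes_per_trained_param must be positive")
    # Never select more parameters than the network actually has.
    return min(n_params, memory_budget_bytes // bytes_per_trained_param)
```

For instance, with an assumed 12 bytes per trained parameter and a 4,000,000-byte budget, `params_within_budget(1_000_000, 4_000_000, 12)` yields 333333 parameters to be trained; the remainder would be retained.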

Claim 7 recites a method comprising:
wherein the parameters to be retained are selected from weights via which inputs, which are supplied to neurons or other processing units of the ANN, are summed for activations of the neurons or other processing units, and bias values, which are additively offset against the activations, are selected as the parameters to be trained — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
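
The split recited in claim 7, where weights are retained and bias values are trained, can be sketched for a single processing unit. The example assumes a mean-squared-error cost and plain gradient descent; `neuron` and `train_bias_only` are hypothetical names used for illustration only.

```python
def neuron(weights, bias, inputs):
    """Weighted sum of the inputs, additively offset by the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def train_bias_only(weights, bias, samples, lr=0.1, steps=100):
    """Gradient descent on the bias alone; the weights are never modified."""
    for _ in range(steps):
        # d(MSE)/d(bias) averaged over all (inputs, target) samples.
        grad_b = sum(2.0 * (neuron(weights, bias, x) - y) for x, y in samples)
        bias -= lr * grad_b / len(samples)
    return weights, bias
```

Because only one scalar per unit is updated, gradients for the weights need not be computed or stored in this sketch.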

Claim 8 recites a method comprising:
wherein in response to a training progress of the ANN, measured based on the cost function, meeting a predefined criterion, at least one parameter from the subset of parameters to be retained is transferred into the subset of parameters to be trained — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.
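
One way to picture the transfer recited in claim 8 is a plateau check on the recorded cost values. The claim only requires that the training progress meets "a predefined criterion"; the specific criterion below (too little improvement over a window of recent steps) and the name `maybe_unfreeze` are assumptions of this sketch.

```python
def maybe_unfreeze(cost_history, to_train, to_retain,
                   min_improvement=1e-3, window=5):
    """Move one retained parameter to the trained subset if progress stalls."""
    if len(cost_history) <= window or not to_retain:
        return to_train, to_retain
    # Improvement of the cost over the last `window` steps.
    improvement = cost_history[-window - 1] - cost_history[-1]
    if improvement < min_improvement:
        # Transfer the first retained parameter into the trained subset.
        return to_train + [to_retain[0]], to_retain[1:]
    return to_train, to_retain
```

Which retained parameter is transferred first is left open here; a relevance ranking could be used to order `to_retain` before the call.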

Claim 9 recites a method comprising:
wherein the parameters are initialized using values from a numerical sequence that has been generated by a deterministic algorithm, proceeding from a starting configuration — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.

Claim 10 recites a method comprising:
wherein the numerical sequence is a pseudorandom numerical sequence — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.

Claim 11 recites a method comprising:
wherein a compression of the ANN is generated which includes at least — Under its broadest reasonable interpretation, this limitation encompasses the abstract idea of a mental process, or a concept that can be performed in the human mind with the use of a physical aid (e.g. pen and paper), including observation, evaluation, judgement or opinion (see MPEP § 2106.04(a)(2)(III)). Or a mathematical concept (see MPEP § 2106.04(a)(2)(I)), specifically organizing information and manipulating information through mathematical correlations.

Step 2A Prong 2: 
	This judicial exception is not integrated into a practical application.

	Claim 11 recites the additional elements of: 
information that characterizes an architecture of the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
information that characterizes the deterministic algorithm — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
the starting configuration for the deterministic algorithm — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
completely trained values of the parameters to be trained — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
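
The four items listed for claim 11, together with the seeded initialization of claims 9 and 10, suggest a compression in which retained parameters are regenerated rather than stored. The following Python sketch assumes that the parameters were initialized from Python's `random.Random` seeded generator; the dictionary layout and the names `init_params`, `compress`, and `decompress` are hypothetical.

```python
import random


def init_params(n_params, seed):
    """Deterministic pseudorandom initialization from a starting seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_params)]


def compress(architecture, seed, params, trained_idx):
    """Keep only what is needed to rebuild the full parameter list."""
    return {
        "architecture": architecture,          # characterizes the ANN
        "generator": "random.Random / gauss",  # characterizes the algorithm
        "seed": seed,                          # starting configuration
        "trained": {i: params[i] for i in trained_idx},  # trained values
    }


def decompress(blob):
    """Regenerate retained parameters, then overwrite the trained ones."""
    params = init_params(blob["architecture"]["n_params"], blob["seed"])
    for i, value in blob["trained"].items():
        params[i] = value
    return params
```

In this sketch the stored size grows with the number of trained parameters only, since every retained parameter is reproduced from the seed.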

	Claim 12 recites the additional element of: 
wherein the ANN is configured as an image classifier that maps images onto an association with one or multiple classes of a predefined classification — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to an image classifier.

Step 2B: 
	The claims do not contain significantly more than the judicial exception.

	Claim 11 recites the additional elements of: 
information that characterizes an architecture of the ANN — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
information that characterizes the deterministic algorithm — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
the starting configuration for the deterministic algorithm — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).
completely trained values of the parameters to be trained — This element amounts to no more than insignificant extra-solution activity in the form of mere data gathering and output (see MPEP § 2106.05(g)), and is well-understood, routine, conventional activity (see MPEP § 2106.05(d)(iv), storing and retrieving information in memory).

	Claim 12 recites the additional element of: 
wherein the ANN is configured as an image classifier that maps images onto an association with one or multiple classes of a predefined classification — This element amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). This element merely limits the use of the abstract idea to an image classifier.

As such, claims 2-12 are not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 8, and 14-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025) hereinafter Kharaghani.

	Regarding claim 1, Kharaghani teaches a method for training an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the method comprising the following steps (a method for training a neural network, the neural network comprising at least one layer comprising a plurality of input nodes, a plurality of output nodes, and a plurality of connections for connecting each one of the plurality of input nodes to each one of the plurality of output nodes. [para. 6]):
	initializing the parameters (Referring now to FIG. 3, the step 202 of initializing the neural network comprises initializing the connection parameters (e.g. the interconnection weights) of each connection. [para. 45]);
	providing training data which is labeled with target outputs onto which the ANN is to map the training data (Each layer maps the input nodes to the output nodes, in a way that is specific to the type of layer. [para. 3]; During the feed-forward phase, input data (also referred to as training examples) representing sets of pre-classified data is fed through the neural network layers. The outputs of the neural network are computed by a series of data manipulations as the input data values propagate through the various neural network nodes and weighted connections. In particular, in the embodiment illustrated in FIG. 2A, step 204 comprises feeding the input data through the neural network layers over a randomly (or pseudo-randomly) selected subset of connections, as will be discussed further below. Step 204 may also comprise proceeding with the backward propagation (or backpropagation) phase. In the backpropagation phase, errors between the output values generated during the feed-forward phase and desired output values are computed and propagated back through the neural network layers. [para. 41]);
	supplying the training data to the ANN and mapping, by the ANN, the training data onto outputs (Each layer maps the input nodes to the output nodes, in a way that is specific to the type of layer. [para. 3]; During the feed-forward phase, input data (also referred to as training examples) representing sets of pre-classified data is fed through the neural network layers. The outputs of the neural network are computed by a series of data manipulations as the input data values propagate through the various neural network nodes and weighted connections. In particular, in the embodiment illustrated in FIG. 2A, step 204 comprises feeding the input data through the neural network layers over a randomly (or pseudo-randomly) selected subset of connections, as will be discussed further below. Step 204 may also comprise proceeding with the backward propagation (or backpropagation) phase. In the backpropagation phase, errors between the output values generated during the feed-forward phase and desired output values are computed and propagated back through the neural network layers. [para. 41]);
	assessing a matching of the outputs with the target outputs according to a predefined cost function (After the input data has been fed forward through the network layers over the randomly- (or pseudo-randomly-) selected connections (i.e. with the randomly- or pseudo-randomly-selected interconnection weights), the method proceeds with the backpropagation step 408, in which errors between outputs (resulting and desired) are propagated back through the neural network layers [para. 55]);
	based on a predefined criterion, selecting, from the set of parameters, at least one first subset of parameters to be trained and one second subset of parameters to be retained (Each connection (and the corresponding interconnection weight) can be selected with a probability p and temporarily dropped with a probability 1−p. During forward and backward propagation, a selected connection and the interconnection weight associated therewith remain active (i.e. retained in the network) whereas a non-selected connection is inactive (i.e. temporarily dropped or removed from the network) and the interconnection weight associated therewith temporarily omitted. In one embodiment, at step 304, the probability of selecting each connection is initialized randomly from a uniform distribution. In another embodiment, the probability is initialized pseudo-randomly. Each retention probability is selected independently, such that each connection has a different probability of being selected. [para. 45]);
	optimizing the parameters to be trained with an objective that a further processing of the training data by the ANN prospectively results in a better assessment by the cost function (the interconnection weights that were active (i.e. the interconnection weights associated with the randomly- or pseudo-randomly-drawn subset of connections) during the forward pass of the training examples are updated based on the error, learning rate, and the gradients of the interconnection weights [para. 56]);
	leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization (Each connection (and the corresponding interconnection weight) can be selected with a probability p and temporarily dropped with a probability 1−p. During forward and backward propagation, a selected connection and the interconnection weight associated therewith remain active (i.e. retained in the network) whereas a non-selected connection is inactive (i.e. temporarily dropped or removed from the network) and the interconnection weight associated therewith temporarily omitted. In one embodiment, at step 304, the probability of selecting each connection is initialized randomly from a uniform distribution. In another embodiment, the probability is initialized pseudo-randomly. Each retention probability is selected independently, such that each connection has a different probability of being selected. [para. 45]).
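For illustration only, the selection and update behavior mapped above can be sketched as follows. This is a minimal sketch, not code from Kharaghani or from the application: the function names `draw_mask` and `masked_update`, the learning rate, and the sample values are all hypothetical, and comparing each probability against a uniform random number is assumed as one way of selecting a connection with probability p.

```python
import random

def draw_mask(probs, rng):
    """Hypothetical helper: 1 marks a connection selected for this
    iteration (selected with probability p), 0 marks one left out."""
    return [1 if p > rng.random() else 0 for p in probs]

def masked_update(weights, grads, mask, lr):
    """Hypothetical helper: apply a gradient step only to selected
    weights; unselected weights keep their current values."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]

# Illustrative values only (assumptions, not taken from any reference).
weights = [0.5, -0.2, 0.8, 0.1]
grads = [0.1, 0.3, -0.2, 0.05]
mask = [1, 0, 1, 0]  # first and third weights trained, the others retained
updated = masked_update(weights, grads, mask, lr=0.1)

# A probability of 1.0 always selects and 0.0 never selects.
edge_mask = draw_mask([1.0, 0.0], random.Random(0))
```

In this sketch, the weights at the masked-out positions are returned unchanged, which corresponds to leaving a parameter at its initialized value or at a value already obtained during the optimization.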
	
	Regarding claim 2, Kharaghani teaches all the limitations of claim 1 and further teaches:
	wherein the predefined criterion involves a relevance assessment of the parameters (A reinforcement signal is then generated based on the contribution of the randomly- (or pseudo-randomly-) selected subset of connections to the cost function. Generating the reinforcement signal comprises attributing a positive reward to the subset of connections if the cost function has been reduced (step 504) and attributing a negative reward (or punishment) to the subset of connections otherwise (step 506). In this manner, it can be ensured that the connections that do not reduce the cost function will be less likely to be chosen during the next iterations of the training process. In one embodiment, attributing the positive reward comprises increasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In other words, a predetermined positive value is added to the current probability value. In one embodiment, attributing the negative reward comprises decreasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In this case, a predetermined value, which may be negative, null, or positive, is added to the current probability value. It should be understood that when positive values are used for both the positive reward and the negative reward (or punishment), the relative magnitude between the two rewards may be such that the positive reward is larger than the negative reward. [para. 57—para. 58]).
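The reward scheme quoted above can be sketched, for illustration only, as a per-connection probability update. The name `reinforce`, the reward and penalty magnitudes, and the clamping of probabilities to the range [0, 1] are assumptions made for this sketch and are not stated in the cited paragraphs.

```python
def reinforce(probs, mask, cost_before, cost_after, reward=0.05, penalty=-0.02):
    """Hypothetical helper: add a positive value to the retention
    probability of each selected connection if the cost was reduced,
    and add the penalty value otherwise. Unselected connections are
    left unchanged. Clamping to [0, 1] is an assumption of this sketch."""
    delta = reward if cost_after < cost_before else penalty
    return [min(1.0, max(0.0, p + delta)) if m else p
            for p, m in zip(probs, mask)]

# Illustrative values only.
rewarded = reinforce([0.5, 0.5], [1, 0], cost_before=1.0, cost_after=0.8)
punished = reinforce([0.5, 0.5], [1, 0], cost_before=1.0, cost_after=1.2)
```

Under these assumptions, the retention probability acts as a running relevance score: connections whose selection coincided with a reduced cost become more likely to be selected in later iterations.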

	Regarding claim 8, Kharaghani teaches all the limitations of claim 1 and further teaches:
	wherein in response to a training progress of the ANN, measured based on the cost function, meeting a predefined criterion (After the selected interconnection weights and each connection's probability are updated (respective steps 410 and 414), the next step 416 is to assess whether to feed more mini-batches of training data. If this is not the case, the method 200 flows to the step 208 of FIG. 2A of stopping the training process. Otherwise, the method 200 flows to the step 206 of FIG. 2A, where it is determined whether the exit criterion is met. As discussed above with reference to FIG. 2A, the training process is stopped (step 208) if the exit criterion is met. If the exit criterion is not met, the method 200 flows back to step 204 and a new mini-batch is selected and fed-forward through the network layers over a new subset of randomly- (or pseudo-randomly-) drawn connections (iteration t+1). In one embodiment, step 206 comprises assessing whether a maximum training time or a maximum number of iterations has been exceeded. It should be understood that other embodiments may apply. For example, step 206 may comprise assessing whether the error between the resulting output values and the desired output values is greater than a predetermined threshold. [para. 63]), at least one parameter from the subset of parameters to be retained is transferred into the subset of parameters to be trained (The selected connections are also rewarded based on their contribution to decreasing or increasing the loss (step 412). For this purpose and as illustrated in FIG. 5, it is determined (step 502) whether a cost function (i.e. the error between the resulting and desired output values) has been reduced by feeding the input data through the neural network layers over the randomly- (or pseudo-randomly-) selected connections. 
A reinforcement signal is then generated based on the contribution of the randomly- (or pseudo-randomly-) selected subset of connections to the cost function. Generating the reinforcement signal comprises attributing a positive reward to the subset of connections if the cost function has been reduced (step 504) and attributing a negative reward (or punishment) to the subset of connections otherwise (step 506). In this manner, it can be ensured that the connections that do not reduce the cost function will be less likely to be chosen during the next iterations of the training process. In one embodiment, attributing the positive reward comprises increasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In other words, a predetermined positive value is added to the current probability value. In one embodiment, attributing the negative reward comprises decreasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In this case, a predetermined value, which may be negative, null, or positive, is added to the current probability value. It should be understood that when positive values are used for both the positive reward and the negative reward (or punishment), the relative magnitude between the two rewards may be such that the positive reward is larger than the negative reward. [para. 57—para. 58]; Parameters are only temporarily masked, thus, parameters retained in one iteration may be trained in the next iteration based on the probability of retention, based on the contribution to the cost function).

Regarding claim 14, claim 14 contains substantially similar limitations to those found in claim 1. Therefore, it is rejected for the same reason as claim 1 above. In addition, Kharaghani further teaches: 
	a non-transitory machine-readable data medium on which is stored a computer program for training an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the computer program, when executed by one or more computers, causing the one or more computers to perform the following steps (Memory 604 may comprise any suitable known or other machine-readable storage medium. Memory 604 may comprise non-transitory computer readable storage medium such as, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Memory 604 may include a suitable combination of any type of computer memory that is located either internally or externally to computing device 600 such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), and electrically-erasable programmable read-only memory (EEPROM), Ferroelectric RAM (FRAM) or the like. Memory 604 may comprise any storage means (e.g. devices) suitable for retrievably storing machine-readable instructions 606 executable by processor(s) 602. [para. 68]).

	Regarding claim 15, claim 15 contains substantially similar limitations to those found in claim 1. Therefore, it is rejected for the same reason as claim 1 above. In addition, Kharaghani further teaches:
	a computer configured to train an artificial neural network (ANN) whose behavior is characterized by a set of trainable parameters, the computer configured to (Referring now to FIG. 6, the method described herein with reference to FIG. 2A to FIG. 5 may be implemented on one or more computing devices (also referred to herein as neural network units) 600. [para. 66]).
	
	Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, as applied in claim 2 above, and in view of Zou (US 2019/0180115 A1, first cited in previous office action filed 03/07/2025) hereinafter Zou.

	Regarding claim 3, Kharaghani teaches all the limitations of claim 2 and further teaches:
	wherein the relevance assessment of at least one of the parameters includes the cost function after an activation of the at least one of the parameters at at least one location that is predefined by training data. (A reinforcement signal is then generated based on the contribution of the randomly- (or pseudo-randomly-) selected subset of connections to the cost function. Generating the reinforcement signal comprises attributing a positive reward to the subset of connections if the cost function has been reduced (step 504) and attributing a negative reward (or punishment) to the subset of connections otherwise (step 506). In this manner, it can be ensured that the connections that do not reduce the cost function will be less likely to be chosen during the next iterations of the training process. In one embodiment, attributing the positive reward comprises increasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In other words, a predetermined positive value is added to the current probability value. In one embodiment, attributing the negative reward comprises decreasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In this case, a predetermined value, which may be negative, null, or positive, is added to the current probability value. It should be understood that when positive values are used for both the positive reward and the negative reward (or punishment), the relative magnitude between the two rewards may be such that the positive reward is larger than the negative reward. [para. 57—para. 58]).
	However, Kharaghani fails to teach a partial derivative of the cost function after an activation of the at least one of the parameters at at least one location that is predefined by the training data.
	In the same field of endeavor, Zou teaches:
	a partial derivative of the cost function after an activation of the at least one of the parameters at at least one location that is predefined by the training data (Generally, the training process includes computing partial derivatives of the detection and regression cost functions C1, C2 and adjusting weights and biases of the layers of the artificial neural network 118. [para. 111]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate a partial derivative of the cost function after an activation of the at least one of the parameters at at least one location that is predefined by the training data, as suggested in Zou, into Kharaghani because both systems train artificial neural networks (see Kharaghani, para. 6; see Zou, para. 18). Incorporating the teaching of Zou into Kharaghani would assure an efficient and more effective training process for the artificial neural network (see Zou, para. 103).
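For illustration only, a partial derivative of a cost function evaluated at a location given by a training sample can be sketched as follows. The single linear unit, the squared-error cost, and the names `cost`, `dcost_dw`, and `numeric_dcost_dw` are assumptions chosen for this sketch; neither Zou nor the claims specify this particular model, and the derivative is taken here with respect to a weight.

```python
def cost(w, b, x, y):
    """Squared-error cost of a single linear unit for one training sample (x, y)."""
    return 0.5 * (w * x + b - y) ** 2

def dcost_dw(w, b, x, y):
    """Analytic partial derivative of the cost with respect to the weight w,
    evaluated at the location given by the training sample (x, y)."""
    return (w * x + b - y) * x

def numeric_dcost_dw(w, b, x, y, eps=1e-6):
    """Central finite-difference check of the same partial derivative."""
    return (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)

# Illustrative values only: prediction 1.1, error -0.4, derivative -0.8.
analytic = dcost_dw(0.5, 0.1, 2.0, 1.5)
numeric = numeric_dcost_dw(0.5, 0.1, 2.0, 1.5)
```

The magnitude of such a derivative is one possible per-parameter relevance measure, since it indicates how strongly the cost responds to a change in that parameter at the given training sample.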

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, as applied in claim 2 above, and in view of LI et al. (US 2019/0050734 A1, first cited in previous office action filed 03/07/2025) hereinafter Li.

	Regarding claim 4, Kharaghani teaches all the limitations of claim 2 and further teaches:
	wherein the predefined criterion includes selecting parameters whose relevance assessment is better than a predefined threshold value, as the parameters to be trained (Element-wise comparison of matrices P and R is then performed. In particular, each element R[i,j] of the matrix R is compared to each element P[i,j] of the probability matrix P to determine if P[i,j]>R[i,j]. The elements of the binary mask matrix M are then generated accordingly by setting M[i,j] to one if P[i,j]>R[i,j] and setting M[i,j] to zero otherwise. As discussed above, when a given mask matrix element M[i,j] is set to one, the corresponding connection is retained (i.e. included in the given iteration of the training process), whereas the connection is temporarily removed otherwise. [para. 49]).
	However, Kharaghani fails to teach wherein the predefined criterion includes selecting a predefined number of most relevant parameters, as the parameters to be trained.
	In the same field of endeavor, Li teaches:
	wherein the predefined criterion includes selecting a predefined number of most relevant parameters, as the parameters to be trained (As shown in FIG. 2, it firstly trains the neural network to obtain a trained neural network with a desired accuracy. Then, it prunes and fine-tunes the trained neural network, so as to obtain a sparse neural network. [para. 13; FIG. 2]; i.e., fine-tuning is to continue to train the neural network [para. 125]; a compression strategy determining step, for determining a compression strategy of a compression cycle, said compression strategy at least comprising: the target compression ratio of each pruning operation within said compression cycle, the total number of pruning operation to be conducted, and a target compression ratio of said compression cycle; and a pruning and fine-tuning step, for pruning and fine-tuning said intermediate dense neural network based on said compression strategy, until said intermediate dense neural network is compressed into a sparse neural network having said target compression ratio of said compression cycle. [para. 42]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate wherein the predefined criterion includes selecting a predefined number of most relevant parameters, as the parameters to be trained, as suggested in Li, into Kharaghani because both systems train neural networks (see Kharaghani, para. 6; see Li, FIG. 2). Incorporating the teaching of Li into Kharaghani would effectively shorten the training period of a neural network and also compress the neural network while maintaining its accuracy (see Li, para. 41).
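The two alternatives recited in claim 4, selection against a predefined threshold and selection of a predefined number of most relevant parameters, can be contrasted in a short sketch. This is for illustration only: the names `select_by_threshold` and `select_top_k` and the relevance scores are hypothetical and are not drawn from Kharaghani, Li, or the application.

```python
def select_by_threshold(relevance, threshold):
    """Hypothetical helper: indices of parameters whose relevance
    score exceeds a predefined threshold value."""
    return [i for i, r in enumerate(relevance) if r > threshold]

def select_top_k(relevance, k):
    """Hypothetical helper: indices of the k parameters with the
    highest relevance scores, returned in ascending index order."""
    order = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return sorted(order[:k])

# Illustrative relevance scores only.
relevance = [0.9, 0.1, 0.4, 0.7]
above_threshold = select_by_threshold(relevance, 0.5)
top_three = select_top_k(relevance, 3)
```

With a threshold, the number of parameters to be trained depends on the score distribution; with a predefined number, the count is fixed in advance and the effective cutoff varies instead.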


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, as applied in claim 2 above, and in view of Masse et al. (US 2020/0250483 A1, first cited in previous office action filed 03/07/2025) hereinafter Masse.

	Regarding claim 5, Kharaghani teaches all the limitations of claim 2 and further teaches:
	wherein for the relevance assessment of at least one parameter, a previous change experienced by the at least one parameter during the optimization is used (A reinforcement signal is then generated based on the contribution of the randomly- (or pseudo-randomly-) selected subset of connections to the cost function. Generating the reinforcement signal comprises attributing a positive reward to the subset of connections if the cost function has been reduced (step 504) and attributing a negative reward (or punishment) to the subset of connections otherwise (step 506). In this manner, it can be ensured that the connections that do not reduce the cost function will be less likely to be chosen during the next iterations of the training process. In one embodiment, attributing the positive reward comprises increasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In other words, a predetermined positive value is added to the current probability value. In one embodiment, attributing the negative reward comprises decreasing the probability of retention associated with each connection in the randomly- (or pseudo-randomly-) selected subset of connections by a predetermined value. In this case, a predetermined value, which may be negative, null, or positive, is added to the current probability value. It should be understood that when positive values are used for both the positive reward and the negative reward (or punishment), the relative magnitude between the two rewards may be such that the positive reward is larger than the negative reward. [para. 57—para. 58]).
	However, Kharaghani fails to teach a previous history of changes experienced by the at least one parameter during the optimization is used.
	In the same field of endeavor, Masse teaches:
	a previous history of changes experienced by the at least one parameter during the optimization is used (One approach to alleviating or mitigating catastrophic forgetting in ANNs is to determine an “importance” of each weight to the predictive capability of the ANN for a given task, and bias adjustment of each weight during subsequent task trainings in proportion to, or as a function of, the determined importance. The adjustment bias for a given weight thus acts to computationally inhibit adjustment of the given weight based, at least in part, on the importance of the weight to one or more other tasks for which the ANN has previously been trained. Note that in this context, the bias applied to adjusting weights during training should not be confused with the bias parameters of the ANN. In view of the resistance to adjustment of weights with high importance to previously-trained tasks, this approach is referred to herein as “weight stabilization,” and may be considered a more general example of “synaptic stabilization” as mentioned above. [para. 62]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate a previous history of changes experienced by the at least one parameter during the optimization is used, as suggested in Masse, into Kharaghani because both systems train neural networks (see Kharaghani, para. 6; see Masse, para. 13). Incorporating the teaching of Masse into Kharaghani would yield much higher accuracy of ANN prediction at runtime for all tasks, a much slower drop-off in predictive accuracy of the ANN after many multiples of sequential trainings, and a drop-off in accuracy that appears to slow down beyond the range in which degradation sets in (see Masse, para. 12).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, as applied in claim 1 above, and in view of Kida et al. (US 2017/0091657 A1, first cited in previous office action filed 03/07/2025) hereinafter Kida.

Regarding claim 6, Kharaghani teaches all the limitations of claim 1. 
However, Kharaghani fails to teach wherein the predefined criterion involves selecting a number of parameters, ascertained based on a predefined budget for time and/or hardware resources, as the parameters to be trained.
	In the same field of endeavor, Kida teaches:
	wherein the predefined criterion involves selecting a number of parameters, ascertained based on a predefined budget for time and/or hardware resources, as the parameters to be trained (the model determination module 204 may eliminate features that are highly correlated with other features used for classification and/or select a model size in favor of a reduction in hardware/resource cost [para. 25]; the cost metrics utilized by the feature computation module 212 incorporate an implementation cost of the corresponding feature on a particular target platform. In some embodiments, the feature computation module 212 may compute the cost metrics (e.g., in the background) using simulation, cross-compilation, and/or heuristic-based analysis. It should be appreciated that the cost metrics may allow the computing device 100 to automatically prune features that do not meet the target platform resource budget (e.g., maximum time to compute, buffer size, code size, code instructions, etc.) [para. 26]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate wherein the predefined criterion involves selecting a number of parameters, ascertained based on a predefined budget for time and/or hardware resources, as the parameters to be trained, as suggested in Kida, into Kharaghani because both systems implement machine learning (see Kharaghani, para. 1; see Kida, Abstract). Incorporating the teaching of Kida into Kharaghani would ensure the co-optimization of accuracy and computational resource utilization in the target platform, which enables automation (instead of manual developer-based iterations) of the target architecture, as the compiled code is effectively ensured to “fit” on the platform and operate within the particular resource constraints of that platform (see Kida, para. 25).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, as applied in claim 1 above, and in view of Darvish Rouhani et al. (US 2020/0202213 A1, first cited in previous office action filed 03/07/2025) hereinafter Darvish.

	Regarding claim 7, Kharaghani teaches all the limitations of claim 1 and further teaches:
	wherein the parameters to be retained are selected from weights (Each connection (and the corresponding interconnection weight) can be selected with a probability p and temporarily dropped with a probability 1−p. During forward and backward propagation, a selected connection and the interconnection weight associated therewith remain active (i.e. retained in the network) whereas a non-selected connection is inactive (i.e. temporarily dropped or removed from the network) and the interconnection weight associated therewith temporarily omitted. In one embodiment, at step 304, the probability of selecting each connection is initialized randomly from a uniform distribution. In another embodiment, the probability is initialized pseudo-randomly. Each retention probability is selected independently, such that each connection has a different probability of being selected. [para. 45]).
	However, Kharaghani fails to teach weights via which inputs, which are supplied to neurons or other processing units of the ANN, are summed for activations of the neurons or other processing units, and bias values, which are additively offset against the activations, are selected as the parameters to be trained.
	In the same field of endeavor, Darvish teaches:
	weights via which inputs, which are supplied to neurons or other processing units of the ANN, are summed for activations of the neurons or other processing units, and bias values, which are additively offset against the activations (Each of the nodes produces an output by applying a weight to each input generated from the preceding node and collecting the weights to produce an output value. In some examples, each individual node can have an activation function and/or a bias applied. For example, any appropriately programmed processor or FPGA can be configured to implement the nodes in the depicted neural network 200. In some example neural networks, an activation function ƒ( ) of a hidden combinational node n can produce an output expressed mathematically ... where wi is a weight that is applied (multiplied) to an input edge xi, plus a bias value bi. In some examples, the activation function produces a continuous value (represented as a floating-point number) between 0 and 1. In some examples, the activation function produces a binary 1 or 0 value, depending on whether the summation is above or below a threshold. [para. 50]); wherein the parameters selected from bias values, are selected as the parameters to be trained (Neural networks can be trained and retrained by adjusting constituent values of the activation function. For example, by adjusting weights wi or bias values bi for a node, the behavior of the neural network is adjusted by corresponding changes in the networks output tensor values. For example, a cost function C(w, b) can be used to find suitable weights and biases for the network and described mathematically ... where w and b represent all weights and biases, n is the number of training inputs, a is a vector of output values from the network for an input vector of training inputs x. By adjusting the network weights and biases, the cost function C can be driven to a goal value (e.g., to zero (0)) using various search techniques, for examples, stochastic gradient descent. [para. 51]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate weights via which inputs, which are supplied to neurons or other processing units of the ANN, are summed for activations of the neurons or other processing units, and bias values, which are additively offset against the activations, are selected as the parameters to be trained as suggested in Darvish into Kharaghani because both systems train neural networks (see Kharaghani, para. 6; see Darvish, para. 2). Incorporating the teaching of Darvish into Kharaghani would reduce the demands for computation as well as memory bandwidth in a given system (see Darvish, para. 27).
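	For illustration only, the claim 7 limitation (weights retained, bias values trained) can be sketched as follows. This is a minimal sketch and not the formula elided from the Darvish quotation: the sigmoid activation, the quadratic cost, the learning rate, and all function and variable names are assumptions introduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def node_output(weights, inputs, bias):
    """Weighted sum of the inputs, additively offset by the bias, then a sigmoid (assumed)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def cost(weights, bias, samples):
    """Mean quadratic cost over the samples (an assumed cost function)."""
    return sum(0.5 * (node_output(weights, x, bias) - y) ** 2
               for x, y in samples) / len(samples)

def train_bias_only(weights, bias, samples, lr=0.5, epochs=200):
    """Gradient descent that updates only the bias; the weights are never modified."""
    for _ in range(epochs):
        for inputs, target in samples:
            a = node_output(weights, inputs, bias)
            # Derivative of 0.5 * (a - target)^2 with respect to the bias of a sigmoid node.
            grad_b = (a - target) * a * (1.0 - a)
            bias -= lr * grad_b
    return bias

weights = [0.8, -0.3]  # retained at their initialized values
samples = [([1.0, 2.0], 0.9), ([0.5, 1.0], 0.9)]  # hypothetical training data
bias_before = 0.0
bias_after = train_bias_only(weights, bias_before, samples)
```

	In this sketch the cost falls although only the single bias value was optimized, and the weight list is identical before and after training.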

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, as applied in claim 1 above, and in view of L. BLUM et al. (A SIMPLE UNPREDICTABLE PSEUDO-RANDOM NUMBER GENERATOR*, published May 1986, first cited in previous office action filed 03/07/2025) hereinafter Blum.

	Regarding claim 9, Kharaghani teaches all the limitations of claim 1 and further teaches:
	wherein the parameters are initialized using values (Referring now to FIG. 3, the step 202 of initializing the neural network comprises initializing the connection parameters (e.g. the interconnection weights) of each connection. For this purpose, the interconnection weights for all connections of each fully connected layer of the neural network are randomly (or pseudo-randomly) initialized at step 302. [para. 45]);
	However, Kharaghani fails to teach using values from a numerical sequence that has been generated by a deterministic algorithm, proceeding from a starting configuration.
	In the same field of endeavor, Blum teaches:
	using values from a numerical sequence that has been generated proceeding from a starting configuration (Example. Let N = 7 • 19 = 133 and x_0 = 4. Then the sequence x_0, x_1 = x_0^2 mod 133, … has period 6, where x_0, x_1, …, x_5, … = 4, 16, 123, 100, 25, 93, … . So b_0 b_1 ... b_5 ... = 0 0 1 0 1 1 … . The latter string of b's is the pseudo-random sequence generated by the x^2 mod N generator with input (133, 4). Here, λ(N) = 18 and λ(λ(N)) = 6. [pg. 368, lines 15-19]); a numerical sequence that has been generated by a deterministic algorithm (There is an efficient deterministic algorithm A which when given N (of the prescribed form), the prime factors of N and any quadratic residue x_0 in Z*_N, efficiently computes the unique quadratic residue x_{-1} mod N such that (x_{-1})^2 mod N = x_0. [pg. 373, Theorem 3., lines 1-3]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate using values from a numerical sequence that has been generated by a deterministic algorithm, proceeding from a starting configuration as suggested in Blum into Kharaghani because both systems use pseudo-random number generators (see Kharaghani, para. 45; see Blum, pg. 365, lines 4-7). Incorporating the teaching of Blum into Kharaghani would provide the additional useful properties of this generator: e.g., from knowledge of the (secret) factorization of N, one can generate the sequence backwards; from additional information about N, one can even random access the sequence. [Their] number-theoretic analyses also provide tools for determining the lengths of periods of the generated sequences (see Blum, pg. 365, lines 26-30).
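	The worked example quoted from Blum (N = 133, starting value 4) can be reproduced with a few lines of Python. This is a minimal sketch of the squaring-modulo-N generator as described in the quotation; the function names are chosen here for illustration, and the parameters are the toy values from the example, not values suitable for real use.

```python
def x2_mod_n_sequence(n, x0, count):
    """Return x_0, x_1, ..., x_{count-1} where x_{i+1} = x_i^2 mod n."""
    xs = []
    x = x0
    for _ in range(count):
        xs.append(x)
        x = (x * x) % n
    return xs

def parity_bits(xs):
    """Least significant bit of each sequence element."""
    return [x % 2 for x in xs]

xs = x2_mod_n_sequence(133, 4, 6)
bits = parity_bits(xs)
print(xs)    # [4, 16, 123, 100, 25, 93]
print(bits)  # [0, 0, 1, 0, 1, 1]
```

	Because the sequence depends only on the algorithm and the starting configuration (N, x_0), running the generator again with the same inputs yields the same values, and the seventh element returns to 4, consistent with the stated period of 6.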

	Regarding claim 10, the combination of Kharaghani and Blum teaches all the limitations of claim 9 and further teaches:
	wherein the numerical sequence is a pseudorandom numerical sequence (Referring now to FIG. 3, the step 202 of initializing the neural network comprises initializing the connection parameters (e.g. the interconnection weights) of each connection. For this purpose, the interconnection weights for all connections of each fully connected layer of the neural network are randomly (or pseudo-randomly) initialized at step 302. [Kharaghani, para. 45]; Example. Let N = 7 • 19 = 133 and x_0 = 4. Then the sequence x_0, x_1 = x_0^2 mod 133, … has period 6, where x_0, x_1, …, x_5, … = 4, 16, 123, 100, 25, 93, … . So b_0 b_1 ... b_5 ... = 0 0 1 0 1 1 … . The latter string of b's is the pseudo-random sequence generated by the x^2 mod N generator with input (133, 4). Here, λ(N) = 18 and λ(λ(N)) = 6. [Blum, pg. 368, lines 15-19]).
	
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, in view of L. BLUM et al. (A SIMPLE UNPREDICTABLE PSEUDO-RANDOM NUMBER GENERATOR*, published May 1986, first cited in previous office action filed 03/07/2025), hereinafter Blum, as applied in claim 9 above, and further in view of LI et al. (US 2019/0050734 A1, first cited in previous office action filed 03/07/2025) hereinafter Li.

Regarding claim 11, the combination of Kharaghani and Blum teaches all the limitations of claim 9 and further teaches:
	information that characterizes the deterministic algorithm (Referring now to FIG. 3, the step 202 of initializing the neural network comprises initializing the connection parameters (e.g. the interconnection weights) of each connection. For this purpose, the interconnection weights for all connections of each fully connected layer of the neural network are randomly (or pseudo-randomly) initialized at step 302. [Kharaghani, para. 45]; There is an efficient deterministic algorithm A which when given N (of the prescribed form), the prime factors of N and any quadratic residue x_0 in Z*_N, efficiently computes the unique quadratic residue x_{-1} mod N such that (x_{-1})^2 mod N = x_0. [Blum, pg. 373, Theorem 3., lines 1-3]);
	the starting configuration for the deterministic algorithm (Referring now to FIG. 3, the step 202 of initializing the neural network comprises initializing the connection parameters (e.g. the interconnection weights) of each connection. For this purpose, the interconnection weights for all connections of each fully connected layer of the neural network are randomly (or pseudo-randomly) initialized at step 302. [Kharaghani, para. 45]; There is an efficient deterministic algorithm A which when given N (of the prescribed form), the prime factors of N and any quadratic residue x_0 in Z*_N, efficiently computes the unique quadratic residue x_{-1} mod N such that (x_{-1})^2 mod N = x_0. [Blum, pg. 373, Theorem 3., lines 1-3]).
	However, the combination of Kharaghani and Blum fails to teach wherein a compression of the ANN is generated which includes at least, information that characterizes an architecture of the ANN; and completely trained values of the parameters to be trained.
	In the same field of endeavor, Li teaches:
	wherein a compression of the ANN is generated which includes at least (In Embodiment 2, it proposes another novel compression method for neural networks, wherein in each compression cycle, it uses a dynamic compression strategy to compress the neural network. [para. 159]);
	information that characterizes an architecture of the ANN (The compression method according to Embodiment 2 allows to compress an initial neural network during the training process, instead of having to wait for a trained neural network to initiate the compression process. [para. 216]; The ANN is compressed during training, thus, the compressed ANN includes all the information that the ANN training system includes);
	completely trained values of the parameters to be trained (In Step 1530, it prunes and fine-tunes the dense neural network obtained in Step 1510 based on the compression strategy determined in Step 1520, until the neural network reaches the target final density Dfinal of the current compression cycle. [para. 199]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate wherein a compression of the ANN is generated which includes at least, information that characterizes an architecture of the ANN, and completely trained values of the parameters to be trained as suggested in Li into the combination of Kharaghani and Blum because both systems train neural networks (see Kharaghani, para. 6; see Li, FIG. 2). Incorporating the teaching of Li into the combination of Kharaghani and Blum would effectively shorten the training period of a neural network and also compress the neural network while maintaining its accuracy (see Li, para. 41).
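	For orientation, the four items recited in claim 11 (architecture information, information characterizing the deterministic algorithm, its starting configuration, and the trained values) can be sketched as a stored record from which the retained parameters are regenerated. This is a hedged illustration of the claim language only; it is not Li's pruning-based compression, and the dictionary layout, the use of Python's `random.Random` as the deterministic algorithm, the layer sizes, and all names are assumptions made here.

```python
import random

def init_params(seed, count):
    """Deterministically generate `count` initial values from a seed (starting configuration)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]

def compress(layer_sizes, seed, params, trained_indices):
    """Store only what cannot be regenerated: the trained values, plus how to regenerate the rest."""
    return {
        "architecture": layer_sizes,       # information characterizing the architecture
        "generator": "random.Random",      # information characterizing the algorithm (assumed)
        "seed": seed,                      # starting configuration
        "trained": {i: params[i] for i in trained_indices},
    }

def decompress(blob):
    """Regenerate retained parameters from the seed, then overwrite the trained ones."""
    sizes = blob["architecture"]
    count = sum(a * b for a, b in zip(sizes, sizes[1:]))
    params = init_params(blob["seed"], count)
    for i, value in blob["trained"].items():
        params[i] = value
    return params

layer_sizes = [4, 3, 2]            # hypothetical fully connected net: 4*3 + 3*2 = 18 weights
params = init_params(seed=42, count=18)
trained_indices = [0, 5, 17]       # hypothetical subset of parameters to be trained
for i in trained_indices:
    params[i] += 0.25              # stand-in for the result of optimization
blob = compress(layer_sizes, 42, params, trained_indices)
restored = decompress(blob)
```

	In this sketch the record holds 3 stored values instead of 18, and decompression reproduces the full parameter list exactly because the retained parameters never left their initialized values.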

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025), hereinafter Kharaghani, in view of Denolf et al. (US 2020/0104715 A1, first cited in previous office action filed 03/07/2025) hereinafter Denolf.

	Regarding claim 12, Kharaghani teaches all the limitations of claim 1 and further teaches:
	wherein the ANN is configured as a classifier that maps onto an association with one or multiple classes of a predefined classification (Each layer maps the input nodes to the output nodes, in a way that is specific to the type of layer. [para. 3]; During the feed-forward phase, input data (also referred to as training examples) representing sets of pre-classified data is fed through the neural network layers. The outputs of the neural network are computed by a series of data manipulations as the input data values propagate through the various neural network nodes and weighted connections. In particular, in the embodiment illustrated in FIG. 2A, step 204 comprises feeding the input data through the neural network layers over a randomly (or pseudo-randomly) selected subset of connections, as will be discussed further below. Step 204 may also comprise proceeding with the backward propagation (or backpropagation) phase. In the backpropagation phase, errors between the output values generated during the feed-forward phase and desired output values are computed and propagated back through the neural network layers. [para. 41]).
	However, Kharaghani fails to teach wherein the ANN is configured as an image classifier that maps images onto an association with one or multiple classes of a predefined classification.
	In the same field of endeavor, Denolf teaches:
	wherein the ANN is configured as an image classifier that maps images onto an association with one or multiple classes of a predefined classification (The training dataset 110 includes data for training the neural network 106 to generate trained network weights 114. For example, if the neural network 106 is configured to classify images, the training dataset 110 can be a set of pre-classified images. [para. 26]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate wherein the ANN is configured as an image classifier that maps images onto an association with one or multiple classes of a predefined classification as suggested in Denolf into Kharaghani because both systems train neural networks with pre-classified data (see Kharaghani, para. 41; see Denolf, para. 26). Incorporating the teaching of Denolf into Kharaghani would provide training in which not only the values of the weights are trained, but also the topology and certain implementation-related attributes of the neural network are found (see Denolf, para. 22).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over KHARAGHANI et al. (US 2018/0300629 A1, first cited in previous office action filed 03/07/2025) hereinafter Kharaghani, in view of Masse et al. (US 2020/0250483 A1, first cited in previous office action filed 03/07/2025) hereinafter Masse, and further in view of Yu et al. (US 2020/0005023 A1, first cited in previous office action filed 03/07/2025) hereinafter Yu.

	Regarding claim 13, Kharaghani teaches a method, comprising the following steps: training an artificial neural network ANN whose behavior is characterized by a set of trainable parameters, the training including (a method for training a neural network, the neural network comprising at least one layer comprising a plurality of input nodes, a plurality of output nodes, and a plurality of connections for connecting each one of the plurality of input nodes to each one of the plurality of output nodes. [para. 6]):
	initializing the parameters (Referring now to FIG. 3, the step 202 of initializing the neural network comprises initializing the connection parameters (e.g. the interconnection weights) of each connection. [para. 45]);
	providing training data which is labeled with target outputs onto which the ANN is to map the training data (Each layer maps the input nodes to the output nodes, in a way that is specific to the type of layer. [para. 3]; During the feed-forward phase, input data (also referred to as training examples) representing sets of pre-classified data is fed through the neural network layers. The outputs of the neural network are computed by a series of data manipulations as the input data values propagate through the various neural network nodes and weighted connections. In particular, in the embodiment illustrated in FIG. 2A, step 204 comprises feeding the input data through the neural network layers over a randomly (or pseudo-randomly) selected subset of connections, as will be discussed further below. Step 204 may also comprise proceeding with the backward propagation (or backpropagation) phase. In the backpropagation phase, errors between the output values generated during the feed-forward phase and desired output values are computed and propagated back through the neural network layers. [para. 41]);
	supplying the training data to the ANN and mapping, by the ANN, the training data onto outputs (Each layer maps the input nodes to the output nodes, in a way that is specific to the type of layer. [para. 3]; During the feed-forward phase, input data (also referred to as training examples) representing sets of pre-classified data is fed through the neural network layers. The outputs of the neural network are computed by a series of data manipulations as the input data values propagate through the various neural network nodes and weighted connections. In particular, in the embodiment illustrated in FIG. 2A, step 204 comprises feeding the input data through the neural network layers over a randomly (or pseudo-randomly) selected subset of connections, as will be discussed further below. Step 204 may also comprise proceeding with the backward propagation (or backpropagation) phase. In the backpropagation phase, errors between the output values generated during the feed-forward phase and desired output values are computed and propagated back through the neural network layers. [para. 41]);
	assessing a matching of the outputs with the target outputs according to a predefined cost function (After the input data has been fed forward through the network layers over the randomly- (or pseudo-randomly-) selected connections (i.e. with the randomly- or pseudo-randomly-selected interconnection weights), the method proceeds with the backpropagation step 408, in which errors between outputs (resulting and desired) are propagated back through the neural network layers [para. 55]);
	based on a predefined criterion, selecting, from the set of parameters, at least one first subset of parameters to be trained and one second subset of parameters to be retained (Each connection (and the corresponding interconnection weight) can be selected with a probability p and temporarily dropped with a probability 1−p. During forward and backward propagation, a selected connection and the interconnection weight associated therewith remain active (i.e. retained in the network) whereas a non-selected connection is inactive (i.e. temporarily dropped or removed from the network) and the interconnection weight associated therewith temporarily omitted. In one embodiment, at step 304, the probability of selecting each connection is initialized randomly from a uniform distribution. In another embodiment, the probability is initialized pseudo-randomly. Each retention probability is selected independently, such that each connection has a different probability of being selected. [para. 45]);
	optimizing the parameters to be trained with an objective that a further processing of the training data by the ANN prospectively results in a better assessment by the cost function (the interconnection weights that were active (i.e. the interconnection weights associated with the randomly- or pseudo-randomly-drawn subset of connections) during the forward pass of the training examples are updated based on the error, learning rate, and the gradients of the interconnection weights [para. 56]);
	leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization (Each connection (and the corresponding interconnection weight) can be selected with a probability p and temporarily dropped with a probability 1−p. During forward and backward propagation, a selected connection and the interconnection weight associated therewith remain active (i.e. retained in the network) whereas a non-selected connection is inactive (i.e. temporarily dropped or removed from the network) and the interconnection weight associated therewith temporarily omitted. In one embodiment, at step 304, the probability of selecting each connection is initialized randomly from a uniform distribution. In another embodiment, the probability is initialized pseudo-randomly. Each retention probability is selected independently, such that each connection has a different probability of being selected. [para. 45]).
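	The selecting, optimizing, and leaving steps mapped above can be pictured with a short sketch. It assumes, purely for illustration, that each parameter gets its own selection probability drawn from a uniform distribution (in the spirit of the quoted para. 45) and that placeholder gradients stand in for backpropagation; the function names, seed, and learning rate are hypothetical and are not taken from Kharaghani or from the application.

```python
import random

def select_subset(probabilities, rng):
    """True marks a parameter selected for training; False marks one to be retained."""
    return [rng.random() < p for p in probabilities]

def masked_update(params, grads, mask, lr):
    """Apply a gradient step to selected parameters; leave the others untouched."""
    return [w - lr * g if selected else w
            for w, g, selected in zip(params, grads, mask)]

rng = random.Random(0)
params = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # initialized parameters
initial = list(params)
probabilities = [rng.random() for _ in params]        # per-parameter selection probability
mask = select_subset(probabilities, rng)
grads = [1.0] * len(params)                           # placeholder gradients
updated = masked_update(params, grads, mask, lr=0.1)
```

	Every entry of `updated` whose mask value is False is identical to its initialized value, while the selected entries have each moved by one gradient step.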
	However, Kharaghani fails to teach supplying the ANN with measured data that have been recorded via at least one sensor; mapping, by the ANN, the measured data onto second outputs; generating an activation signal from the second outputs; and activating, via the activation signal, a vehicle and/or an object recognition system and/or a system for quality control of products and/or a system for medical imaging.
	In the same field of endeavor, Masse teaches:
	supplying the ANN with measured data that have been recorded (Artificial neural networks (ANNs) may be used for a variety of machine learning (ML) and artificial intelligence (AI) tasks, such as image recognition or machine vision, speech recognition (e.g., speech-to-text), speech synthesis (e.g., text-to-speech), and pattern recognition, to name a few. In a typical scenario, an ANN may be "trained" to recognize features and/or characteristics of a class of input objects as represented in input data in order to later be able to receive previously unknown data, and in it identify (or rule out the identity of) particular objects of the class with some statistical certainty. [para. 3]);
	mapping, by the ANN, the measured data onto second outputs (Artificial neural networks (ANNs) may be used for a variety of machine learning (ML) and artificial intelligence (AI) tasks, such as image recognition or machine vision, speech recognition (e.g., speech-to-text), speech synthesis (e.g., text-to-speech), and pattern recognition, to name a few. In a typical scenario, an ANN may be "trained" to recognize features and/or characteristics of a class of input objects as represented in input data in order to later be able to receive previously unknown data, and in it identify (or rule out the identity of) particular objects of the class with some statistical certainty. [para. 3]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate supplying the ANN with measured data that have been recorded; and mapping, by the ANN, the measured data onto second outputs as suggested in Masse into Kharaghani because both systems train neural networks (see Kharaghani, para. 6; see Masse, para. 13). Incorporating the teaching of Masse into Kharaghani would yield much higher accuracy of ANN prediction at runtime for all tasks, a much slower drop-off in predictive accuracy of the ANN after many multiples of sequential trainings, and a drop-off in accuracy that appears to slow down beyond the range in which degradation sets in (see Masse, para. 12).
	However, the combination of Kharaghani and Masse fails to teach measured data that have been recorded via at least one sensor; generating an activation signal from the second outputs; and activating, via the activation signal, a vehicle and/or an object recognition system and/or a system for quality control of products and/or a system for medical imaging.
	In the same field of endeavor, Yu teaches:
	measured data that have been recorded via at least one sensor (The disclosed methods can be used in remote sensing using, for example, images acquired through sensors that detect patterns not directly visible to human eyes. For example, sonar or infrared spectral images can be used to, for example, recognize mineral, gas, or oil deposits. [para. 299]);
	mapping the measured data onto second outputs (Whatever measure or measures are used, because the pseudo-images of the library are for known images, the result of the comparison can, for example, be used to determine whether the first-image corresponding to the pseudo-image-of-interest is one or more of: (i) in one or more classes or categories of the known images, (ii) a particular known image, (iii) not in one or more classes or categories of the known images, and (iv) not a known image. [para. 176-180]);
	generating an activation signal from the second outputs (The results of the comparison can be employed in various ways. One basic use is to provide a user with a visual, oral, or other type of notification that a “match” has or has not been found. The notification will typically be accompanied by a report which may be as simple as the name of the known image or may include other data including an indication of the level of confidence of the identification. The report can be in visual, oral, or other form. In the case of machine vision, the result of the comparison may be a set of instructions for execution by, for example, a robot, e.g., instructions to interact with the identified object in a particular way. Other ways in which the result of the comparison can be used will be evident to persons skilled in the art from the present disclosure. [para. 181]);
	activating, via the activation signal, a vehicle and/or an object recognition system (More generally, it will be apparent to those skilled in the art that the disclosed image recognition techniques can be used in all forms of machine vision. For example, the disclosed methods can be applied to images or image sequences to identify vehicles, obstacles, traffic signs and passage conditions in an autonomous robotic device, vehicle, or vessel, and inform a central decision maker (e.g., a computer) of existing conditions. The disclosed methods can be used for the identification of faulty parts in mechanical, electrical, and electronic manufacturing. For example, using pseudo-images for faulty vs. intact electronic circuits, the disclosed methods can be used to correctly and rapidly identify defective circuits. [para. 300]) and/or a system for quality control of products (More generally, it will be apparent to those skilled in the art that the disclosed image recognition techniques can be used in all forms of machine vision. For example, the disclosed methods can be applied to images or image sequences to identify vehicles, obstacles, traffic signs and passage conditions in an autonomous robotic device, vehicle, or vessel, and inform a central decision maker (e.g., a computer) of existing conditions. The disclosed methods can be used for the identification of faulty parts in mechanical, electrical, and electronic manufacturing. For example, using pseudo-images for faulty vs. intact electronic circuits, the disclosed methods can be used to correctly and rapidly identify defective circuits. [para. 300]; As just one example, in a quality control setting, using pseudo-images for the parts of a finished machine, a manufacturer can determine if all the parts have been included in a particular finished machine by (i) combining the pseudo-images for the parts into a first-image, (ii) obtaining a pseudo-image for that first-image, and (iii) comparing that pseudo-image with a pseudo-image of the actual finished machine to determine if all the parts are present. [para. 302]) and/or a system for medical imaging (In addition to facial recognition, the disclosed technology can be used in other forms of imaging. For example, an image of an animal or other living object (e.g., plant, cell, organ, tissue, or virus) can be treated in the same way as a facial image to produce a pseudo-image which can then be compared with a library (database) of known pseudo-images. The images that are analyzed can be produced by medical imaging devices, such as, MRI, fMRI, X-ray, CT, and similar devices. Images produced by microscopes, e.g., images of blood and tissue samples, can also be used as original-images, as well as images in the form of sequences (e.g., genetic sequences) or in the form of traces (e.g., EKG and EEG traces). The results of the comparison of pseudo-images-of-interest with a pseudo-image library can, for example, be used as part of the diagnosis of diseases and/or in medical procedures. [para. 296]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the invention, to incorporate measured data that have been recorded via at least one sensor; generating an activation signal from the second outputs; and activating, via the activation signal, a vehicle and/or an object recognition system and/or a system for quality control of products and/or a system for medical imaging as suggested in Yu into the combination of Kharaghani and Masse because both systems employ machine learning to classify input data (see Kharaghani, para. 40-41; see Yu, para. 3). Incorporating the teaching of Yu into the combination of Kharaghani and Masse would achieve robust image recognition even for imperfect real world images, such as, real world images that have been degraded by noise, poor illumination, uneven lighting, and/or occlusion, e.g., the presence of glasses, scarves, or the like in the case of facial images (see Yu, Abstract).

Response to Amendment
	Applicant’s amendments to the specification, filed 08/07/2025, have been fully considered and are accepted; the objections to the specification are respectfully withdrawn.

Response to Arguments
Applicant’s amendments to the claims, filed 08/07/2025, with respect to the objections to claims 1, 3, 11, and 13-15 have been fully considered and are accepted; the objections to the claims are respectfully withdrawn.

Applicant’s amendments to the claims, filed 08/07/2025, with respect to the 35 U.S.C. 112(b) indefiniteness rejections of claims 1-15 have been fully considered and are accepted; the rejections of the claims under 35 U.S.C. 112(b) are respectfully withdrawn.

Applicant's arguments, filed 08/07/2025, traversing the rejection of claims 1-15 under 35 U.S.C. 101, on pg. 9-13, have been fully considered and are not persuasive. Applicant argues that the claims are analogous to Example 39 of the 2019 Guidance because they are directed to training an artificial neural network, thus insofar as Example 39 did not recite any mathematical relationship or any mental process, it follows that the claims here do not recite any of these judicial exceptions.
Examiner respectfully disagrees.
While both Example 39 from the 2019 Guidance and the argued claims are directed to training an artificial neural network, that shared subject matter does not mean that the argued claims recite no mathematical concepts or mental processes. As the analysis of Example 39 points out, Example 39 does not recite any mathematical concepts or mental processes. However, the presently argued claims do recite mental processes or mathematical concepts, such as initializing the parameters; mapping, by the ANN, the training data onto outputs; assessing a matching of the outputs with the target outputs according to a predefined cost function; based on a predefined criterion, selecting, from the set of parameters, at least one first subset of parameters to be trained and one second subset of parameters to be retained; and optimizing the parameters to be trained with an objective that a further processing of the training data by the ANN prospectively results in a better assessment by the cost function, as identified in the 35 U.S.C. 101 section above and in the previous office action filed 03/07/2025. Thus, the argued claims differ from Example 39 of the 2019 Guidance because, while both are directed to training an artificial neural network, the present claims recite mental processes and mathematical concepts while Example 39 did not. Thus, for at least the aforementioned reason, the present claims are not eligible under prong 1 of the 35 U.S.C. 101 analysis, and the analysis must continue to prong 2.

	Applicant argues, on pg. 10-13, that the claims integrate the judicial exception into a practical application under prong two, because the claims represent a technological improvement in the field of ANN training.
	Examiner respectfully disagrees.
	Applicant specifically argues that the limitation “leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization” represents the technological improvement as reflected in the specification para. 23 and 33, and that the analysis failed to address this technological improvement. However, Applicant is reminded that the claims are analyzed as a whole; thus the limitation of “based on a predefined criterion, selecting, from the set of parameters, at least one first subset of parameters to be trained and one second subset of parameters to be retained” must also be considered for the argued technological improvement, as the process of retaining parameters must first select the parameters to be retained before it can retain them. As identified in the 35 U.S.C. 101 section above, and in the previous office action filed 03/07/2025, selecting a subset of parameters based on a predefined criterion was identified to be a mental process or mathematical concept. This is because it is reasonable for a person to select a subset of parameters based on a predefined criterion in the mind with the aid of a pen and paper, for example, selecting every other parameter in the set of parameters. Further, when selecting parameters to be retained, the goal behind the selection is to retain the parameters; thus, the extra step of retaining the parameters does not add any significant steps that would separate it from the selection step.
Thus, the purported improvement is reflected in the claims as part of 2 steps, first selecting the parameters to be retained, which is identified as a judicial exception, and secondly, retaining the parameters, which amounts to no more than just leaving the values as they are and doing nothing to them, which was identified as mere data gathering and output and is well-understood, routine, conventional activity, per MPEP § 2106.05(d)(iv), specifically storing and retrieving information in memory, in the 35 U.S.C. 101 section above, and in the previous office action filed 03/07/2025. As the purported improvement to technology is furnished, at least in part, by the identified judicial exception, it cannot qualify to incorporate the judicial exception into a practical application per MPEP 2106.05(a).
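As an illustrative sketch only, the two steps discussed above (selecting a subset by a predefined criterion, then leaving the retained subset untouched) can be written in a few lines of Python. The function names `select_subsets` and `training_step`, the gradient values, and the learning rate are hypothetical; the every-other-parameter criterion is the example given above, and nothing here is taken from the claims, the specification, or Kharaghani.

```python
def select_subsets(num_params):
    """Assumed criterion: every other parameter is trained, the rest are retained."""
    trained = [i for i in range(num_params) if i % 2 == 0]
    retained = [i for i in range(num_params) if i % 2 == 1]
    return trained, retained


def training_step(params, grads, trained, lr=0.1):
    """Update only the parameters selected for training.

    Retained parameters are copied through unchanged, i.e. nothing is done to them.
    """
    updated = list(params)
    for i in trained:
        updated[i] = params[i] - lr * grads[i]
    return updated


# Hypothetical values chosen only to make the sketch runnable.
params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]

trained, retained = select_subsets(len(params))
new_params = training_step(params, grads, trained)
```

In this sketch the retained indices never appear in the update loop, which mirrors the characterization above that retaining amounts to leaving the values as they are.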
Applicant further argues, on pg. 12, that the improvement of drastic reductions in memory requirements, per para. 24 of the specification, is reflected in the claims.
Examiner respectfully disagrees.
As para. 24 of the specification states, the “memory requirement is drastically reduced by initializing the parameters using values from a numerical sequence that has been generated by a deterministic algorithm, starting from a starting configuration”. None of the aforementioned improvement is reflected in the claims beyond the broad limitation of “initializing the parameters”. Thus, the argument that the claims reflect the drastic reduction in memory is unsupported. Further, para. 24 states that “[f]or compressed storage of all retained parameters, it is then necessary to only store information that characterizes the deterministic algorithm, as well as the starting configuration”. Similarly, this purported improvement is not reflected in the claims: there is no claimed deterministic algorithm, starting configuration, or compressed storage. Thus, the argument that the claims reflect the drastic memory reduction is further unsupported.
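For context, the mechanism quoted from para. 24 can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's implementation: Python's standard `random.Random` stands in for the "deterministic algorithm", an integer seed stands in for the "starting configuration", and `init_params`, `stored`, and the parameter count are hypothetical names and values.

```python
import random


def init_params(seed, count):
    """Generate `count` initial values from a deterministic sequence.

    Assumption: random.Random(seed) plays the role of the deterministic
    algorithm, and `seed` plays the role of the starting configuration.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


seed = 42
count = 1000
params = init_params(seed, count)

# Compressed storage of parameters retained at their initialized values:
# only the algorithm identifier and starting configuration are kept,
# not the 1000 values themselves.
stored = {"algorithm": "random.Random", "seed": seed, "count": count}

# The retained values are regenerated on demand from the stored description.
restored = init_params(stored["seed"], stored["count"])
```

The sketch shows why the deterministic algorithm and starting configuration matter to the memory saving: without them, "initializing the parameters" alone says nothing about how the values could later be reconstructed.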
For at least the aforementioned reasons, Examiner asserts that the claims do not integrate the judicial exception into a practical application under prong two of the 35 U.S.C. 101 analysis. Thus, the rejection of claims 1-15 under 35 U.S.C. 101 is respectfully maintained.

Applicant’s arguments, filed 08/07/2025, traversing the rejection of the claims under 35 U.S.C. 102, on pg. 13-15, have been fully considered and are not persuasive. Applicant argues that Kharaghani fails to disclose “providing training data which is labeled with target outputs onto which the ANN is to map the training data”, and that Kharaghani fails to disclose “based on a predefined criterion, selecting, from the set of parameters, at least one first subset of parameters to be trained and one second subset of parameters to be retained”.
Examiner respectfully disagrees.
With respect to the labeled training data, Applicant argues that a POSITA would not have deemed paras. 3 and 41 as involving any labeling whatsoever. However, the argued limitation does not recite a step of labeling data; it sets forth a requirement that the data is labeled. As cited in the 35 U.S.C. 102 section above, and in the previous office action filed 03/07/2025, para. 41 of Kharaghani states that "In the backpropagation phase, errors between the output values generated during the feed-forward phase and desired output values are computed and propagated back through the neural network layers". A POSITA would have deemed the desired output values as disclosed by Kharaghani to teach data which is labeled with target outputs. Earlier in para. 41, Kharaghani also discloses that the input data is referred to as “training examples”. Thus, Kharaghani, as cited, would be deemed to teach training data which is labeled with target outputs onto which the ANN is to map the training data.
With respect to the selecting step, Applicant argues, on pg. 14, that Kharaghani must disclose two subsets of the connection weights, one for training, and another to be retained, but instead discloses a first subset to be retained, and a second set to be dropped or removed from the network. However, as recited in the claims, the only limitation pertaining to the trained parameters is “optimizing the parameters to be trained with an objective that a further processing of the training data by the ANN prospectively results in a better assessment by the cost function”, which amounts to training the parameters similar to what the named subset suggests. The first subset which is referred to as the retained parameters in Kharaghani, is retained to be further trained by remaining in the cycle of forward and backward propagation, and thus meets the limitations set forth by the claims. Additionally, as recited in the claims, the only limitation pertaining to the retained parameters as claimed is “leaving the parameters to be retained at their initialized values or at a value already obtained during the optimization”, which does not differ from the subset disclosed by Kharaghani. As applicant points out, the second subset of parameters is temporarily dropped by omitting the weight, thus the parameters are left either at their initialized values or at a value already obtained during training and just omitted during the following forward and backward propagation cycle. Thus, Kharaghani discloses retaining parameters in the same manner as recited in the claims and does not fail to disclose the second subset of parameters.
Applicant further argues, on pg. 15, that Kharaghani fails to disclose “optimizing the parameters to be trained with an objective that a further processing of the training data by the ANN prospectively results in the better assessment by the cost function”.
Examiner respectfully disagrees.
As identified in the 35 U.S.C. 102 section above, and in the previous office action filed 03/07/2025, the cited portion of para. 41 of Kharaghani states that the interconnection weights are updated during the forward pass of the training examples based on the error, learning rate, and the gradients of the interconnection weights. A POSITA would deem updating weights based on an error, learning rate, and weight gradient as optimizing the parameters with an objective to result in a better assessment by the cost function. In other words, updating an interconnection weight based on the error caused by the current interconnection weight would aim to optimize the interconnection weight to result in a better assessment by the cost function. This is a fundamental part of forward and backward propagation in training neural networks, and would be deemed by a POSITA as disclosing the argued limitation.
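The kind of update described above, in which a weight is adjusted based on an error, a learning rate, and a gradient, can be illustrated with a single-weight sketch. The squared-error cost, the one-weight linear model, and all names and numeric values below are assumptions chosen for illustration; they are not taken from Kharaghani or from the claims.

```python
def cost(w, x, target):
    """Assumed cost function: half the squared error of a one-weight linear model."""
    return 0.5 * (w * x - target) ** 2


def update_weight(w, x, target, lr):
    """One gradient-descent step on the weight."""
    error = w * x - target   # difference between output and desired output
    grad = error * x         # derivative of the cost with respect to w
    return w - lr * grad     # move against the gradient, scaled by the learning rate


# Hypothetical values chosen only to make the sketch runnable.
w, x, target, lr = 0.0, 2.0, 4.0, 0.1

cost_before = cost(w, x, target)
w_new = update_weight(w, x, target, lr)
cost_after = cost(w_new, x, target)
```

With these values the cost after the step is lower than before it, which is the sense in which such an update aims at a better assessment by the cost function when the same training data is processed again.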
For at least the aforementioned reasons, the rejections of the claims under 35 U.S.C. 102 are respectfully maintained.

Applicant’s argument, filed 08/07/2025, traversing the rejection of claims under 35 U.S.C. 103, on pg. 15, has been fully considered and is not persuasive. Applicant argues that none of the remaining references applied overcome the argued deficiencies discussed in the 35 U.S.C. 102 section. However, as has been identified above, Kharaghani is not deficient with respect to any aspect argued by the Applicant under 35 U.S.C. 102. Thus, the rejections under 35 U.S.C. 103 are respectfully maintained. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Liu et al (US 20160328643 A1) teaches supervised training of a neural network with labeled data, pruning some weights and retaining other weights.

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKE BREEN whose telephone number is (571)272-0456. The examiner can normally be reached Monday - Friday, 7:00 AM - 3:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/J.T.B./
Examiner, Art Unit 2143

/JENNIFER N WELCH/
Supervisory Patent Examiner, Art Unit 2143
Prosecution Timeline

Nov 09, 2021: Application Filed
Mar 07, 2025: Non-Final Rejection mailed (§101, §102, §103)
Aug 07, 2025: Response Filed
Sep 30, 2025: Final Rejection mailed (§101, §102, §103)
Jan 30, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/546,132
Patent 12619919
SYSTEMS AND METHODS FOR NODE WEIGHTING AND AGGREGATION FOR FEDERATED LEARNING TECHNIQUES
4y 4m to grant Granted May 05, 2026
17/681,967
Patent 12608612
PRUNING AND ACCELERATING NEURAL NETWORKS WITH HIERARCHICAL FINE-GRAINED STRUCTURED SPARSITY
4y 1m to grant Granted Apr 21, 2026
17/579,400
Patent 12602577
NEURON CORE WITH TIME-EMBEDDED FLOATING POINT ARITHMETIC
4y 2m to grant Granted Apr 14, 2026
17/825,033
Patent 12555650
SYSTEM AND METHOD FOR MOLECULAR PROPERTY PREDICTION USING EDGE-CONDITIONED GRAPH ATTENTION NEURAL NETWORK
3y 8m to grant Granted Feb 17, 2026
17/675,582
Patent 12518136
INFERENCE EXECUTION METHOD FOR CANDIDATE NEURAL NETWORKS AND SWITCHING NEURAL NETWORKS
3y 10m to grant Granted Jan 06, 2026
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 62%
With Interview (+71.4%): 99%
Median Time to Grant: 3y 11m (~0m remaining)
PTA Risk: Moderate
Based on 13 resolved cases by this examiner. Grant probability derived from career allowance rate.