DETAILED ACTION
This action is responsive to the amendment filed on 09/25/2025. Claims 1-2, 4-6, 8-10, and 12 are pending in the case. Claims 1-2, 5-6, and 9-10 are currently amended. Claims 1, 5, and 9 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgement is made of applicant’s claim for domestic priority based on international application no. PCT/JP2019/045281 filed on 11/19/2019.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/25/2025 is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2, 6, and 10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 2, the claim recites “the input ANN model” in lines 3-4. There is insufficient antecedent basis for this limitation in the claim. The parent claim recites “an ANN model”. It is unclear if applicant is attempting to recite a new claim element or attempting to refer to a previously recited claim element. For examination purposes this claim limitation has been interpreted to mean “the ANN model”, referring to the previously recited claim element.
Further regarding claim 2, the claim recites “the New training data” in line 4. There is insufficient antecedent basis for this limitation in the claim. The parent claim recites “training data”. It is unclear if applicant is attempting to recite a new claim element or attempting to refer to a previously recited claim element. For examination purposes this claim limitation has been interpreted to mean “the training data”, referring to the previously recited claim element.
Further, claim 2 recites “the input Policy model” in line 11. There is insufficient antecedent basis for this limitation in the claim. The parent claim recites “a policy model”. It is unclear if applicant is attempting to recite a new claim element or attempting to refer to a previously recited claim element. For examination purposes this claim limitation has been interpreted to mean “the policy model”, referring to the previously recited claim element.
Regarding claim 6, the claim recites “the input ANN model” in line 2. There is insufficient antecedent basis for this limitation in the claim. The parent claim recites “an ANN model”. It is unclear if applicant is attempting to recite a new claim element or attempting to refer to a previously recited claim element. For examination purposes this claim limitation has been interpreted to mean “the ANN model”, referring to the previously recited claim element.
Further, claim 6 recites “the input Policy model” in line 6. There is insufficient antecedent basis for this limitation in the claim. The parent claim recites “a Policy model”. It is unclear if applicant is attempting to refer to a previously recited claim element or if applicant is attempting to recite a new claim element. For examination purposes this claim limitation has been interpreted to mean “the policy model”, referring to the previously recited claim element.
Regarding claim 10, the claim recites “the input ANN model” in line 3. There is insufficient antecedent basis for this limitation in the claim. The parent claim recites “an ANN model”. It is unclear if applicant is attempting to recite a new claim element or attempting to refer to a previously recited claim element. For examination purposes this claim limitation has been interpreted to mean “the ANN model”, referring to the previously recited claim element.
Further, claim 10 recites “the input Policy model” in line 8. There is insufficient antecedent basis for this limitation in the claim. The parent claim recites “a Policy model”. It is unclear if applicant is attempting to refer to a previously recited claim element or if applicant is attempting to recite a new claim element. For examination purposes this claim limitation has been interpreted to mean “the policy model”, referring to the previously recited claim element.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-2, 4-6, 8-10, and 12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1:
Step 1 Statutory Category: Claim 1 is directed to an apparatus which falls under one of the four statutory categories.
Step 2A Prong 1 Judicial Exception: Claim 1 recites, in part, “compute information matrix of each sample in the training data using training information extracted”. This limitation is the abstract idea of a mathematical calculation, consistent with the guidance that “a claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the ‘mathematical concepts’ grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number”. See MPEP § 2106.04(a)(2)(I)(C).
Step 2A Prong 2 Integration into a practical application: This judicial exception is not integrated into a practical application. In particular, the claim recites: “an information processing apparatus”, “at least one memory configured to store program code”, “at least one processor configured to operate as instructed by the program code”, “training code configured to cause the at least one processor to operate as an ANN (artificial neural networks) model trainer”, “computation code configured to cause the at least one processor to…”, and “policy training code configured to cause the at least one processor to operate as a policy model trainer”. These limitations are additional elements that amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f). Further, the claim recites “train an ANN model using training data” and “train a Policy model, by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the training data and the information matrix, wherein values included in the policy vector cause at least one layer of the policy model to skip processing during an inference phase”. These limitations are recited at a high level of generality and likewise amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f).
Step 2B Significantly more: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed with respect to integration of the abstract idea into a practical application, the additional elements “an information processing apparatus”, “at least one memory configured to store program code”, “at least one processor configured to operate as instructed by the program code”, “training code configured to cause the at least one processor to operate as an ANN (artificial neural networks) model trainer”, “computation code configured to cause the at least one processor to…”, “policy training code configured to cause the at least one processor to operate as a policy model trainer”, “train an ANN model using training data”, and “train a Policy model, by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the training data and the information matrix, wherein values included in the policy vector cause at least one layer of the policy model to skip processing during an inference phase” amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f). Elements that merely add the words “apply it” (or an equivalent) to the judicial exception, provide mere instructions to implement an abstract idea on a computer, or merely use a computer in its ordinary capacity as a tool to perform an existing process cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 2: The rejection of claim 1 is incorporated, and further: claim 2 recites, in part, “compute the information matrix of each sample in the new training data using the training information”. This limitation is the abstract idea of a mathematical calculation, consistent with the guidance that “a claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the ‘mathematical concepts’ grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number”. See MPEP § 2106.04(a)(2)(I)(C).
Further, the claim recites the additional elements: “incremental training code configured to cause the at least one processor to operate as an incremental ANN model trainer”, “the computation code”, and “incremental policy training code configured to cause the at least one processor to operate as an incremental policy model trainer”. These limitations are additional elements that amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f). Further, the claim recites “train the ANN model incrementally from the input ANN model with the new training data including pairs of input and output of training and validation for an incremental training phase” and “train the Policy model incrementally from the input policy model by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the New training data and the information matrix”. These limitations are recited at a high level of generality and likewise amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f). Elements that merely add the words “apply it” (or an equivalent) to the judicial exception, provide mere instructions to implement an abstract idea on a computer, or merely use a computer in its ordinary capacity as a tool to perform an existing process cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 4, the rejection of claim 1 is incorporated, and further, the claim recites “wherein the policy model is a light-weight policy model based on a traditional machine learning model with a supervised learning”. This is an additional element that generally links the use of the judicial exception to a particular technological environment or field of use. See MPEP § 2106.05(h). Elements that merely generally link the use of the judicial exception to a particular technological environment or field of use cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 5:
Step 1 Statutory Category: Claim 5 is directed to a method which falls under one of the four statutory categories.
Step 2A Prong 1 Judicial Exception: Claim 5 recites, in part, “computing an information matrix of each sample in the training data using training information extracted during the ANN model training”. This limitation is the abstract idea of a mathematical calculation, consistent with the guidance that “a claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the ‘mathematical concepts’ grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number”. See MPEP § 2106.04(a)(2)(I)(C).
Step 2A Prong 2 Integration into a practical application: This judicial exception is not integrated into a practical application. In particular, the claim recites: “training an ANN (artificial neural network) model using training data” and “training a Policy model by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the training data and the information matrix, wherein values included in the policy vector cause at least one layer of the policy model to skip processing during an inference phase”. These limitations are recited at a high level of generality and amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f).
Step 2B Significantly more: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed with respect to integration of the abstract idea into a practical application, the additional elements “training an ANN (artificial neural network) model using training data” and “training a Policy model by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the training data and the information matrix, wherein values included in the policy vector cause at least one layer of the policy model to skip processing during an inference phase” amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f). Elements that merely add the words “apply it” (or an equivalent) to the judicial exception, provide mere instructions to implement an abstract idea on a computer, or merely use a computer in its ordinary capacity as a tool to perform an existing process cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 6, the rejection of claim 5 is incorporated; further, claim 6 is substantially similar to claim 2 and is rejected in the same manner, with the same reasoning applying.
Regarding claim 8, the rejection of claim 5 is incorporated; further, claim 8 is substantially similar to claim 4 and is rejected in the same manner, with the same reasoning applying.
Regarding claim 9:
Step 1 Statutory Category: Claim 9 is directed to an article of manufacture (a non-transitory computer readable medium), which falls under one of the four statutory categories.
Step 2A Prong 1 Judicial Exception: Claim 9 recites, in part, “computing an information matrix of each sample in the training data using training information extracted during the ANN model training”. This limitation is the abstract idea of a mathematical calculation, consistent with the guidance that “a claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the ‘mathematical concepts’ grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number”. See MPEP § 2106.04(a)(2)(I)(C).
Step 2A Prong 2 Integration into a practical application: This judicial exception is not integrated into a practical application. In particular, the claim recites: “a non-transitory computer readable medium storing a program for causing a computer to execute an information processing method”. This limitation is an additional element that amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f). Further, the claim recites “training an ANN (artificial neural network) model using training data” and “training a Policy model by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the training data and the information matrix, wherein values included in the policy vector cause at least one layer of the policy model to skip processing during an inference phase”. These limitations are recited at a high level of generality and likewise amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f).
Step 2B Significantly more: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed with respect to integration of the abstract idea into a practical application, the additional elements “a non-transitory computer readable medium storing a program for causing a computer to execute an information processing method”, “training an ANN (artificial neural network) model using training data”, and “training a Policy model by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the training data and the information matrix, wherein values included in the policy vector cause at least one layer of the policy model to skip processing during an inference phase” amount to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer in its ordinary capacity as a tool to perform an existing process. See MPEP § 2106.05(f). Elements that merely add the words “apply it” (or an equivalent) to the judicial exception, provide mere instructions to implement an abstract idea on a computer, or merely use a computer in its ordinary capacity as a tool to perform an existing process cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 10, the rejection of claim 9 is incorporated, and further, claim 10 is substantially similar to claims 2 and 6, and is rejected in the same manner, with the same reasoning applying.
Regarding claim 12, the rejection of claim 9 is incorporated, and further, claim 12 is substantially similar to claims 4 and 8, and is rejected in the same manner, with the same reasoning applying.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-6, 8-10, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Spasov et al., Dynamic Neural Network Channel Execution for Efficient Training, 05/15/2019, https://arxiv.org/abs/1905.06435, hereinafter referred to as “Spasov”, in view of Theis et al., Faster gaze prediction with dense networks and Fisher pruning, 07/09/2019, https://arxiv.org/abs/1801.05787, hereinafter referred to as “Theis”, and further in view of Luo et al., AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference, 01/17/2019, https://arxiv.org/pdf/1805.08941, hereinafter referred to as “Luo”.
Regarding claim 1, Spasov teaches An information processing apparatus comprising: at least one memory configured to store program code; and at least one processor configured to operate as instructed by the program code (Spasov, Abstract, Lines 3-6, “in this work, we propose a novel method which reduces the memory footprint and number of computing operations required for training and inference. Our framework efficiently integrates pruning as part of the training procedure by exploring and tracking the relative importance of convolutional channels”; the method of Spasov reduces memory footprint and the number of computing operations, which provides evidence that the method of Spasov is performed on a computer/processor, which is considered to be the “information processing apparatus”; Spasov, Page 4, Section 3, Line 2, “We conduct all experiments in PyTorch [27]”; Spasov, Page 5, Section 3.1, Lines 1-3, “We evaluate the performance of our method on three datasets: CIFAR-10, CIFAR-100 [28] and Street View House Number (SVHN) [29]. Both CIFAR datasets consist of coloured natural images with resolution 32x32, and comprise 50,000 training examples and 10,000 testing examples”; a person of ordinary skill would recognize that a computer must have performed the experiments disclosed, and further, because the experiments were conducted in “PyTorch”, the memory must have been configured to store program code, and the processor to execute that code), the program code comprising:
training code configured to cause the at least one processor to operate as an ANN (artificial neural networks) model trainer configured to train an ANN model using training data (Spasov, Page 6, Section 3.3, Line 1, “All network models are trained using SGD with Nesterov momentum of 0.9 without dampening”; Spasov, Page 2, Section 2, Lines 1-6, “We consider a supervised learning problem with a set of training examples D = {X = {x1, x2, . . . , xN}, Y = {y1, y2, . . . , yN}}, where x and y represent an input and a label, respectively. Given a CNN model with L convolutional layers, let each layer l ∈ 1 . . . L comprise K_l channels, C_l^k, where k ∈ 1 . . . K_l is the channel index. In each training step t, we 1) sample a batch of B data samples (x^{1:B}, y^{1:B}); 2) select and activate a subset S of convolutional channels (see Figure 2); 3) run a forward and a backward pass on the “thin” network, that is only on the active channels; and 4) observe the revealed saliency estimates (SAL^k_{l,t}) of the activated channels”; the “CNN model with L convolutional layers” is considered to be the “ANN model” that is trained using “a set of training examples”, which are considered to be the “training data”; see the illustrative sketch following this claim mapping);
policy training code configured to cause the at least one processor to operate as a Policy model trainer configured to train a Policy model using the Training data (Spasov, Page 2, Section 2, Lines 4-5, “2) select and activate a subset S of convolutional channels (see Figure 2)”; Spasov, Page 4, Algorithm 1, Steps 1-2, 4-11; Spasov, Page 3, Section 2.1, Paragraph 2, Lines 1-3, “Algorithm 1 can be loosely divided in two stages: firstly, an initialization round of exploring the saliencies of the channels in the network, and a second stage where we start exploiting and refining the initial saliency estimates to guide the channel selection procedure”; Spasov, Page 3, Section 2.2, Lines 1-2, “Our dynamic channel selection framework requires the estimation of channel saliency, or the contribution of each active channel to the overall network performance”; the “channel selection framework” shown in Algorithm 1 is considered to be the “Policy model”; because the “initial saliency estimates” are exploited and refined, the model is considered to be trained using at least part of Algorithm 1).
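For illustration only, the following is a minimal sketch of the training step quoted in the mapping above (sample a batch; select and activate a subset S of channels; forward/backward on the “thin” network; observe saliency estimates). It is an illustrative reading, not code from Spasov; the toy layer sizes, the masking scheme, and the gradient-magnitude saliency proxy are assumptions:

```python
# Illustrative sketch of one training step: 1) sample a batch, 2) activate a
# subset S of channels, 3) forward/backward on the "thin" network, 4) update
# the revealed saliency estimates for the activated channels.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
K, S = 8, 4                                   # total channels, active subset size
conv = nn.Conv2d(3, K, 3, padding=1)
head = nn.Linear(K, 10)
opt = torch.optim.SGD(list(conv.parameters()) + list(head.parameters()),
                      lr=0.1, momentum=0.9, nesterov=True)
saliency = torch.ones(K)                      # initialized in an exploration round

x = torch.randn(16, 3, 32, 32)                # 1) a batch of B = 16 toy samples
y = torch.randint(0, 10, (16,))

active = saliency.topk(S).indices             # 2) select/activate a subset S
mask = torch.zeros(K)
mask[active] = 1.0

opt.zero_grad()
feat = conv(x) * mask.view(1, K, 1, 1)        # 3) forward on active channels only
loss = F.cross_entropy(head(feat.mean(dim=(2, 3))), y)
loss.backward()                               #    backward pass on the thin network
opt.step()

with torch.no_grad():                         # 4) observe revealed saliency estimates
    g = conv.weight.grad.abs().sum(dim=(1, 2, 3))
    saliency[active] = g[active]              # gradient-magnitude proxy (assumption)
```

In this sketch, a simple top-S selection stands in for the exploration/exploitation channel selection procedure of Spasov’s Algorithm 1.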
Spasov does not explicitly teach an Information matrix computation unit configured to compute information matrix of each sample in the training data using training information extracted by the ANN model trainer, nor training the policy model by using, as teacher data, a policy vector which can be determined by comparing a threshold with the information matrix, using the training data and the Information matrix, wherein values included in the policy vector cause at least one layer of the Policy model to skip processing during an inference phase.
Theis teaches an Information matrix computation unit configured to compute information matrix of each sample in the training data using training information extracted by the ANN model trainer (Theis, Page 3, Section 2.1, Lines 1-2, “Our goal is to remove feature maps or parameters which contribute little to the overall performance of the model”; Theis, Page 4, Paragraph 1, Lines 1-4, “For convolutional architectures, it makes sense to try to prune entire feature maps instead of individual parameters, since typical implementations of convolutions may not be able to exploit sparse kernels for speedups. Let a_nkij be the activation of the kth feature map at spatial location i,j for the nth datapoint”; Theis, Page 4, Paragraph 1, Lines 7-9, “The gradient of the loss for the nth datapoint with respect to m_k is g_nk = −Σ_ij a_nkij ∂/∂a_nkij log Q(z_n | I_n) (9) and the pruning signal is therefore ∆_k = (1/(2N)) Σ_n g_nk², since m_k² = 1 before pruning”; Theis, Page 18, Paragraph 3, and Equations 27-32).
Spasov in view of Theis also teaches training the policy model by using, as teacher data, a policy … which can be determined by comparing a threshold with the information matrix (Theis, Page 5, Section 2.3, Paragraph 2, Lines 1-3, and Equation 14, “For a given β, a feature should be pruned if Equation 13 is negative, that is, when doing so reduces the overall cost because it decreases the computational cost more than it increases the cross-entropy: ∆Li + β · ∆Ci ≤ 0 (14)”; “0” is considered to be the “threshold”), using the training data and the Information matrix (Spasov, Page 3, Section 2.1, Paragraph 2, Lines 1-3, “Algorithm 1 can be loosely divided in two stages: firstly, an initialization round of exploring the saliencies of the channels in the network, and a second stage where we start exploiting and refining the initial saliency estimates to guide the channel selection procedure”; Spasov, Page 3, Section 2.2, Lines 1-2, “Our dynamic channel selection framework requires the estimation of channel saliency, or the contribution of each active channel to the overall network performance”; Theis, Page 4, Paragraph 1, Lines 1-4, “For convolutional architectures, it makes sense to try to prune entire feature maps instead of individual parameters, since typical implementations of convolutions may not be able to exploit sparse kernels for speedups. Let a_nkij be the activation of the kth feature map at spatial location i,j for the nth datapoint”; Theis, Page 4, Paragraph 1, Lines 7-9, “The gradient of the loss for the nth datapoint with respect to m_k is g_nk = −Σ_ij a_nkij ∂/∂a_nkij log Q(z_n | I_n) (9) and the pruning signal is therefore ∆_k = (1/(2N)) Σ_n g_nk², since m_k² = 1 before pruning”; Theis, Page 18, Paragraph 3, and Equations 27-32; Spasov uses the “saliency estimates” to train the “channel selection framework”, and Theis teaches using the Fisher Information matrix to estimate a “pruning signal”, which is considered to be equivalent to the “saliency estimates”; therefore, using the Fisher Information Matrix method of Theis to estimate the “saliency”, the policy model of Spasov is trained using the information matrix).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the information processing method of Spasov to use the Fisher information matrix method of kernel ranking across layers as taught by Theis. The motivation for doing so would have been that Spasov presented the method taught by Theis as an alternative to the kernel ranking method of Spasov (Spasov, Page 3, Section 2.2, Lines 2-6, “We need a saliency metric which enables us to rank the filters of the entire network globally, that is across layers. Molchanov et al. [3] propose a pruning framework which leverages a first-order Taylor approximation for global channel ranking, whereas Theis et al. [25] use Fisher information to achieve kernel ranking across layers [26]. Our channel ranking approach is based on Molchanov et al. [3] although both methods would be applicable”). Further, Theis notes that the method is similar to the one used by Molchanov but provides a more principled motivation (Theis, Page 4, Paragraph 2, Lines 1-4, “We note that this pruning signal is very similar to the one used by Molchanov et al. [8] – which uses absolute gradients instead of squared gradients and a certain normalization of the pruning signal – but our derivation provides a more principled motivation”).
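For illustration only, a minimal sketch of the threshold comparison quoted above (Theis, Equation 14) follows; the numeric values and the trade-off coefficient β are toy assumptions, not data from the references:

```python
# Illustrative sketch of Eq. 14: prune feature i when delta_L[i] + beta*delta_C[i] <= 0,
# i.e., the comparison against the threshold 0 decides keep vs. prune.
import torch

torch.manual_seed(0)
delta_L = torch.rand(8) * 0.01            # increase in cross-entropy if feature i is pruned
delta_C = torch.full((8,), -0.05)         # (negative) change in computational cost
beta = 0.1                                # cost/accuracy trade-off coefficient

prune = delta_L + beta * delta_C <= 0     # comparison against the threshold 0
policy = (~prune).float()                 # 1 = keep, 0 = prune
print(policy)                             # in the mapping above, this 0/1 vector is
                                          # what serves as the teacher-data policy
```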
Theis does not explicitly teach the policy being a vector nor wherein values included in the policy vector cause at least one layer of the Policy model to skip processing during an inference phase.
Luo teaches the policy being a vector and wherein values included in the policy vector cause at least one layer of the Policy model to skip processing during an inference phase (Luo, Page 3, Section 3.1, Paragraph 2, Lines 1-4, “After training, the binary index code is used for filter pruning. All the filters in previous layer and all the channels in the filters of the next layer will be removed if their corresponding index value is 0”; see also Luo, Page 3, Figure 2, where “X ∈ B^C” is considered to be the “policy vector” which is used in “element-wise multiplication” to result in an output where “some layers are pruned”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to have modified the information processing method of the proposed combination to include using a policy vector and skipping layers during inference as taught by Luo. The motivation for doing so would have been that the binarization of the vector allows for pruning and fine-tuning to be integrated together (Luo, Page 4, Section 3.1.3, Paragraph 4).
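For illustration only, a minimal sketch of the binary index code as characterized above follows; it is not code from Luo, and the module sizes and the example index vector are assumptions:

```python
# Illustrative sketch: a binary vector X in B^C gates channels by element-wise
# multiplication; filters whose index value is 0 can be removed outright, so
# their computation is skipped during inference.
import torch
import torch.nn as nn

torch.manual_seed(0)
C = 6
conv = nn.Conv2d(3, C, 3, padding=1)
x_index = torch.tensor([1., 0., 1., 1., 0., 1.])    # the binary index code X

img = torch.randn(1, 3, 8, 8)
gated = conv(img) * x_index.view(1, C, 1, 1)        # element-wise multiplication

# After training, zero-indexed filters (and the corresponding input channels
# of the next layer) are actually removed:
kept = x_index.bool()
pruned_weight = conv.weight[kept]
pruned_bias = conv.bias[kept]
print(pruned_weight.shape)                          # torch.Size([4, 3, 3, 3])
```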
Regarding claim 2, the rejection of claim 1 is incorporated, and further, the proposed combination teaches incremental training code configured to cause the at least one processor to operate as an Incremental ANN model trainer configured to train the ANN model incrementally from the input ANN model with the New training data including pairs of input and output of training and validation for an incremental training phase (Spasov, Page 6, Section 3.3, Line 1, “All network models are trained using SGD with Nesterov momentum of 0.9 without dampening”; Spasov, Page 2, Section 2, Lines 1-6, “We consider a supervised learning problem with a set of training examples D = {X = {x1, x2, . . . , xN}, Y = {y1, y2, . . . , yN}}, where x and y represent an input and a label, respectively. Given a CNN model with L convolutional layers, let each layer l ∈ 1 . . . L comprise K_l channels, C_l^k, where k ∈ 1 . . . K_l is the channel index. In each training step t, we 1) sample a batch of B data samples (x^{1:B}, y^{1:B}); 2) select and activate a subset S of convolutional channels (see Figure 2); 3) run a forward and a backward pass on the “thin” network, that is only on the active channels; and 4) observe the revealed saliency estimates (SAL^k_{l,t}) of the activated channels”; the training is done in “step[s] t”, which is considered to be “train[ing] the ANN model incrementally”; see the illustrative sketch at the end of this claim mapping);
the computation code is further configured to compute the information matrix of each sample in the New training data using the training information (Spasov, Page 3, Section 2.2, Lines 2-5, “We need a saliency metric which enables us to rank the filters of the entire network globally, that is across layers. … Theis et al. [25] use Fisher information to achieve kernel ranking across layers [26]”; Theis, Page 3, Section 2.1, Lines 1-2, “Our goal is to remove feature maps or parameters which contribute little to the overall performance of the model”; Theis, Page 4, Paragraph 1, Lines 1-4, “For convolutional architectures, it makes sense to try to prune entire feature maps instead of individual parameters, since typical implementations of convolutions may not be able to exploit sparse kernels for speedups. Let a_nkij be the activation of the kth feature map at spatial location i,j for the nth datapoint”; Theis, Page 4, Paragraph 1, Lines 7-9, “The gradient of the loss for the nth datapoint with respect to m_k is g_nk = −Σ_ij a_nkij ∂/∂a_nkij log Q(z_n | I_n) (9) and the pruning signal is therefore ∆_k = (1/(2N)) Σ_n g_nk², since m_k² = 1 before pruning”; Theis, Page 18, Paragraph 3, and Equations 27-32); and
incremental policy training code configured to cause the at least one processor to operate as an Incremental policy model trainer configured to train the Policy model incrementally from the input Policy model by using, as teacher data, a policy … which can be determined by comparing a threshold with the information matrix (Theis, Page 5, Section 2.3, Paragraph 2, Lines 1-3, and Equation 14, “For a given β, a feature should be pruned if Equation 13 is negative, that is, when doing so reduces the overall cost because it decreases the computational cost more than it increases the cross-entropy: ∆Li + β · ∆Ci ≤ 0 (14)”; “0” is considered to be the “threshold”) using the New training data (Spasov, Page 2, Section 2, Lines 4-5, “2) select and activate a subset S of convolutional channels (see Figure 2)”; Spasov, Page 4, Algorithm 1, Steps 1-2, 4-11; Spasov, Page 3, Section 2.1, Paragraph 2, Lines 1-3, “Algorithm 1 can be loosely divided in two stages: firstly, an initialization round of exploring the saliencies of the channels in the network, and a second stage where we start exploiting and refining the initial saliency estimates to guide the channel selection procedure”; Spasov, Page 3, Section 2.2, Lines 1-2, “Our dynamic channel selection framework requires the estimation of channel saliency, or the contribution of each active channel to the overall network performance”; the “channel selection framework” shown in Algorithm 1 is considered to be the “Policy model”; because the “initial saliency estimates” are exploited and refined, the model is considered to be trained using at least part of Algorithm 1; Algorithm 1 shows that the policy model is trained iteratively via the “for” loops) and the Information matrix (Spasov, Page 3, Section 2.1, Paragraph 2, Lines 1-3, “Algorithm 1 can be loosely divided in two stages: firstly, an initialization round of exploring the saliencies of the channels in the network, and a second stage where we start exploiting and refining the initial saliency estimates to guide the channel selection procedure”; Spasov, Page 3, Section 2.2, Lines 1-2, “Our dynamic channel selection framework requires the estimation of channel saliency, or the contribution of each active channel to the overall network performance”; Theis, Page 4, Paragraph 1, Lines 1-4, “For convolutional architectures, it makes sense to try to prune entire feature maps instead of individual parameters, since typical implementations of convolutions may not be able to exploit sparse kernels for speedups. Let a_nkij be the activation of the kth feature map at spatial location i,j for the nth datapoint”; Theis, Page 18, Paragraph 3, and Equations 27-32).
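For illustration only, a minimal sketch of stepwise (incremental) training under the quoted setup (“SGD with Nesterov momentum of 0.9 without dampening”) follows; the toy model and batch shapes are assumptions:

```python
# Illustrative sketch: the model is updated incrementally, one step t at a time,
# as new (input, label) pairs arrive, using SGD with Nesterov momentum 0.9.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, nesterov=True, dampening=0)

def incremental_step(x_new, y_new):
    """One training step t on a batch of new training data."""
    opt.zero_grad()
    loss = F.cross_entropy(model(x_new), y_new)
    loss.backward()
    opt.step()
    return loss.item()

# Example: incremental_step(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
```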