Last updated: May 29, 2026
Application No. 18/322,373
NEURAL NETWORK TRAINING METHOD AND APPARATUS

Non-Final OA §102
Filed
May 23, 2023
Priority
Nov 23, 2020 — CN 202011322834.6 +1 more
Examiner
LANE, THOMAS BERNARD
Art Unit
2142
Tech Center
2100 — Computer Architecture & Software
Assignee
Huawei Technologies Co., Ltd.
OA Round
1 (Non-Final)
Interview Optional

— +22.2% interview lift. Examiner has a relatively high allowance rate (85%); +22.2% interview lift. A written response may suffice.
Based on 13 resolved cases, 2023–2026
Examiner Intelligence

LANE, THOMAS BERNARD View full profile →
Grants 85% — above average
Career Allowance Rate
11 granted / 13 resolved
+29.6% vs TC avg
Strong +22% interview lift
Without
With
+22.2%
Interview Lift
resolved cases with interview
Typical timeline
3y 9m
Avg Prosecution
6 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
4.1%
-35.9% vs TC avg
§103
91.8%
+51.8% vs TC avg
§102
2.0%
-38.0% vs TC avg
Black line = Tech Center average estimate • Based on career data from 13 resolved cases
Office Action

§102
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119
(a)-(d). The certified copy has been filed in parent Application No. CN202010745395.3, filed on
11/23/2020.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/04/2024 is in compliance
with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being
considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xiao et al. “Fast Deep Learning Training Through Intelligently Freezing Layers”, 06/17/2019.
Regarding Claim 1 Xiao teaches A neural network training method applied to a neural network training apparatus, the method comprising: obtaining a to-be-trained neural network; (Xiao, page 1230-1231, section IV, teaches the obtaining of different models to be trained by the neural network training apparatus.)
 grouping parameters of the to-be-trained neural network, to obtain M groups of parameters, wherein M is a positive integer greater than or equal to 1; (Xiao, page 1228, section III – A and C, teaches the configuring and use of neural network layers which are groups of weights and parameters used by the neural network to generate its decisions.)
 obtaining sampling probability distribution and training iteration step arrangement, (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution). Further Xiao, page 1229 -1230, section III – A, teaches the obtaining and use of the amount and structure of the epochs used in the training of the neural network models (i.e. training iteration step arrangement).)
wherein the sampling probability distribution represents a probability that each of the M groups of parameters is sampled in each training iteration step, (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution) If the gradients are canceled out that means that the parameters are not being utilized in the epoch (i.e. training iteration step) and it is determined that the layer should be frozen) 
and the training iteration step arrangement comprises interval arrangement and periodic arrangement; (Xiao, page 1229 -1230, section III – A, and B, teaches the obtaining and use of the amount and structure of the epochs used in the training of the neural network models (i.e. training iteration step arrangement). This includes the determining of which layers should be frozen at each epoch.)
 freezing or stopping updating a sampled parameter group based on the sampling probability distribution and the training iteration step arrangement; (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution) If the gradients are canceled out that means that the parameters are not being utilized in the epoch (i.e. training iteration step) and it is determined that the layer should be frozen
 and training the to-be-trained neural network based on the parameter group that is frozen or stopped updating. (Xiao, page 1229 -1230, section III – B, C, and E, teach the freezing of layers (i.e. parameter groups) in a neural network and then continuing training the neural network with those frozen layers)
Regarding Claim 2 Xiao teaches The method according to claim 1, wherein the freezing or stopping updating the sampled parameter group based on the sampling probability distribution and the training iteration step arrangement comprises: determining a first iteration step based on the training iteration step arrangement, wherein the first iteration step is a to-be-sampled iteration step; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch)
 determining, based on the sampling probability distribution, an mth group of parameters sampled in the first iteration step, wherein m is a positive integer less than or equal to M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen.)
 and freezing the mth group of parameters to a first group of parameters in the first iteration step, wherein the freezing the mth group of parameters to a first group of parameters in the first iteration step indicates that gradient calculation and parameter update are not performed on the mth group of parameters to the first group of parameters. ; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen. When a layer is chosen to be frozen it is frozen for subsequent epochs and the gradient calculation and parameter updates are not performed on that layer.)
Regarding Claim 3 Xiao teaches The method according to claim 1, wherein the freezing or stopping updating the sampled parameter group based on the sampling probability distribution and the training iteration step arrangement comprises: determining a first iteration step based on the training iteration step arrangement, wherein the first iteration step is a to-be-sampled iteration step; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch)
 determining, based on the sampling probability distribution, an mth group of parameters sampled in the first iteration step, wherein m is a positive integer less than or equal to M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen.)
 and stopping updating the mth group of parameters to a first group of parameters in the first iteration step, wherein the freezing the mth group of parameters to a first group of parameters in the first iteration step indicates that gradient calculation is performed and parameter update are not performed on the mth group of parameters to the first group of parameters.  (Xiao, page 1229 -1230, section III – B, C, E, and D and Algorithm 1, Fig, $, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be Stopped updating. In figure 4 it is shown that when a layer is frozen if the layer after it is unfrozen the gradient for the frozen layer will still be calculated but the parameters will not be updated (i.e. stopping updating).)
Regarding Claim 4 Xiao teaches The method according to claim 2, wherein when in response to the training iteration step arrangement being the interval arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining a first interval; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch. The number of epochs before the next freezing occurs is determined and set as a hyperparameter in the freezing training algorithm and a first number of epochs is determined (i.e. first interval))
and determining one or more first iteration steps at every first interval in a plurality of training iteration steps. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of a number of epochs before each freeze (i.e. training interval) and an epoch that the freeze will be done on (i.e. iteration step), for every round of training that the model goes though (i.e. training iteration steps).
Regarding Claim 5 Xiao teaches The method according to claim 2, wherein when in response to the training iteration step arrangement being the interval arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining a first interval; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch. The number of epochs before the next freezing occurs is determined and set as a hyperparameter in the freezing training algorithm and a first number of epochs is determined (i.e. first interval))
and determining one or more first iteration steps at every first interval in a plurality of training iteration steps. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of a number of epochs before each freeze (i.e. training interval) and an epoch that the freeze will be done on (i.e. iteration step), for every round of training that the model goes though (i.e. training iteration steps).
Regarding Claim 6 Xiao teaches The method according to claim 2, wherein in response to the training iteration step arrangement being the periodic arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining that a quantity of first iteration steps is M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. iteration step arrangement)))
and determining a first period based on the quantity of first iteration steps and a first proportion, wherein the first period comprises the first iteration step and an iteration step to be trained on the entire network, the first proportion is a proportion of the first iteration step in the first period, and the first iteration step is last (M-1) iteration steps in the first period. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the periodic freezing of layers based on how many epochs have been performed (i.e. iteration step) as the epochs are performed the freezing rate increases for each individual layer and when the set period of epochs is hit the layers indicated by the freezing rate will be frozen or stopped till the ending of the training epochs (i.e. last iteration step in the first period.) )
Regarding Claim 7 Xiao teaches The method according to claim 2, wherein in response to the training iteration step arrangement being the periodic arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining that a quantity of first iteration steps is M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. iteration step arrangement)))
and determining a first period based on the quantity of first iteration steps and a first proportion, wherein the first period comprises the first iteration step and an iteration step to be trained on the entire network, the first proportion is a proportion of the first iteration step in the first period, and the first iteration step is last (M-1) iteration steps in the first period. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the periodic freezing of layers based on how many epochs have been performed (i.e. iteration step) as the epochs are performed the freezing rate increases for each individual layer and when the set period of epochs is hit the layers indicated by the freezing rate will be frozen or stopped till the ending of the training epochs (i.e. last iteration step in the first period.) )
Regarding Claim 8 Xiao teaches A neural network training apparatus, comprising: a memory having computer-executable instructions stored thereon; and a processor configured to execute the computer-executable instructions in the memory to facilitate the following being performed by the apparatus: obtaining a to-be-trained neural network; (Xiao, page 1230-1231, section IV, teaches the obtaining of different models to be trained by the neural network training apparatus.)
 grouping parameters of the to-be-trained neural network, to obtain M groups of parameters, wherein M is a positive integer greater than or equal to 1; (Xiao, page 1228, section III – A and C, teaches the configuring and use of neural network layers which are groups of weights and parameters used by the neural network to generate its decisions.)
 obtaining sampling probability distribution and training iteration step arrangement, (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution). Further Xiao, page 1229 -1230, section III – A, teaches the obtaining and use of the amount and structure of the epochs used in the training of the neural network models (i.e. training iteration step arrangement).)
wherein the sampling probability distribution represents a probability that each of the M groups of parameters is sampled in each training iteration step, (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution) If the gradients are canceled out that means that the parameters are not being utilized in the epoch (i.e. training iteration step) and it is determined that the layer should be frozen) 
and the training iteration step arrangement comprises interval arrangement and periodic arrangement; (Xiao, page 1229 -1230, section III – A, and B, teaches the obtaining and use of the amount and structure of the epochs used in the training of the neural network models (i.e. training iteration step arrangement). This includes the determining of which layers should be frozen at each epoch.)
 freezing or stopping updating a sampled parameter group based on the sampling probability distribution and the training iteration step arrangement; (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution) If the gradients are canceled out that means that the parameters are not being utilized in the epoch (i.e. training iteration step) and it is determined that the layer should be frozen
 and training the to-be-trained neural network based on the parameter group that is frozen or stopped updating. (Xiao, page 1229 -1230, section III – B, C, and E, teach the freezing of layers (i.e. parameter groups) in a neural network and then continuing training the neural network with those frozen layers)
Regarding Claim 9 Xiao teaches The apparatus according to claim 8, wherein the freezing or stopping updating the sampled parameter group based on the sampling probability distribution and the training iteration step arrangement comprises: determining a first iteration step based on the training iteration step arrangement, wherein the first iteration step is a to-be-sampled iteration step; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch)
 determining, based on the sampling probability distribution, an mth group of parameters sampled in the first iteration step, wherein m is a positive integer less than or equal to M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen.)
 and freezing the mth group of parameters to a first group of parameters in the first iteration step, wherein the freezing the mth group of parameters to a first group of parameters in the first iteration step indicates that gradient calculation and parameter update are not performed on the mth group of parameters to the first group of parameters. ; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen. When a layer is chosen to be frozen it is frozen for subsequent epochs and the gradient calculation and parameter updates are not performed on that layer.)
Regarding Claim 10 Xiao teaches The apparatus according to claim 8, wherein the freezing or stopping updating the sampled parameter group based on the sampling probability distribution and the training iteration step arrangement comprises: determining a first iteration step based on the training iteration step arrangement, wherein the first iteration step is a to-be-sampled iteration step; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch)
 determining, based on the sampling probability distribution, an mth group of parameters sampled in the first iteration step, wherein m is a positive integer less than or equal to M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen.)
 and stopping updating the mth group of parameters to a first group of parameters in the first iteration step, wherein the freezing the mth group of parameters to a first group of parameters in the first iteration step indicates that gradient calculation is performed and parameter update are not performed on the mth group of parameters to the first group of parameters.  (Xiao, page 1229 -1230, section III – B, C, E, and D and Algorithm 1, Fig, $, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be Stopped updating. In figure 4 it is shown that when a layer is frozen if the layer after it is unfrozen the gradient for the frozen layer will still be calculated but the parameters will not be updated (i.e. stopping updating).)
Regarding Claim 11 Xiao teaches The apparatus according to claim 9, wherein when in response to the training iteration step arrangement being the interval arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining a first interval; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch. The number of epochs before the next freezing occurs is determined and set as a hyperparameter in the freezing training algorithm and a first number of epochs is determined (i.e. first interval))
and determining one or more first iteration steps at every first interval in a plurality of training iteration steps. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of a number of epochs before each freeze (i.e. training interval) and an epoch that the freeze will be done on (i.e. iteration step), for every round of training that the model goes though (i.e. training iteration steps).
Regarding Claim 12 Xiao teaches The apparatus according to claim 10, wherein when in response to the training iteration step arrangement being the interval arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining a first interval; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch. The number of epochs before the next freezing occurs is determined and set as a hyperparameter in the freezing training algorithm and a first number of epochs is determined (i.e. first interval))
and determining one or more first iteration steps at every first interval in a plurality of training iteration steps. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of a number of epochs before each freeze (i.e. training interval) and an epoch that the freeze will be done on (i.e. iteration step), for every round of training that the model goes though (i.e. training iteration steps).
Regarding Claim 13 Xiao teaches The apparatus according to claim 9, wherein in response to the training iteration step arrangement being the periodic arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining that a quantity of first iteration steps is M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. iteration step arrangement)))
and determining a first period based on the quantity of first iteration steps and a first proportion, wherein the first period comprises the first iteration step and an iteration step to be trained on the entire network, the first proportion is a proportion of the first iteration step in the first period, and the first iteration step is last (M-1) iteration steps in the first period. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the periodic freezing of layers based on how many epochs have been performed (i.e. iteration step) as the epochs are performed the freezing rate increases for each individual layer and when the set period of epochs is hit the layers indicated by the freezing rate will be frozen or stopped till the ending of the training epochs (i.e. last iteration step in the first period.) )
Regarding Claim 14 Xiao teaches The apparatus according to claim 10, wherein in response to the training iteration step arrangement being the periodic arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining that a quantity of first iteration steps is M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. iteration step arrangement)))
and determining a first period based on the quantity of first iteration steps and a first proportion, wherein the first period comprises the first iteration step and an iteration step to be trained on the entire network, the first proportion is a proportion of the first iteration step in the first period, and the first iteration step is last (M-1) iteration steps in the first period. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the periodic freezing of layers based on how many epochs have been performed (i.e. iteration step) as the epochs are performed the freezing rate increases for each individual layer and when the set period of epochs is hit the layers indicated by the freezing rate will be frozen or stopped till the ending of the training epochs (i.e. last iteration step in the first period.) )
Regarding Claim 15 Xiao teaches A non-transitory computer-readable storage medium, wherein the computer-readable medium stores program code executed by a device, and the program code, upon execution the device, facilitating performance of the following: obtaining a to-be-trained neural network; (Xiao, page 1230-1231, section IV, teaches the obtaining of different models to be trained by the neural network training apparatus.)
 grouping parameters of the to-be-trained neural network, to obtain M groups of parameters, wherein M is a positive integer greater than or equal to 1; (Xiao, page 1228, section III – A and C, teaches the configuring and use of neural network layers which are groups of weights and parameters used by the neural network to generate its decisions.)
 obtaining sampling probability distribution and training iteration step arrangement, (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution). Further Xiao, page 1229 -1230, section III – A, teaches the obtaining and use of the amount and structure of the epochs used in the training of the neural network models (i.e. training iteration step arrangement).)
wherein the sampling probability distribution represents a probability that each of the M groups of parameters is sampled in each training iteration step, (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution) If the gradients are canceled out that means that the parameters are not being utilized in the epoch (i.e. training iteration step) and it is determined that the layer should be frozen) 
and the training iteration step arrangement comprises interval arrangement and periodic arrangement; (Xiao, page 1229 -1230, section III – A, and B, teaches the obtaining and use of the amount and structure of the epochs used in the training of the neural network models (i.e. training iteration step arrangement). This includes the determining of which layers should be frozen at each epoch.)
 freezing or stopping updating a sampled parameter group based on the sampling probability distribution and the training iteration step arrangement; (Xiao, page 1229 -1230, section III – B, C, and E, teach the calculating of a freezing rate that determines if the layers (i.e groups of parameters) gradients are likely to be canceled out and if they should be frozen at each epoch (i.e. sampling probability distribution) If the gradients are canceled out that means that the parameters are not being utilized in the epoch (i.e. training iteration step) and it is determined that the layer should be frozen
 and training the to-be-trained neural network based on the parameter group that is frozen or stopped updating. (Xiao, page 1229 -1230, section III – B, C, and E, teach the freezing of layers (i.e. parameter groups) in a neural network and then continuing training the neural network with those frozen layers)
Regarding Claim 16 Xiao teaches The medium according to claim 15, wherein the freezing or stopping updating the sampled parameter group based on the sampling probability distribution and the training iteration step arrangement comprises: determining a first iteration step based on the training iteration step arrangement, wherein the first iteration step is a to-be-sampled iteration step; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch)
 determining, based on the sampling probability distribution, an mth group of parameters sampled in the first iteration step, wherein m is a positive integer less than or equal to M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen.)
 and freezing the mth group of parameters to a first group of parameters in the first iteration step, wherein the freezing the mth group of parameters to a first group of parameters in the first iteration step indicates that gradient calculation and parameter update are not performed on the mth group of parameters to the first group of parameters. ; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen. When a layer is chosen to be frozen it is frozen for subsequent epochs and the gradient calculation and parameter updates are not performed on that layer.)
Regarding Claim 17 Xiao teaches The medium according to claim 15, wherein the freezing or stopping updating the sampled parameter group based on the sampling probability distribution and the training iteration step arrangement comprises: determining a first iteration step based on the training iteration step arrangement, wherein the first iteration step is a to-be-sampled iteration step; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch)
 determining, based on the sampling probability distribution, an mth group of parameters sampled in the first iteration step, wherein m is a positive integer less than or equal to M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be frozen.)
 and stopping updating the mth group of parameters to a first group of parameters in the first iteration step, wherein the freezing the mth group of parameters to a first group of parameters in the first iteration step indicates that gradient calculation is performed and parameter update are not performed on the mth group of parameters to the first group of parameters.  (Xiao, page 1229 -1230, section III – B, C, E, and D and Algorithm 1, Fig, $, teaches the analyzing of the layers and using a layer freezing rate for the current epoch (i.e. sample probability distribution) to determine if a layer (i.e. mth group of parameters) should be Stopped updating. In figure 4 it is shown that when a layer is frozen if the layer after it is unfrozen the gradient for the frozen layer will still be calculated but the parameters will not be updated (i.e. stopping updating).)
Regarding Claim 18 Xiao teaches The medium according to claim 16, wherein when in response to the training iteration step arrangement being the interval arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining a first interval; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch. The number of epochs before the next freezing occurs is determined and set as a hyperparameter in the freezing training algorithm and a first number of epochs is determined (i.e. first interval))
and determining one or more first iteration steps at every first interval in a plurality of training iteration steps. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of a number of epochs before each freeze (i.e. training interval) and an epoch that the freeze will be done on (i.e. iteration step), for every round of training that the model goes though (i.e. training iteration steps).
Regarding Claim 19 Xiao teaches The medium according to claim 17, wherein when in response to the training iteration step arrangement being the interval arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining a first interval; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. training iteration step arrangement) the first epoch is taken by the algorithm to determine what layers are to be frozen for the next epoch. The number of epochs before the next freezing occurs is determined and set as a hyperparameter in the freezing training algorithm and a first number of epochs is determined (i.e. first interval))
and determining one or more first iteration steps at every first interval in a plurality of training iteration steps. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of a number of epochs before each freeze (i.e. training interval) and an epoch that the freeze will be done on (i.e. iteration step), for every round of training that the model goes though (i.e. training iteration steps).
Regarding Claim 20 Xiao teaches The medium according to claim 16, wherein in response to the training iteration step arrangement being the periodic arrangement, the determining the first iteration step based on the training iteration step arrangement comprises: determining that a quantity of first iteration steps is M-1; (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the use of epochs and a hyperparameter determine how many epochs will be done (i.e. iteration step arrangement)))
and determining a first period based on the quantity of first iteration steps and a first proportion, wherein the first period comprises the first iteration step and an iteration step to be trained on the entire network, the first proportion is a proportion of the first iteration step in the first period, and the first iteration step is last (M-1) iteration steps in the first period. (Xiao, page 1229 -1230, section III – B, C, and E, and Algorithm 1, teaches the periodic freezing of layers based on how many epochs have been performed (i.e. iteration step) as the epochs are performed the freezing rate increases for each individual layer and when the set period of epochs is hit the layers indicated by the freezing rate will be frozen or stopped till the ending of the training epochs (i.e. last iteration step in the first period.) )

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's
disclosure. by Isikdogan et al. US-20200293870-A1 and Tan et al. US-20210232909-A1. Both Isikdogan and Tan describe methods of training neural networks that involve freezing parameters and layers of the neural network during different periods of the training.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS B LANE whose telephone number is (571)272-1872. The examiner can normally be reached M-Th: 7am-5pm; F: Out of Office.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MARIELA REYES can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THOMAS BERNARD LANE/Examiner, Art Unit 2142                                                                                                                                                                                                        

/HAIMEI JIANG/Primary Examiner, Art Unit 2142
Read full office action
Prosecution Timeline

May 23, 2023
Application Filed
Sep 29, 2023
Response after Non-Final Action
Mar 09, 2026
Non-Final Rejection mailed — §102 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/939,778
Patent 12619893
GENERATION APPARATUS, GENERATION METHOD, AND RECORDING MEDIUM
3y 8m to grant Granted May 05, 2026
18/031,065
Patent 12619857
Methods and Apparatuses for Bottleneck Stages in Neural-Network Processing
3y 0m to grant Granted May 05, 2026
17/536,230
Patent 12561398
VALIDATION PROCESSING FOR CANDIDATE RETRAINING DATA
4y 2m to grant Granted Feb 24, 2026
17/457,698
Patent 12541572
ACCELERATING DECISION TREE INFERENCES BASED ON COMPLEMENTARY TENSOR OPERATION SETS
4y 2m to grant Granted Feb 03, 2026
17/838,342
Patent 12468921
PIPELINING AND PARALLELIZING GRAPH EXECUTION METHOD FOR NEURAL NETWORK MODEL COMPUTATION AND APPARATUS THEREOF
3y 5m to grant Granted Nov 11, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
85%
Grant Probability
99%
With Interview (+22.2%)
3y 9m (~9m remaining)
Median Time to Grant
Low
PTA Risk
Based on 13 resolved cases by this examiner. Grant probability derived from career allowance rate.