Office Action Analysis: 18119221 — METHOD AND SYSTEM FOR TRAINING A NEURAL NETWORK MODEL USING GRADUAL KNOWLEDGE DISTILLATION

Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 1, 11, and 20 are objected to because of the following informalities: Claims 1, 11, and 20 recite: wherein the soothing factor is adjusted over the plurality of first training phase epochs to reduce a smoothing effect on the generated smoothed TNN model outputs. The term “soothing” appears in the specification and in each of the independent claims; however, the specification does not describe, define, or otherwise provide written description support for a “soothing” factor or operation. The surrounding disclosure instead consistently discusses loss functions and smoothing-type techniques. Accordingly, “soothing” will be treated as an apparent typographical error for “smoothing.” Appropriate correction is required.
Claims 11 and 12 are objected to because of the following informalities: Claim 11 appears to have been structured in a manner similar to claim 1; however, as presented, claim 11 appears to be missing two limitations that are instead recited in claim 12. This suggests that a formatting error may have occurred when structuring claim 11, resulting in certain limitations being separated into claim 12. Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
101 Subject Matter Eligibility Analysis
Step 1: Claims 1-20 are within the four statutory categories (a process, machine, manufacture or composition of matter). Claims 1-10 are directed to a method consisting of a series of steps, meaning that they are directed to the statutory category of process. Claims 11-20 are directed to storage mediums and processors, which are machines.
Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101. None of the claims represent an improvement to technology.
Regarding claim 1, the following claim elements are abstract ideas: computing SNN model outputs for the plurality of input data samples (This is an abstract idea of a mathematical concept and mental process. The limitation recites computing model outputs by applying mathematical operations to input data samples according to defined rules or parameters. Such calculations are mathematical in nature and involve applying formulas to inputs to obtain corresponding outputs, which can be carried out in the human mind or with basic computing tools such as a calculator and pen and paper. Accordingly, this limitation falls within the mathematical concepts and mental processes groupings of abstract ideas. See MPEP 2106.04(a)(2)(I) and 2106.04(a)(2)(III).);
applying a smoothing factor to the teacher neural network (TNN) model outputs to generate smoothed TNN model outputs (This is an abstract idea of a mathematical concept and mental process. The limitation recites applying a smoothing factor to model outputs, which involves mathematical calculations such as scaling, averaging, or weighting numerical values according to a defined factor. Such calculations are mathematical in nature and involve applying formulas to numerical outputs, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); computing a first loss based on the SNN model outputs and the smoothed TNN model outputs (This is an abstract idea of a mathematical concept and a mental process. The limitation recites computing a loss value based on comparing model outputs, which involves mathematical calculations such as differences, distances, or error functions applied to numerical values. Such calculations are mathematical in nature and involve applying formulas to the outputs to obtain a loss value, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); and computing an updated set of the SNN model parameters with an objective of reducing the first loss in a following first training phase epoch (This is an abstract idea of a mathematical concept and a mental process. The limitation recites computing updated model parameters based on an objective of reducing a loss value, which involves mathematical calculations such as adjusting numerical values according to defined formulas or rules to minimize an error measure. Such calculations are mathematical in nature and involve applying mathematical relationships to update parameters, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.)
, wherein the soothing factor is adjusted over the plurality of first training phase epochs to reduce a smoothing effect on the generated smoothed TNN model outputs (This is an abstract idea of a mathematical concept and mental process. The limitation recites adjusting a numerical factor over training epochs based on its effect on model outputs, which involves mathematical calculations such as modifying a parameter value to change the degree of smoothing applied to numerical outputs. Such calculations are mathematical in nature and involve evaluating and adjusting numerical values according to the desired outcome, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); computing SNN model outputs for the plurality of input data samples from the SNN model (This is an abstract idea of mathematical concepts and a mental process. The limitation recites computing model outputs by applying mathematical operations to input data samples according to defined rules or parameters. Such calculations are mathematical in nature and involve applying formulas to numerical inputs to obtain corresponding outputs, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); computing a second loss based on the SNN model outputs and a set of predefined expected outputs for the plurality of input data samples (This is an abstract idea of a mathematical concept and a mental process. The limitation recites computing a loss value by comparing model outputs to predefined expected outputs, which involves mathematical calculations such as differences, error measures, or distance functions applied to numerical values.
Such calculations are mathematical in nature and involve applying formulas to the outputs and expected values to obtain a loss value, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); and computing an updated set of the SNN model parameters with an objective of reducing the second loss in a following second training phase (This is an abstract idea of a mathematical concept and a mental process. The limitation recites computing updated model parameters based on an objective of reducing a loss value, which involves mathematical calculations such as adjusting numerical parameter values according to defined formulas or rules to minimize an error measure. Such calculations are mathematical in nature and involve applying mathematical relationships to update parameters, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.), selecting a final set of SNN model parameters from the updated sets of SNN model parameters computed during the second training phase (This is an abstract idea of a mental process. The limitation recites selecting one set of parameters from among multiple updated sets based on judgement or comparison. Such selection involves evaluation and decision-making that can be performed in the human mind, for example by reviewing candidate parameter sets and choosing one set according to a criterion. Accordingly, this limitation falls within the mental process grouping of abstract ideas.).
The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: student neural network (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.)
(SNN) model that is configured by a set of SNN model parameters to generate outputs in respect of input data samples (This limitation merely recites applying the abstract idea using a generic neural network model and does not add a meaningful limitation. See MPEP 2106.05(f).), obtaining respective teacher neural network (TNN) model outputs for a plurality of input data samples (The limitation amounts to a generic data operation of obtaining stored outputs, i.e., storing and retrieving information from memory, which has been recognized by the courts as a well-understood, routine, and conventional activity. See MPEP 2106.05(d)(II)(iv).); performing a first training phase of the SNN model, the first training phase comprising training the SNN model over a plurality of first training phase epochs, each first training phase epoch comprising (This limitation merely recites training a model over multiple iterations, which is an instruction to apply the abstract idea and does not provide a meaningful limitation. See MPEP 2106.05(f).); performing a second training phase of the SNN model, the second training phase comprising initializing the SNN model with a set of SNN model parameters selected from the updated sets of SNN model parameters computed during the first training phase, the second training phase of the SNN model being performed over a plurality of second training phase epochs, each second training phase epoch comprising (This limitation recites initializing and training a model over multiple iterations, which is an instruction to apply the abstract idea and does not provide a meaningful limitation.).
Regarding claim 2, the rejection of claim 1 is incorporated herein.
Further, claim 2 recites the following abstract ideas: wherein in each epoch of the first training phase the smoothing factor is computed as smoothing factor = $t / t_{\max}$, where $t_{\max}$ is a constant and a value of $t$ is incremented in each subsequent first training phase epoch (This is an abstract idea of mathematical concepts. The limitation recites computing a smoothing factor using a mathematical formula that involves division of a variable by a constant and incrementing a variable across epochs. This limitation is directed to a mathematical relationship and calculation, independent of any particular technological implementation.).
Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, claim 3 recites the following abstract ideas: wherein the first loss corresponds to a divergence between the SNN model outputs and the smoothed TNN model outputs (This is an abstract idea of mathematical concepts. The limitation recites defining a loss as a divergence between two sets of numerical outputs, which is a mathematical relationship used to quantify differences between values. Such divergences are mathematical constructs expressed through formulas and calculations.).
Regarding claim 4, the rejection of claim 3 is incorporated herein. Further, claim 4 recites the following abstract ideas: wherein the first loss corresponds to a Kullback-Leibler divergence between the SNN model outputs and the smoothed TNN model outputs (This is an abstract idea of mathematical concepts. The limitation recites a Kullback-Leibler divergence, which is a specific mathematical formula used to measure divergence between probability distributions. As such, this limitation is directed to a mathematical relationship and calculation.).
Regarding claim 5, the rejection of claim 1 is incorporated herein.
Further, claim 5 recites the following abstract ideas: wherein the second loss corresponds to a divergence between the SNN model outputs and the set of predefined expected outputs (This is an abstract idea of mathematical concepts and a mental process. The limitation recites defining a loss as a divergence between two sets of numerical values, which involves mathematical calculations that quantify differences between outputs and expected values. Such calculations are mathematical in nature and involve comparing numerical values according to defined formulas, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.)
Regarding claim 6, the rejection of claim 5 is incorporated herein. Further, claim 6 recites the following abstract ideas: wherein the second loss is computed based on a cross entropy loss function (This is an abstract idea of mathematical concepts. The limitation recites computing a loss using a cross entropy loss function, which is a mathematical formula used to quantify differences between predicted values and expected values.).
Regarding claim 7, the rejection of claim 1 is incorporated herein. Further, claim 7 recites the following abstract ideas: for each first training phase epoch, determining if the computed updated set of the SNN model parameters improves a performance of the SNN model relative to updated sets of SNN model parameters previously computed during the first training phase in respect of a development dataset that includes a set of development data samples and respective expected outputs (This is an abstract idea of a mental process. The limitation recites evaluating and comparing model performance across different parameter sets to determine whether performance has improved. Such evaluation involves judgement and comparison of results against expected outputs, which can be performed in the human mind.
Accordingly, this limitation falls within the mental process grouping of abstract ideas.) The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: when the computed updated set of the SNN model parameters does improve the performance, update the SNN model parameters to the computed updated set of the SNN model parameters prior to a next first training phase epoch (The step of “updating the SNN model parameters” merely recites updating stored parameter values based on a determination and amounts to an instruction to apply the abstract idea, without adding meaningful limitation.) Regarding claim 8 , the rejection of claim 7 is incorporated herein. Further, claim 8 recites the following additional elements, which taken alone or in combination with other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: when the computed updated set of the SNN model parameters does improve the performance, update the SNN model parameters to the computed updated set of the SNN model parameters prior to a next first training phase epoch (This limitation merely recites using a particular set of SNN model parameters to initialize a training phase, which amounts to an instruction to apply the abstract idea and does not provide a meaningful limitation.) . Regarding claim 9 , the rejection of claim 7 is incorporated herein. Further, claim 9 recites the following abstract ideas: for each second training phase epoch, determining if the computed updated set of the SNN model parameters improves a performance of the SNN model relative to updated sets of SNN model parameters previously computed during the second training phase in respect of the development dataset (This is an abstract idea of a mental process. 
The limitation recites determining whether performance has improved by comparing results associated with different parameter sets. Such comparison and evaluation involves judgement based on observed outcomes and expected outputs, which can be performed in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.) The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: when the computed updated set of the SNN model parameters does improve the performance, update the SNN model parameters to the computed updated set of the SNN model parameters prior to a next epoch (The step of “updating the SNN model parameters” merely recites updating stored parameter values and amounts to an instruction to apply the abstract idea, without providing a meaningful limitation.)
Regarding claim 10, the rejection of claim 9 is incorporated herein. Further, claim 10 recites the following additional elements, which taken alone or in combination with other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the final set of SNN model is the updated set of SNN model parameters computed during the second training phase that best improves the performance of the SNN model during the second training phase (This limitation merely recites using a particular set of model parameters as final parameters, which amounts to an instruction to apply the abstract idea and does not provide a meaningful limitation.).
Regarding claim 11, the following claim elements are abstract ideas: computing SNN model outputs for the plurality of input data samples (This is an abstract idea of a mathematical concept and mental process.
The limitation recites computing model outputs by applying mathematical operations to input data samples according to defined rules or parameters. Such calculations are mathematical in nature and involve applying formulas to inputs to obtain corresponding outputs, which can be carried out in the human mind or with basic computing tools such as a calculator and pen and paper. Accordingly, this limitation falls within the mathematical concepts and mental processes groupings of abstract ideas. See MPEP 2106.04(a)(2)(I) and 2106.04(a)(2)(III).); applying a smoothing factor to the teacher neural network (TNN) model outputs to generate smoothed TNN model outputs (This is an abstract idea of a mathematical concept and mental process. The limitation recites applying a smoothing factor to model outputs, which involves mathematical calculations such as scaling, averaging, or weighting numerical values according to a defined factor. Such calculations are mathematical in nature and involve applying formulas to numerical outputs, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); computing a first loss based on the SNN model outputs and the smoothed TNN model outputs (This is an abstract idea of a mathematical concept and a mental process. The limitation recites computing a loss value based on comparing model outputs, which involves mathematical calculations such as differences, distances, or error functions applied to numerical values. Such calculations are mathematical in nature and involve applying formulas to the outputs to obtain a loss value, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); and computing an updated set of the SNN model parameters with an objective of reducing the first loss in a following first training phase epoch (This is an abstract idea of a mathematical concept and a mental process.
The limitation recites computing updated model parameters based on an objective of reducing a loss value, which involves mathematical calculations such as adjusting numerical values according to defined formulas or rules to minimize an error measure. Such calculations are mathematical in nature and involve applying mathematical relationships to update parameters, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.), wherein the soothing factor is adjusted over the plurality of first training phase epochs to reduce a smoothing effect on the generated smoothed TNN model outputs (This is an abstract idea of a mathematical concept and mental process. The limitation recites adjusting a numerical factor over training epochs based on its effect on model outputs, which involves mathematical calculations such as modifying a parameter value to change the degree of smoothing applied to numerical outputs. Such calculations are mathematical in nature and involve evaluating and adjusting numerical values according to the desired outcome, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.); computing SNN model outputs for the plurality of input data samples from the SNN model (This is an abstract idea of mathematical concepts and a mental process. The limitation recites computing model outputs by applying mathematical operations to input data samples according to defined rules or parameters. Such calculations are mathematical in nature and involve applying formulas to numerical inputs to obtain corresponding outputs, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.)
; computing a second loss based on the SNN model outputs and a set of predefined expected outputs for the plurality of input data samples (This is an abstract idea of a mathematical concept and a mental process. The limitation recites computing a loss value by comparing model outputs to predefined expected outputs, which involves mathematical calculations such as differences, error measures, or distance functions applied to numerical values. Such calculations are mathematical in nature and involve applying formulas to the outputs and expected values to obtain a loss value, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.).
The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: student neural network (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.); one or more processors and a non-transitory storage medium storing software instructions that (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.), obtaining respective teacher neural network (TNN) model outputs for a plurality of input data samples (The limitation amounts to a generic data operation of obtaining stored outputs, i.e., storing and retrieving information from memory, which has been recognized by the courts as a well-understood, routine, and conventional activity. See MPEP 2106.05(d)(II)(iv).)
; performing a first training phase of the SNN model, the first training phase comprising training the SNN model over a plurality of first training phase epochs, each first training phase epoch comprising (This limitation merely recites training a model over multiple iterations, which is an instruction to apply the abstract idea and does not provide meaningful limitation. See MPEP 2106.05(f).) performing a second training phase of the SNN model, the second training phase comprising initializing the SNN model with a set of SNN model parameters selected from the updated sets of SNN model parameters computed during the first training phase, the second training phase of the SNN model being performed over a plurality of second training phase epochs, each second training phase epoch comprising (This limitation recites initializing and training a model over multiple iterations, which is an instruction to apply the abstract idea and does not provide a meaningful limitation). Regarding claim 12, the rejection of claim 11 is incorporated herein. Claim 12 further recites the following abstract ideas: computing an updated set of the SNN model parameters with an objective of reducing the second loss in a following second training phase (This is an abstract idea of a mathematical concept and a mental process. The limitation recites computing updated model parameters based on an objective of reducing a loss value, which involves mathematical calculations such as adjusting numerical parameter values according to defined formulas or rules to minimize an error measure. Such calculations are mathematical in nature and involve applying mathematical relationships to update parameters, which can be carried out in the human mind or with the aid of basic computing tools such as a calculator or with pen and paper.) 
, selecting a final set of SNN model parameters from the updated sets of SNN model parameters computed during the second training phase (This is an abstract idea of a mental process. The limitation recites selecting one set of parameters from among multiple updated sets based on judgement or comparison. Such selection involves evaluation and decision-making that can be performed in the human mind, for example by reviewing candidate parameter sets and choosing one set according to a criterion. Accordingly, this limitation falls within the mental process grouping of abstract ideas.), wherein in each epoch of the first training phase the smoothing factor is computed as smoothing factor = $t / t_{\max}$, where $t_{\max}$ is a constant and a value of $t$ is incremented in each subsequent first training phase epoch (This is an abstract idea of mathematical concepts. The limitation recites computing a smoothing factor using a mathematical formula that involves division of a variable by a constant and incrementing a variable across epochs. This limitation is directed to a mathematical relationship and calculation, independent of any particular technological implementation.).
Regarding claim 13, the rejection of claim 11 is incorporated herein. The claim recites similar limitations corresponding to claim 3. Therefore, the same subject matter analysis that was utilized for claim 3, as described above, is equally applicable to claim 13.
Regarding claim 14, the rejection of claim 13 is incorporated herein. The claim recites similar limitations corresponding to claim 4. Therefore, the same subject matter analysis that was utilized for claim 4, as described above, is equally applicable to claim 14.
Regarding claim 15, the rejection of claim 11 is incorporated herein. The claim recites similar limitations corresponding to claim 5. Therefore, the same subject matter analysis that was utilized for claim 5, as described above, is equally applicable to claim 15.
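For orientation, the formula recited in claims 2 and 12 and the Kullback-Leibler first loss of claims 4 and 14 can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the Office action does not say how the smoothing factor is applied, so the sketch assumes it multiplies the teacher logits before a softmax. The function names, the logit values, and the value of t_max are hypothetical.

```python
import math


def smoothing_factor(t, t_max):
    """Formula recited in claims 2 and 12: t / t_max, t incremented per epoch."""
    if not 1 <= t <= t_max:
        raise ValueError("expected 1 <= t <= t_max")
    return t / t_max


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a single input sample (not taken from the record).
teacher_logits = [4.0, 1.0, -2.0]
student_logits = [2.0, 1.5, 0.0]
t_max = 4  # hypothetical constant

student_probs = softmax(student_logits)
for t in range(1, t_max + 1):
    factor = smoothing_factor(t, t_max)
    # ASSUMPTION: the factor scales the teacher logits, so a small factor
    # flattens the teacher distribution and a factor of 1 leaves it unchanged.
    smoothed_teacher = softmax([factor * z for z in teacher_logits])
    first_loss = kl_divergence(smoothed_teacher, student_probs)
    print(f"t={t} factor={factor:.2f} "
          f"max_teacher_prob={max(smoothed_teacher):.3f} "
          f"first_loss={first_loss:.4f}")
```

Under this assumption the largest teacher probability grows as t approaches t_max, which is one way to read the recited reduction of the smoothing effect over the first training phase epochs.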
Regarding claim 16, the rejection of claim 15 is incorporated herein. The claim recites similar limitations corresponding to claim 6. Therefore, the same subject matter analysis that was utilized for claim 6, as described above, is equally applicable to claim 16.
Regarding claim 17, the rejection of claim 11 is incorporated herein. The claim recites similar limitations corresponding to claim 7. Therefore, the same subject matter analysis that was utilized for claim 7, as described above, is equally applicable to claim 17.
Regarding claim 18, the rejection of claim 17 is incorporated herein. The claim recites similar limitations corresponding to claim 8. Therefore, the same subject matter analysis that was utilized for claim 8, as described above, is equally applicable to claim 18.
Regarding claim 19, the rejection of claim 17 is incorporated herein. Claim 19 further recites the following abstract ideas: for each second training phase epoch, determining if the computed updated set of the SNN model parameters improves a performance of the SNN model relative to updated sets of SNN model parameters previously computed during the second training phase in respect of the development dataset (This is an abstract idea of a mental process. The limitation recites determining whether performance has improved by comparing results associated with different parameter sets. Such comparison and evaluation involves judgement based on observed outcomes and expected outputs, which can be performed in the human mind or with the aid of basic computing tools such as a calculator and pen and paper.)
The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: when the computed updated set of the SNN model parameters does improve the performance, update the SNN model parameters to the computed updated set of the SNN model parameters prior to a next epoch (The step of “updating the SNN model parameters” merely recites updating stored parameter values and amounts to an instruction to apply the abstract idea, without providing a meaningful limitation.); wherein the final set of SNN model is the updated set of SNN model parameters computed during the second training phase that best improves the performance of the SNN model during the second training phase (This limitation merely recites using a particular set of model parameters as final parameters, which amounts to an instruction to apply the abstract idea and does not provide a meaningful limitation.).
Regarding claim 20, claim 20 recites similar method steps corresponding to claim 1, implemented in the form of a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, perform the recited steps. Accordingly, the same subject matter analysis applied to claim 1, as described above, is equally applicable to claim 20, and claim 20 is rejected for similar reasons. The recited non-transitory computer-readable medium and one or more processors merely constitute generic computer components for carrying out the recited method steps and do not amount to anything significantly more.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan et al. (NPL: “Revisit Knowledge Distillation: A Teacher-Free Framework” (Published: July 2020)) in view of Kim (Pub. No.: US 20200125927 A1 (Filed: 2018)).
Regarding claim 1, Yuan teaches the following limitations: A method of training a student neural network (SNN) model that is configured by a set of SNN model parameters to generate outputs in respect of input data samples (Yuan, [Abstract] “Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model.” and [Introduction] “We therefore argue that the similarity information between categories cannot fully explain the dark knowledge in KD, and the soft targets from the teacher model indeed provide effective regularization for the student model, which are equally or even more important.” – training a student model by using soft targets from the teacher model to regularize the student model.)
, comprising obtaining respective teacher neural network (TNN) model outputs for a plurality of input data samples (Yuan, [Introduction] “the dark knowledge in KD, and the soft targets from the teacher model indeed provide effective regularization for the student model, which are equally or even more important” – because soft targets are the outputs generated by the teacher model for training inputs. Using soft targets necessarily requires obtaining the teacher model’s outputs corresponding to those input data samples so they can be applied during training of the student model.); performing a first training phase of the SNN model, the first training phase comprising training the SNN model over a plurality of first training phase epochs, each first training phase epoch comprising: computing SNN model outputs for the plurality of input data samples (Yuan, [page 5, section 3] “Given a neural network S to train, we first give loss function of LSR for S. For each training example x, S outputs the probability of each label k … : $p(k|x) = \mathrm{softmax}(z_k)$” – discloses that the student neural network computes output values for each input data sample by mathematically evaluating a softmax function to produce label probabilities, which directly corresponds to computing SNN model outputs for a plurality of input data samples during training.); applying a smoothing factor to the teacher neural network (TNN) model outputs to generate smoothed TNN model outputs (Yuan, [page 6, section 3] “the output prediction of the teacher network is $p_\tau^t(k) = \mathrm{softmax}(z_k^t)$ … where $z^t$ is the output logits of the teacher network and $\tau$ is the temperature to soften $p^t(k)$ (written as $p_\tau^t(k)$ after softened).” – teaches applying a smoothing factor to the teacher network outputs, which directly corresponds to generating smoothed TNN model outputs.)
; computing a first loss based on the SNN model outputs and the smoothed TNN model outputs (Yuan, [page 6, section 3] “The idea behind knowledge distillation is to let the student (the model S) mimic the teacher by minimizing the cross-entropy loss and KL divergence between the predictions of student and teacher as $L_{KD} = (1-\alpha) H(q,p) + \alpha D_{KL}(p_\tau^t, p_\tau)$” – defines a loss function that is computed using the student model outputs $p$ and the smoothed teacher model outputs, where the smoothing is applied via the temperature parameter $\tau$.); and computing an updated set of the SNN model parameters with an objective of reducing the first loss in a following first training phase epoch (Yuan, [page 5, section 3] “Given a neural network S to train, we first give loss function of LSR for S. For each training example x, S outputs the probability of each label $k \in \{1 \ldots K\}$: $p(k|x) = \mathrm{softmax}(z_k)$, where $z_i$ is the logit of the neural network S … The model S can be trained by minimizing the cross-entropy loss: $H(q,p) = -\sum_{k=1}^{K} q(k) \log p(k)$.” – discloses training the student neural network S by minimizing a defined loss function. It is inherent in minimizing a loss during training that the parameters of the student neural network are updated so that the loss is reduced in subsequent training epochs.), performing a second training phase of the SNN model, the second training phase comprising initializing the SNN model with a set of SNN model parameters selected from the updated sets of SNN model parameters computed during the first training phase, the second training phase of the SNN model being performed over a plurality of second training phase epochs, each second training phase epoch comprising (Yuan, [page 7, section 4] “Specifically, we first train the student model in the normal way to obtain a pre-trained model, which is then used as the teacher to train itself by transferring soft targets as in Eq. (4).
” – discloses that the student neural network is first trained to obtain a pre-trained set of model parameters, and that those learned parameters are then reused to initialize a subsequent training process in which the model is further trained.) computing SNN model outputs for the plurality of input data samples from the SNN model (Yuan, [page 7, section 4] “where $p$, $p_\tau^t$ are the output probability of $S$ and $S^t$ respectively” – the student model S produces output probabilities $p$, which are computed by evaluating the SNN on input data samples.); computing a second loss based on the SNN model outputs and a set of predefined expected outputs for the plurality of input data samples; and computing an updated set of the SNN model parameters with an objective of reducing the second loss in a following second training phase (Yuan, [page 7, section 4] “The loss function of Tf-KD$_{self}$ to train model S is $L_{self} = (1-\alpha) H(q,p) + \alpha D_{KL}(p_\tau^t, p_\tau)$, where $p$, $p_\tau^t$ are the output probability of $S$ and $S^t$ respectively, $\tau$ is the temperature and $\alpha$ is the weight to balance the two terms.” – discloses computing a loss function using $p$, which represents the outputs of the student neural network, and $q$, which represents predefined expected outputs (ground-truth labels) in the cross-entropy term $H(q,p)$. Accordingly, this disclosure teaches computing a second loss based on the SNN model outputs and a set of predefined expected outputs for a plurality of input data samples.) computing an updated set of the SNN model parameters with an objective of reducing the second loss in a following second training phase (Yuan, [page 7, section 4] “then we try to minimize the KL divergence … The loss function of Tf-KD$_{self}$ to train model S is $L_{self} = (1-\alpha) H(q,p) + \alpha D_{KL}(p_\tau^t, p_\tau)$” – discloses training the student neural network by minimizing a defined loss function.
Training a neural network by minimizing a loss function inherently requires computing updated model parameters to reduce the loss, since the loss depends on model outputs, which in turn depend on model parameters. Accordingly, the disclosure inherently teaches computing an updated set of SNN model parameters with an objective of reducing the loss in a following training phase epoch.) selecting a final set of SNN model parameters from the updated sets of SNN model parameters computed during the second training phase (Yuan, [page 8, section 5.1] “Fig. 4(a) shows the test accuracy of the six models. It can be seen that our Tf-KD$_{self}$ consistently outperforms the baselines. For example, as a powerful model with 34.52M parameters, ResNeXt29 improves itself by 1.05% with self-training (Fig. 4(b)).” – discloses evaluating the student model after completion of the Tf-KD$_{self}$ self-training process. Reporting “test accuracy” values and quantified performance improvements inherently requires that training updates have ceased and that a final set of SNN model parameters is fixed and used for inference on a test dataset. Under BRI, the act of evaluating and reporting a final test accuracy necessarily involves selecting a final set of SNN model parameters from among the updated parameter sets computed during the second training phase. The explicit attribution of these results to “self-training” identifies the selected parameters as being computed during the second training phase.).
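For orientation, the temperature-softened teacher output and the distillation loss quoted from Yuan in the claim 1 mapping above can be sketched in a few lines. This is a minimal illustration only, assuming plain Python lists of logits; the function names (`softmax`, `cross_entropy`, `kl_divergence`, `kd_loss`) are hypothetical and are not taken from Yuan, Kim, or the application.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a larger temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(q, p):
    """H(q, p) = -sum_k q(k) log p(k); terms with q(k) = 0 contribute nothing."""
    return -sum(qk * math.log(pk) for qk, pk in zip(q, p) if qk > 0)

def kl_divergence(a, b):
    """D_KL(a, b) = sum_k a(k) log(a(k) / b(k))."""
    return sum(ak * math.log(ak / bk) for ak, bk in zip(a, b) if ak > 0)

def kd_loss(student_logits, teacher_logits, labels, alpha, tau):
    """L_KD = (1 - alpha) * H(q, p) + alpha * D_KL(p_tau^t, p_tau), per the quoted form."""
    p = softmax(student_logits)             # student output p
    p_tau = softmax(student_logits, tau)    # softened student output p_tau
    p_t_tau = softmax(teacher_logits, tau)  # softened teacher output p_tau^t
    return (1 - alpha) * cross_entropy(labels, p) + alpha * kl_divergence(p_t_tau, p_tau)
```

With `alpha = 0` the sketch reduces to the plain cross-entropy term against the labels, and with identical teacher and student logits the divergence term is zero, which matches the roles the Office action assigns to the two terms.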
However, Yuan does not teach, but Yuan in view of Kim teaches, the following limitation: wherein the smoothing factor is adjusted over the plurality of first training phase epochs to reduce a smoothing effect on the generated smoothed TNN model outputs (Yuan, [page 2, Introduction] “For KD, by combining the teacher’s soft targets with the one-hot ground truth label, we find that KD is a learned LSR where the smoothing distribution of KD is from a teacher model but the smoothing distribution of LSR is manually designed”; Kim, paragraph [0012] “The determining of the loss function may include determining the loss function by applying a first factor to the error rate between the recognition result of the teacher model and the recognition result of the student model, wherein the first factor may be controlled so that a contribution of the teacher model to training of the student model decreases in response to an increase in a training epoch of the student model.” – Yuan teaches that knowledge distillation is a form of learned label smoothing in which the smoothing distribution is derived from teacher model outputs. Kim teaches applying a factor to the teacher-student loss term and controlling the factor so that the contribution of the teacher model decreases in response to an increase in training epochs. Under BRI, the claimed “smoothing factor” reads on Kim’s epoch-controlled factor applied to the teacher-derived term. As the teacher’s contribution decreases, the influence of the teacher-derived smoothing distribution necessarily decreases, thereby reducing the smoothing effect on the generated smoothed outputs.); Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Yuan and Kim before them, to adjust a factor applied to teacher model outputs during knowledge-distillation training such that the contribution of the teacher model decreases over training epochs.
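The kind of epoch-controlled factor the examiner reads onto the claimed smoothing factor can be illustrated as follows. This is a hedged sketch, not Kim's disclosed formula: the linear decay in `teacher_weight` and the complementary weighting of the label term in `combined_loss` are assumptions chosen only to show a teacher contribution that shrinks as the epoch count grows, and both function names are hypothetical.

```python
def teacher_weight(epoch, max_epoch):
    """Hypothetical linear decay: 1.0 at epoch 0, falling to 0.0 at max_epoch."""
    if max_epoch <= 0:
        raise ValueError("max_epoch must be positive")
    progress = min(max(epoch / max_epoch, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - progress

def combined_loss(distill_term, label_term, epoch, max_epoch):
    """Weight the teacher-student term by the decaying factor.

    Weighting the label term by the complement is an assumption made for
    illustration; the quoted passage only describes the teacher-side factor.
    """
    w = teacher_weight(epoch, max_epoch)
    return w * distill_term + (1.0 - w) * label_term
```

Under these assumptions, `combined_loss(2.0, 4.0, 0, 10)` equals the distillation term alone and `combined_loss(2.0, 4.0, 10, 10)` equals the label term alone. Whether such a loss-weighting factor is the same thing as a factor applied to the teacher outputs themselves is the point on which the claim language and the examiner's reading may differ.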
Yuan teaches that knowledge distillation corresponds to a form of label smoothing in which a teacher model and its outputs provide a smoothing distribution for training a student model. Kim teaches applying a factor to a loss associated with teacher-student outputs and controlling that factor so that the contribution rate of the teacher model to training the student model is decreased as training progresses, particularly when a low error rate indicates that the student model has sufficiently learned from the teacher. One would have been motivated to make such a combination in order to reduce reliance on teacher-provided outputs once the student model is sufficiently trained, thereby allowing the student model to transition toward learning from ground-truth targets and improving completion and convergence of the training process (Kim, paragraph [0070]).

Regarding claim 2, Yuan in view of Kim teaches all the elements of claim 1, therefore is rejected for the same reasons as those presented for claim 1. Yuan in view of Kim further teaches: wherein in each epoch of the first training phase the smoothing factor is computed as smoothing factor = $t / t_{max}$, where $t_{max}$ is a constant and a value of $t$ is incremented in each subsequent first training phase epoch (Kim, paragraph [0093] “In operation 740, the model training apparatus determines whether a training epoch t is less than a maximum epoch. For example, when the training epoch t is determined to be less than the maximum epoch, operation 750 is performed.” [0095] “In operation 760, the model training apparatus increments the training epoch t by “1.” Also, the process reverts to operation 730 to train the student model $\theta_S$.” – Kim teaches a training process in which a current training epoch t is incremented by one in each iteration and training proceeds until a predefined maximum epoch is reached. This establishes a bounded, linearly increasing epoch variable.
Expressing that linear progression as the ratio of the current epoch to the maximum epoch ($t / t_{max}$) is the standard mathematical implementation of such an epoch-based linear schedule. While Kim teaches computing a factor based on linear epoch progression, Yuan teaches applying such a factor as a smoothing factor to control loss behavior during training.).

Regarding claim 3, Yuan in view of Kim teaches all the elements of claim 1, therefore is rejected for the same reasons as those presented for claim 1. Yuan in view of Kim further teaches: wherein the first loss corresponds to a divergence between the SNN model outputs and the smoothed TNN model outputs (Yuan, [page 6, section 3] “For knowledge distillation, the teacher-student learning mechanism is applied to improve the performance of the student … The idea behind knowledge distillation is to let the student (the model S) mimic the teacher by minimizing the cross-entropy loss and KL divergence between the predictions of student and teacher as $L_{KD} = (1-\alpha) H(q,p) + \alpha D_{KL}(p_\tau^t, p_\tau)$ … the teacher network and $\tau$ is the temperature to soften … (written as $p_\tau^t(k)$ after softened).” – Yuan teaches computing Kullback-Leibler divergence between the student model output distribution and the temperature-softened teacher model output distribution.)

Regarding claim 4, Yuan in view of Kim teaches all the elements of claim 3, therefore is rejected for the same reasons as those presented for claim 3.
Yuan in view of Kim further teaches: wherein the first loss corresponds to a Kullback-Leibler divergence between the SNN model outputs and the smoothed TNN model outputs (Yuan, [page 6, section 3] “For knowledge distillation, the teacher-student learning mechanism is applied to improve the performance of the student … The idea behind knowledge distillation is to let the student (the model S) mimic the teacher by minimizing the cross-entropy loss and KL divergence between the predictions of student and teacher as $L_{KD} = (1-\alpha) H(q,p) + \alpha D_{KL}(p_\tau^t, p_\tau)$ … the teacher network and $\tau$ is the temperature to soften … (written as $p_\tau^t(k)$ after softened).” – Yuan teaches computing Kullback-Leibler divergence between the student model output distribution and the temperature-softened teacher model output distribution.).

Regarding claim 5, Yuan in view of Kim teaches all the elements of claim 1, therefore is rejected for the same reasons as those presented for claim 1. Yuan in view of Kim further teaches: wherein the second loss corresponds to a divergence between the SNN model outputs and the set of predefined expected outputs (Yuan, [page 5, section 3] “The model S can be trained by minimizing the cross-entropy loss: $H(q,p) = -\sum_{k=1}^{K} q(k) \log p(k)$” [page 12, Appendix A] “where q is the distribution of ground truth, p is the output distribution of student model,” – Yuan teaches computing a cross-entropy loss between student outputs and ground-truth labels, which corresponds to a divergence between the SNN outputs and predefined expected outputs.)

Regarding claim 6, Yuan in view of Kim teaches all the elements of claim 5, therefore is rejected for the same reasons as those presented for claim 5.
Yuan in view of Kim further teaches: wherein the second loss is computed based on a cross entropy loss function (Yuan, [page 5, section 3] “The model S can be trained by minimizing the cross-entropy loss: $H(q,p) = -\sum_{k=1}^{K} q(k) \log p(k)$” [page 12, Appendix A] “where q is the distribution of ground truth, p is the output distribution of student model,” – Yuan teaches computing a cross-entropy loss between student outputs and ground-truth labels, which corresponds to a divergence between the SNN outputs and predefined expected outputs.).

Regarding claim 7, Yuan in view of Kim teaches all the elements of claim 1, therefore is rejected for the same reasons as those presented for claim 1. Yuan in view of Kim further teaches: for each first training phase epoch, determining if the computed updated set of the SNN model parameters improves a performance of the SNN model relative to updated sets of SNN model parameters previously computed during the first training phase in respect of a development dataset that includes a set of development data samples and respective expected outputs, and when the computed updated set of the SNN model parameters does improve the performance, update the SNN model parameters to the computed updated set of the SNN model parameters prior to a next first training phase epoch (Yuan, [pages 8-9, section 5.1]: Yuan describes training student models over multiple training epochs (“trained for 200 epochs”) and evaluating performance using “test accuracy,” further stating that the authors “report the mean of the best results” obtained during training, rather than merely reporting final-epoch performance. Reporting the “best results” from a multi-epoch training process necessarily entails determining whether a given set of model parameters improves performance relative to parameter sets computed in earlier epochs.
Further, under BRI, identifying and reporting a “best” result distinct from the final result necessarily requires updating and retaining the specific SNN model parameters so that they are not overwritten by a subsequent training epoch. Accordingly, Yuan inherently teaches the conditional determination and parameter update required by the claim.).

Regarding claim 8, Yuan in view of Kim teaches all the elements of claim 7, therefore i
METHOD AND SYSTEM FOR TRAINING A NEURAL NETWORK MODEL USING GRADUAL KNOWLEDGE DISTILLATION

This examiner grants 50% of cases after interview

