DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the amendment filed on 11/04/2025. Claims 1, 2, 4, and 6 are amended. Claim 5 is cancelled. Claim 12 is new. Claims 1-4 and 6-12 are pending and have been examined.
Claim Rejections - 35 USC § 112
The 112 rejection of Claim 4 is WITHDRAWN in view of Applicant’s amendments to the claim.
Claim Rejections - 35 USC § 101
The 101 rejection of Claims 1-4 and 6-11 is WITHDRAWN in view of Applicant’s arguments and amendments to the claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4 and 6-12 are rejected under 35 U.S.C. 103 as being unpatentable over Vrbancic et al. “Transfer Learning With Adaptive Fine-Tuning”, hereinafter “Vrbancic”, in view of Chen et al. “A Closer Look at Few-shot Classification”, hereinafter “Chen”.
Regarding Claim 1, Vrbancic teaches:
A method for fine tuning a… classifier comprising a base network to recognize novel classes based on… learning (Vrbancic, p. 1, Abstract, “start with a model, pre-trained for a specific task, and then fine-tune (train) only certain layers of the neural network for a related but different target task”, Figure 1 showing novel target image dataset), comprising:
training the base network on one or more base classes (Vrbancic, p. 7, col. 1, ¶5, “the VGG19 convolutional base was pre-trained on an ImageNet dataset”);
performing an evolutionary search of possible learning strategies (Vrbancic, p. 4, col. 1, ¶2, “DE algorithm is composed of Np real-coded vectors and three operators: mutations, crossovers and selections”) on layers of the base network to determine which layers will be fixed and which layers will be fine-tuned for the novel classes (Vrbancic, p. 5, col. 2, ¶3, “The layers selection mechanism is based on the DE algorithm, modified for the task of selecting which layers of CNN architecture will be enabled for fine-tuning and which will remain frozen”) using a particular learning rate (Vrbancic, p. 7, col. 2, ¶3, “using the adam optimizer function with the initial learning rate”); and
partially fine-tuning the base network for the novel classes based on a most accurate learning strategy determined as a result of the evolutionary search (Vrbancic, p. 5, col. 2, ¶1, “The best individual (the selection of layers that produced the best performing CNN – with the lowest categorical cross-entropy loss) is deemed to be optimal under the given circumstances. The optimal found selection of layers is then used for fine-tuning the final CNN”);
wherein each possible learning strategy comprises an indication of frozen layers and an indication of a learning rate for each layer (frozen layers have a learning rate of 0 as they do not learn during fine-tuning, Vrbancic, p. 5, col. 1, ¶2, “array of binary values, where each value in the array reflects whether the corresponding layer of CNN architecture is selected for fine-tuning or not”, p. 5, Figure 2, Evaluation of selected layers, shows a binary array where 0 is “frozen” and 1 is “enabled”, meaning enabled layers will have a non-zero learning rate, p. 7, col. 2, ¶3, “trained using the adam optimizer function with the initial learning rate set to 1 ∗ 10^−4”).
Vrbancic does not expressly teach:
few-shot classifier
However, Chen teaches:
few-shot classifier (Chen, p. 10, ¶2, “few-shot classification”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the few-shot learning teachings of Chen with the classifier of Vrbancic. The motivation to do so would be to train a classifier with limited examples (Chen, p. 1, Abstract, “Few-shot classification aims to learn a classifier to recognize unseen classes during training with limited labeled examples”).
Regarding Claim 2, Vrbancic in view of Chen teaches the method of Claim 1 as referenced above. Vrbancic further teaches:
wherein each possible learning strategy comprises a vector defining a layer-wise learning rate for unfrozen feature extractors in the base network (Convolutional blocks in CNN are feature extractors, Vrbancic, p. 7, col. 1, ¶3, “CNN convolutional base… five convolutional blocks”, learning rate for each layer is based on binary array e.g. 0 or not, p. 5, col. 1, ¶2, “array of binary values, where each value in the array reflects whether the corresponding layer of CNN architecture is selected for fine-tuning or not”).
Regarding Claim 3, Vrbancic in view of Chen teaches the method of Claim 2 as referenced above. Vrbancic further teaches:
wherein a search space for the evolutionary search comprises m^K possible learning strategies, wherein: m is the number of choices for learning rate values; and K is the number of layers in the base network (Vrbancic, p. 5, col. 2, ¶3, “DE algorithm, modified for the task of selecting which layers of CNN architecture will be enabled for fine-tuning and which will remain frozen”; the choice is between a learning rate being on or off per the array of binary values, p. 5, Figure 2, Evaluation of selected layers, shows a binary array where 0 is “frozen” and 1 is “enabled”, i.e., the learning rate is “on” or “off”; therefore m = 2 and there are 2^K possible learning strategies).
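For illustration only, the search-space count discussed for Claim 3 can be checked with a short Python sketch. It is not taken from either reference; the function name `enumerate_strategies` and the example values of m and K are assumptions chosen for the example. Under the binary frozen/enabled reading, m = 2 and the count is 2^K.

```python
from itertools import product


def enumerate_strategies(lr_choices, num_layers):
    """Return every per-layer assignment of one learning-rate choice.

    lr_choices: the m candidate values (hypothetical example values).
    num_layers: K, the number of layers in the base network.
    """
    return list(product(lr_choices, repeat=num_layers))


# Binary reading: m = 2 choices per layer (0 = frozen, 1 = enabled), K = 5.
binary_strategies = enumerate_strategies((0, 1), 5)

# Hypothetical multi-valued case: m = 3 learning-rate choices, K = 5.
multi_strategies = enumerate_strategies((0.0, 1e-4, 1e-3), 5)

print(len(binary_strategies), len(multi_strategies))
```

Each returned tuple is one candidate strategy, with one entry per layer, so the list length equals m raised to the power K.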
Regarding Claim 4, Vrbancic in view of Chen teaches the method of Claim 3 as referenced above. Vrbancic further teaches:
wherein a learning rate value of 0 in each possible learning strategy indicates a layer that is fixed during the partial fine-tuning of the base network (a frozen layer has a learning rate of zero and is fixed during partial fine tuning, Vrbancic, p. 5, Figure 2 shows binary array of 0 and 1 values enabling learning rate leading to layers being frozen or enabled for fine-tuning).
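The equivalence relied on above, that a frozen layer behaves as a layer with a learning rate of 0, can be illustrated with a minimal Python sketch. The helper names `mask_to_learning_rates` and `gradient_step` are hypothetical, each layer is reduced to a single scalar weight, and a plain gradient step is used as a simplification in place of the Adam optimizer cited in Vrbancic.

```python
def mask_to_learning_rates(mask, base_lr=1e-4):
    """Map a binary layer mask (0 = frozen, 1 = enabled) to per-layer rates.

    base_lr defaults to the initial learning rate quoted from Vrbancic.
    """
    if any(bit not in (0, 1) for bit in mask):
        raise ValueError("mask entries must be 0 or 1")
    return [base_lr if bit == 1 else 0.0 for bit in mask]


def gradient_step(weights, grads, layer_lrs):
    """Apply one plain gradient step with a separate rate per layer.

    Assumption: one scalar weight per layer, for brevity.
    """
    return [w - lr * g for w, g, lr in zip(weights, grads, layer_lrs)]


rates = mask_to_learning_rates([0, 0, 1, 1, 0])
# Layers with a rate of 0.0 keep their weights unchanged after the step.
updated = gradient_step([1.0, 2.0, 3.0], [10.0, 10.0, 10.0], [0.0, 0.0, 0.5])
```

In this sketch the first two weights are untouched by the update, which is the sense in which a learning rate of 0 marks a layer as fixed.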
Regarding Claim 7, Vrbancic in view of Chen teaches the method of Claim 6 as referenced above. Vrbancic further teaches:
wherein the… classifier uses a… comprising a backbone feature extractor… and further wherein the partial fine-tuning is performed on the backbone feature extractor (Vrbancic, p. 13, col. 2, ¶5, “selecting which layers of given CNN architecture to fine-tune and which ones to leave frozen”).
In the combination as set forth above in Claim 1, Chen teaches:
the few shot classifier (Chen, p. 10, ¶2, “few-shot classification”)
In the combination as set forth above in Claim 1, Vrbancic does not expressly teach:
baseline++ method and a cosine-distance classifier
However, Chen further teaches:
baseline++ method and a cosine-distance classifier (Chen, p. 4, Figure 1 description, “train a new classifier C(.|Wn) with the given labeled examples in novel classes. The baseline++ method differs from the baseline model in the use of cosine distances between the input feature and the weight vector that aims to reduce intra-class variations”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chen’s baseline++ method that uses a cosine distance classifier and the DEFT method of Vrbancic. The motivation to do so would be to reduce intra-class variation during training (Chen, p. 4, ¶2, “explicitly reduces intra-class variation among features during training. The importance of reducing intra-class variations of features has been highlighted in deep metric learning”).
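As a minimal sketch of the cosine-distance classification quoted from Chen, the following Python snippet scores an input feature against one weight vector per class by cosine similarity and predicts the highest-scoring class. The function names and the toy vectors are assumptions for illustration; this is not Chen's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_classifier_scores(feature, class_weights):
    """Score the feature against each class weight vector."""
    return [cosine_similarity(feature, w) for w in class_weights]


def predict(feature, class_weights):
    """Return the index of the class with the highest cosine score."""
    scores = cosine_classifier_scores(feature, class_weights)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: two hypothetical class weight vectors in a 2-D feature space.
predicted_class = predict([1.0, 0.0], [[0.0, 2.0], [3.0, 0.1]])
```

Because the score depends only on direction and not on vector length, features of the same class that differ in magnitude receive the same score, which relates to the reduced intra-class variation cited as motivation.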
Regarding Claim 8, Vrbancic in view of Chen teaches the method of Claim 6 as referenced above. Vrbancic further teaches:
wherein the… classifier uses a meta method comprising a backbone network and a classifier and further wherein the partial fine-tuning is simultaneously performed on the backbone network and the classifier (Vrbancic, “we followed the strategy of fine-tuning only the last convolutional block (in our case block5 of VGG19 architecture) in the convolutional base, while the layers towards the beginning of the convolutional base were kept frozen… the fine-tunable layers were dynamically chosen by utilizing the proposed adaptive approach…. after the last convolutional block, in each experiment, the following randomly initialized feed-forward layers are added and trained in order to perform the classification task”).
In the combination as set forth above in Claim 1, Chen teaches:
the few shot classifier (Chen, p. 10, ¶2, “few-shot classification”)
Regarding Claim 9, Vrbancic in view of Chen teaches the method of Claim 1 as referenced above. Vrbancic further teaches:
A system comprising: a processor; memory, storing software that, when executed by the processor, performs the method of claim 1 (Vrbancic, p. 6, col. 2, ¶3, “The experiments were executed on a single Intel Core i7-6700K based PC, with 4 cores (8 threads) CPU running at 4 GHz, with 64 GB of RAM, and three Nvidia GeForce Titan X Pascal GPUs each with 12 GB of dedicated GDDR5 memory”).
Regarding Claim 10, Vrbancic in view of Chen teaches the method of Claim 8 as referenced above. Vrbancic further teaches:
A system comprising:
a processor; memory, storing software that, when executed by the processor, performs the method of claim 8 (Vrbancic, p. 6, col. 2, ¶3, “The experiments were executed on a single Intel Core i7-6700K based PC, with 4 cores (8 threads) CPU running at 4 GHz, with 64 GB of RAM, and three Nvidia GeForce Titan X Pascal GPUs each with 12 GB of dedicated GDDR5 memory”).
Regarding Claim 11, Vrbancic in view of Chen teaches the method of Claim 7 as referenced above. Vrbancic further teaches:
A system comprising:
a processor; memory, storing software that, when executed by the processor, performs the method of claim 7 (Vrbancic, p. 6, col. 2, ¶3, “The experiments were executed on a single Intel Core i7-6700K based PC, with 4 cores (8 threads) CPU running at 4 GHz, with 64 GB of RAM, and three Nvidia GeForce Titan X Pascal GPUs each with 12 GB of dedicated GDDR5 memory”).
Regarding Claim 12, Vrbancic in view of Chen teaches the method of Claim 1 as referenced above. Vrbancic further teaches:
wherein the evolutionary search comprises:
randomly initializing a plurality of the possible learning strategies (strategies are initialized randomly from the interval [0, 1], Vrbancic, p. 5, col. 2, ¶4, “The individuals in the proposed DEFT method are presented as an array… where each layer x_{i,0}^{(t)} for i = 0, ..., N is selected from the interval [0, 1]”);
evaluating each possible learning strategy to determine its accuracy on a validation set for the novel classes (Vrbancic, p. 6, col. 1, ¶1, “For each produced individual… the fine-tuning of CNN is conducted. To determine how good or bad the produced individual is, we define a fitness function L. The fitness function is calculated after CNN model fine-tuning based on the given individual is finished and returned to the DE algorithm in order to enable the DE to find better individuals.”, fitness function determines accuracy on validation set, p. 5, col. 2, ¶1, “a fitness function, for which we utilized the well-known categorical cross-entropy (CCE) loss”);
selecting a predetermined number of the most accurate possible learning strategies to be used as parents to produce posterity strategies for one or more subsequent generations of strategies (Vrbancic, p. 4, col. 2, ¶3, “Np parameter, as presented, denotes the population size of real-coded vectors (individuals) on top of which the mutation, crossover, and selection operators are applied”, generation members are parents for subsequent generations, p. 4, col. 2, ¶2, “the selection operator is utilized to decide whether a produced vector should become a generation member utilizing the greedy criterion”); and
iteratively producing subsequent generations of search strategies based on the predetermined number of most accurate strategies for each generation until a best fine-tuning strategy is determined (Vrbancic, p. 5, col. 2, ¶1, “The process reiterates, trying to minimize the fitness value, until the maximum number of model evaluations is reached… The best individual (the selection of layers that produced the best performing CNN – with the lowest categorical cross-entropy loss) is deemed to be optimal under the given circumstances. The optimal found selection of layers is then used for fine-tuning the final CNN”).
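The four steps recited in Claim 12 can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the algorithm of Vrbancic or of the application: the function `evolutionary_search`, its parameters, and the toy scoring function `toy_accuracy` are hypothetical, and a caller-supplied score where higher is better stands in for accuracy on a validation set.

```python
import random


def evolutionary_search(evaluate, num_layers, lr_choices, pop_size=8,
                        num_parents=3, generations=10, mutation_prob=0.2,
                        seed=0):
    """Search per-layer learning-rate strategies; higher evaluate() is better."""
    rng = random.Random(seed)
    # Step 1: randomly initialize a population of strategies.
    population = [tuple(rng.choice(lr_choices) for _ in range(num_layers))
                  for _ in range(pop_size)]
    best = max(population, key=evaluate)
    for _ in range(generations):
        # Steps 2 and 3: evaluate each strategy and keep the top parents.
        parents = sorted(population, key=evaluate, reverse=True)[:num_parents]
        # Step 4: produce the next generation by crossover, then mutation.
        children = []
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [rng.choice(lr_choices) if rng.random() < mutation_prob
                     else gene for gene in child]
            children.append(tuple(child))
        population = children
        # Track the best strategy seen in any generation.
        best = max([best] + population, key=evaluate)
    return best


# Toy stand-in for validation accuracy: fraction of layers matching a target.
TARGET = (0.0, 0.0, 1e-3, 1e-2)


def toy_accuracy(strategy):
    return sum(s == t for s, t in zip(strategy, TARGET)) / len(TARGET)


best = evolutionary_search(toy_accuracy, num_layers=4,
                           lr_choices=(0.0, 1e-3, 1e-2))
```

In practice each call to the scoring function would involve fine-tuning the network under the candidate strategy, so the number of evaluations, rather than the search bookkeeping, would dominate the cost.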
Regarding Claim 6, Vrbancic in view of Chen teaches the method of Claim 12 as referenced above. Vrbancic further teaches:
wherein subsequent generations of search strategies are produced by applying mutation and crossover stages to the previous generation of strategies (Vrbancic, p. 4, col. 1, ¶4, “The DE’s basic strategy consists of mutation, crossover, and selection operations”, p. 4, col. 2, ¶3, “each generation”, p. 4, Equation 2 applies mutation, Equation 3 applies crossover, Equation 4 shows selecting subsequent generation from previous generation).
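For reference, one generation of the mutation, crossover, and selection operations named above can be sketched as the textbook DE/rand/1/bin scheme. That the cited Equations 2 through 4 take exactly this form is an assumption, as are the clipping of genes to [0, 1] and the 0.5 threshold used by the hypothetical helper `to_layer_mask` to turn a real-coded individual into a frozen/enabled array.

```python
import random


def de_generation(population, fitness, F=0.5, CR=0.9, rng=None):
    """Run one DE/rand/1/bin generation, minimizing fitness.

    population: list of real-coded individuals (lists of floats in [0, 1]);
    it must contain at least four individuals.
    """
    rng = rng or random.Random(0)
    n, dim = len(population), len(population[0])
    next_population = []
    for i, target in enumerate(population):
        # Mutation: combine three distinct individuals other than the target.
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        mutant = [population[r1][d] + F * (population[r2][d] - population[r3][d])
                  for d in range(dim)]
        mutant = [min(1.0, max(0.0, g)) for g in mutant]  # assumed clipping
        # Binomial crossover: take at least one gene from the mutant.
        j_rand = rng.randrange(dim)
        trial = [mutant[d] if (rng.random() < CR or d == j_rand) else target[d]
                 for d in range(dim)]
        # Greedy selection: the trial replaces the target only if not worse.
        if fitness(trial) <= fitness(target):
            next_population.append(trial)
        else:
            next_population.append(target)
    return next_population


def to_layer_mask(individual, threshold=0.5):
    """Assumed decoding of a real-coded individual: 1 = enabled, 0 = frozen."""
    return [1 if gene >= threshold else 0 for gene in individual]
```

The greedy selection step means that no slot in the population ever gets worse from one generation to the next, which is what lets the best individual be read off once the evaluation budget is exhausted.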
Response to Arguments
35 U.S.C. 103
Argument 1: Vrbancic differs from the claimed invention and fails to teach wherein each learning strategy provides an indication of the learning rate for each unfrozen layer, as is now required by amended claim 1.
Examiner Response: Examiner respectfully disagrees. Amended Claim 1 recites wherein each possible learning strategy comprises an indication of frozen layers and an indication of a learning rate for each layer. Vrbancic clearly teaches that each possible learning strategy comprises an indication of frozen layers: p. 5, Figure 2 shows each individual with an array of binary values that indicate whether a layer is “Frozen” or “Enabled”. A frozen layer has a learning rate of 0 because it is not modified during fine-tuning, p. 1, col. 2, paragraph 2, “which of the layers should be enabled for training (fine-tuning) and which ones to leave frozen”. Vrbancic also clearly teaches an indication of a learning rate for each layer: an enabled layer is further trained and therefore has a learning rate value, p. 7, col. 2, paragraph 3, “trained using the adam optimizer function with the initial learning rate set to 1 ∗ 10^−4”.
Regarding Applicant’s assertion that Vrbancic can only perform a binary selection of layers for fine-tuning, while the claimed method introduces a learning rate into the search space and thereby allows joint determination of which layers to fine-tune and the optimal learning rate for each layer, this distinction is not reflected in the claim language. The claim language reads on Vrbancic, where each possible learning strategy has an indication of frozen layers and an indication of a learning rate for each layer: 0 indicates a frozen layer with no learning rate, and 1 indicates an enabled layer with a learning rate set by the Adam optimizer function. Nothing in the claim language of these limitations distinguishes them from Vrbancic.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSE CHEN COULSON whose telephone number is (571)272-4716. The examiner can normally be reached Monday-Friday 8:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JESSE C COULSON/
Examiner, Art Unit 2122
/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122