Prosecution Insights
Last updated: April 19, 2026
Application No. 17/430,192

METHOD OF GENERATING CLASSIFIER BY USING SMALL NUMBER OF LABELED IMAGES

Status: Non-Final OA (§103, §112)
Filed: Aug 11, 2021
Examiner: ALSHAHARI, SADIK AHMED
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: Beijing Research Institute University Of Science And Technology Of China
OA Round: 3 (Non-Final)
Grant Probability: 35% (At Risk)
OA Rounds: 3-4
To Grant: 4y 5m
With Interview: 82%

Examiner Intelligence

Career Allow Rate: 35% (12 granted / 34 resolved; -19.7% vs TC avg)
Interview Lift: +47.1% in allowance for resolved cases with an interview vs. without
Typical Timeline: 4y 5m avg prosecution; 24 applications currently pending
Career History: 58 total applications across all art units

Statute-Specific Performance

§101: 31.8% (-8.2% vs TC avg)
§103: 41.7% (+1.7% vs TC avg)
§102: 4.1% (-35.9% vs TC avg)
§112: 16.7% (-23.3% vs TC avg)

Deltas are relative to the Tech Center average estimate. Based on career data from 34 resolved cases.

Office Action

Rejections: §103, §112

DETAILED ACTION

Status of Claims

Claims 8-9 and 14 are pending and are examined herein. Claim 8 has been amended. Claims 1-7, 11-13, and 15-18 were previously canceled. Claim 10 is now canceled. Claims 8-9 and 14 remain rejected under 35 U.S.C. § 112 and § 103.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of the applicant's claim for foreign priority to PCT International Application No. PCT/CN2020/079018; the PCT international application claims priority to Chinese Patent Application No. 201910235392.2, filed on March 26, 2019.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 07/17/2025 has been entered.

Response to Amendment

The amendment filed on July 17, 2025 has been entered. Claims 8-9 and 14 are pending in the application. Applicant's amendments to the claims have been fully considered and are addressed in the rejections below.

Response to Arguments

Applicant's arguments with respect to the rejection under 35 U.S.C. § 101, filed on 07/17/2025, have been fully considered and are persuasive (see Remarks pp. 6-10). Specifically, the claim recites image classification that relies on computer-performed quantitative analysis of pixel and region data and must adapt to changing object classes. The claimed invention integrates the judicial exception into a practical application by enabling adaptation and preserving classification accuracy for newly introduced classes, thereby improving the functioning of the computer-implemented image classification system. Accordingly, the rejection is withdrawn.

Applicant's arguments with respect to the rejection under 35 U.S.C. § 103, filed on 07/17/2025 (see Remarks pp. 11-13), have been fully considered but are not persuasive, and are moot in view of the new grounds of rejection necessitated by amendments. The Examiner refers to the updated rejection under 35 U.S.C. § 103 for details.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 8-9 and 14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Regarding Currently Amended Claim 8: The claim is rejected for failing to comply with the written description requirement. The claim recites the amended limitation: "wherein the N-class classifier comprises a neural network classification layer parameterized by classifier parameter values;" (lines 42-43). The specification describes the N-class classifier, or a "classifier 206," at [0046] and [0050], and describes that parameters are predicted for it and that it outputs classification scores. However, the specification does not describe the classifier as "comprising a neural network classification layer" or describe any structural components or composition of the classifier. The limitation "comprises a neural network classification layer" adds structural details not found in the original disclosure. The specification describes the classifier functionally (receiving inputs, outputting scores) but provides no description of its structure or of what layers or components it comprises. Describing what a component does (outputs classification scores based on parameters) does not necessarily describe what that component comprises or how it is structurally composed. Therefore, the limitation "the N-class classifier comprises a neural network classification layer" introduces new matter beyond what was originally described.

Furthermore, claim 8, as currently amended, recites "wherein the processor trains the N-class classifier using additional labeled digital images of the N classes ... thereby improving classification accuracy for the N classes without retraining the wide residual network." (lines 44-50). The specification (e.g., [0037] and [0053]) discloses that portions of the pre-trained network are "retrained" as a feature extractor, and describes training the N-class classifier and updating "the parameter of the N-class classifier." However, the specification does not disclose that this training occurs "without retraining the wide residual network." While the specification describes training the N-class classifier, it does not describe that the wide residual network itself is not retrained, or is frozen, during this training. The term "retrained" does not suggest that the feature extractor parameters remain frozen during classifier training. The limitation "without retraining the wide residual network" therefore imposes a process constraint on the training of the wide residual network that is not supported by the original disclosure. Accordingly, the limitation represents new matter that lacks adequate written description support.

Both limitations introduce subject matter without sufficient written description support in the specification. Therefore, claim 8 fails to satisfy the written description requirement of 35 U.S.C. § 112(a).
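For orientation, the two disputed limitations can be pictured in a minimal PyTorch-style sketch. This is an illustration of what the amended claim language would require, not a description of anything in the specification; the module names and dimensions are hypothetical.

```python
# Illustrative sketch only (hypothetical names/dimensions): a classifier
# that "comprises a neural network classification layer" and is trained
# "without retraining the wide residual network".
import torch
import torch.nn as nn

feature_dim, n_classes = 640, 5

# Stand-in for the pre-trained wide residual network minus its final
# fully connected layer (the claimed feature extractor).
backbone = nn.Sequential(
    nn.Conv2d(3, feature_dim, kernel_size=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# One neural network classification layer parameterized by classifier
# parameter values (its weights and biases).
classifier = nn.Linear(feature_dim, n_classes)

# Freeze the backbone: only the classifier parameter values are updated.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()  # classification loss over the N classes

def train_step(images, labels):
    with torch.no_grad():          # the wide residual network is not retrained
        feats = backbone(images)   # feature vectors of the labeled images
    scores = classifier(feats)     # classification scores
    loss = loss_fn(scores, labels)
    optimizer.zero_grad()
    loss.backward()                # gradients reach the classifier only
    optimizer.step()
    return loss.item()
```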
Regarding dependent claims 9 and 14: Dependent claims inherit the deficiencies of their respective parent claim.

Regarding Original Claim 14: The claim is rejected for failing to comply with the written description requirement. As currently amended, independent claim 8 recites "wherein the processor trains the N-class classifier using additional labeled digital images of the N classes by: extracting, via the feature extractor, feature vectors of the additional labeled digital images; generating classification scores with the N-class classifier; and updating the classifier parameter values in accordance with a classification loss function associated with the N classes, ...", which requires training using additional labeled digital images (i.e., [0053] "images are randomly selected from each of N classes as images to be tested"). Dependent claim 14 further recites "train the N-class classifier after the N-class classifier is obtained, comprising: randomly select a number of images from each class of the N classes as images to be tested; extract feature vectors of the images to be tested by using the feature extractor; input the feature vectors extracted directly into the N-class classifier, so as to predict classification scores of the images to be tested being classified into each class; and update the parameter of the N-class classifier according to a result of the prediction, wherein a loss function used in the training process of the N-class classifier is the same as that used in the pre-training process of the wide residual network except for a number of image classes involved." The specification describes only a single after-generation training operation of the N-class classifier (e.g., para. [0053] describes an optional training step performed after the classifier is obtained). The specification does not describe or suggest multiple distinct after-generation training operations, nor does it describe a first training using "additional labeled digital images" and a subsequent training operation using "images to be tested," as now required by the combined scope of claims 8 and 14. Accordingly, in view of the amendment to claim 8, the specification fails to provide adequate written description support for the current scope of claim 14.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 8-9 and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention.

Regarding Currently Amended Claim 8: The claim is rejected for failing to define the boundaries of the claimed invention. The claim recites: "randomly select one or more images from each class of the N classes as training samples; extract a feature vector from training samples of each class, by using the feature extractor; input a total of N feature vectors extracted into a classifier generator;" (lines 12-16).
The claim further recites the amended limitation: "extract a plurality of feature vectors from a plurality of training samples of each class in response to extracting a plurality of images from each class as the training samples, and determine an average of the plurality of feature vectors as the feature vector for the each class, so that a total of N feature vectors are finally extracted for N classes," (lines 37-41), which renders the claim indefinite for the following reasons:

The claim introduces unclear relationships between limitations. It is unclear whether the amended limitation describes an additional operation performed after the earlier recited extraction, or whether it describes how the earlier recited extraction is performed when multiple images are selected. Thus, it is unclear whether two feature vectors would be extracted per class, which would contradict the recitation of "a total of N feature vectors are finally extracted for N classes," or whether the amended limitation replaces the earlier extraction when multiple images are selected; the claim does not clearly indicate that the earlier extraction step is conditional on the number of images selected.

The amended limitations also lack sufficient antecedent basis, rendering the scope of the claim indefinite. The term "the feature vector" lacks sufficient antecedent basis, as the earlier limitation recites "a feature vector." It is unclear whether "the feature vector" in the amendment refers to the same feature vector from the earlier limitation or represents a different, averaged feature vector, making it unclear which feature vector is being used in the claimed process. Additionally, the phrase "a total of N feature vectors are finally extracted" in the amended limitation lacks clear antecedent basis relative to "a total of N feature vectors extracted" in the earlier limitation. Both use the term "a total," creating uncertainty as to whether these refer to the same set or different sets of N feature vectors. If referring to the same set, the amended limitation should use "the total" to properly reference back to the previously recited set.

Accordingly, the claim does not clearly define whether the extraction operation is conditional or whether the amended limitation represents an alternative/optional process. The lack of clarity as to whether "the feature vector" refers to the earlier-extracted vector, the averaged vector, or both renders the scope of the claim uncertain. One of ordinary skill in the art cannot determine the scope of the claim with reasonable certainty. Therefore, the claim fails to particularly point out and distinctly claim the invention under 35 U.S.C. § 112(b). For examination purposes, the Examiner interprets the limitation as "when a plurality of images is selected from each class as the training sample, extract a plurality of feature vectors from the training samples of each class and determine an average of the plurality of feature vectors." The Examiner suggests amending the claim to clearly establish the conditional and/or alternative relationship between these extraction operations.
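As a concrete picture of the interpretation the Examiner adopts for examination purposes, the following short sketch (hypothetical code, not drawn from the application) extracts one feature vector per class, averaging per-image vectors whenever a class contributes more than one image, so that exactly N vectors result:

```python
# Illustrative sketch of the Examiner's adopted interpretation
# (hypothetical helper names; `extractor` maps an image tensor to a
# d-dimensional feature vector).
import torch

def class_feature_vector(extractor, images):
    # When a plurality of images is selected for a class, extract a
    # feature vector from each and average them into the single
    # feature vector for that class.
    feats = torch.stack([extractor(img) for img in images])  # (k, d)
    return feats.mean(dim=0)                                 # (d,)

def extract_n_feature_vectors(extractor, samples_per_class):
    # samples_per_class: N lists of images, one list per class.
    # Exactly N feature vectors are produced, one per class.
    return torch.stack([class_feature_vector(extractor, imgs)
                        for imgs in samples_per_class])      # (N, d)
```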
Regarding dependent claims 9 and 14: These claims, which depend from parent claim 8, inherit the deficiencies described above.

Regarding Original Claim 14: The claim is rejected for failing to define the boundaries of the claimed invention. Original claim 14, which depends from parent claim 8, recites: "train the N-class classifier after the N-class classifier is obtained, comprising: randomly select a number of images from each class of the N classes as images to be tested; extract feature vectors of the images to be tested by using the feature extractor; input the feature vectors extracted directly into the N-class classifier, so as to predict classification scores of the images to be tested being classified into each class; and update the parameter of the N-class classifier according to a result of the prediction, wherein a loss function used in the training process of the N-class classifier is the same as that used in the pre-training process of the wide residual network except for a number of image classes involved." Currently amended claim 8 recites the limitations "wherein the processor trains the N-class classifier using additional labeled digital images of the N classes by: extracting, via the feature extractor, feature vectors of the additional labeled digital images; generating classification scores with the N-class classifier; and updating the classifier parameter values in accordance with a classification loss function associated with the N classes, ..."

The specification at [0053]-[0054] describes one additional, optional training step in which "images to be tested" are used to train the N-class classifier after it is obtained, by extracting feature vectors using the feature extractor, predicting classification scores, and updating parameters based on a loss function. Both amended claim 8 and claim 14 appear to describe this same training process, resulting in a lack of clear antecedent basis for the claimed training process of the N-class classifier. It is unclear whether claim 14 describes the same training operation already presented by the amended limitations of claim 8, or a different one. If claim 14 describes the same training operation as claim 8, then claim 14 is redundant and lacks clear antecedent basis. If claim 14 describes a different, additional training operation, it is unclear whether "additional labeled digital images" and "images to be tested" are the same or different image sets, and whether the specification provides support for training the N-class classifier twice. Additionally, it is unclear whether "a classification loss function associated with the N classes" recited in claim 8 and "a loss function used in the training process of the N-class classifier is the same as that used in the pre-training process of the wide residual network ..." recited in claim 14 refer to the same or different loss functions. Either reading would be inconsistent with paragraphs [0053]-[0054] of the specification, and the specification does not provide a standard for ascertaining the requisite degree, such that one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.

In view of the above, the Examiner respectfully requests that Applicant thoroughly review the claims for compliance with the requirements set forth under 35 U.S.C. § 112.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention, in order for the Examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Bin et al. (Pub. No. US 20190286986 A1) in view of Gidaris et al. (NPL: "Dynamic Few-Shot Visual Learning without Forgetting," 2018), and further in view of Sumbul et al. (NPL: "Multisource Region Attention Network for Fine-Grained Object Recognition in Remote Sensing Imagery," January 2019).

Regarding Currently Amended Claim 8, Bin discloses the following:

A computer device of generating a classifier by using labeled images, comprising: a processor; and a memory having instructions executable by the processor, wherein the instructions, when executed by the processor, cause the processor to: (Bin, [0027] "the present invention provides a machine learning model training apparatus. The apparatus includes a memory and a processor. The memory stores a programmable instruction. The processor is configured to invoke the programmable instruction to execute the method described in any one of the first aspect or the possible implementations of the first aspect or in any one of the second aspect or the possible implementations of the second aspect." [0118] "An embodiment of the present invention provides a machine learning model training apparatus 100, configured to implement the method described in the embodiment of the present invention corresponding to FIG. 3. As shown in FIG. 9, the apparatus 100 includes a data obtaining module 101, a first feature extraction module 102, a training module 103, and an update module 104." [0052]-[0053] "Residual network (ResNet): A residual network is a convolutional neural network. ...
In the embodiments of the present invention, a task that needs to be handled is referred to as a target task, for example, a small-sample image recognition task." [0072] "For example, in small-sample machine learning meta-learning, meta-learning is a manner of small-sample machine learning. In a meta-learning method, a series of small-sample tasks are used for training to obtain a meta-learner. The meta-learner generates a learner based on training data in the small-sample tasks, and finally the learner completes prediction for test data in the small-sample tasks.")

pre-train a wide residual network by using a set of labeled data, and determine portions of the pre-trained wide residual network except for a fully connected layer as a feature extractor for an image; (Bin, [0101] "In this embodiment of the present invention, meta-SGD (Meta Learner) is selected for the target task model. The meta-SGD includes three fully connected layers. The support task model is a multi-sample classifier (Image Classifier), and includes only one fully connected layer (Fully Connected Layer, FC). The memory model uses a design the same as that of ResNet50, and a difference lies in that a last layer but one is used as an output layer." [0105] "S1. Initialize a memory model (ResNet50), meta-SGD, and a multi-sample classifier, where initializing the meta-SGD includes initializing a neural network parameter θ and a learning rate α. The memory model is a residual network, and a parameter ψ of the memory model is initialized. The multi-sample classifier (whose parameter is ϕ) is initialized in a general neural network initialization manner. It should be noted that if the model is trained, the parameter is directly loaded, or if the model is not trained, the model parameter is initialized in a general manner." [0107] "S3. Input all the data selected in the previous step into the memory model (ResNet50) for feature extraction.") [Examiner's Note: Bin uses the ResNet50 memory model (i.e., a wide residual network), minus the last fully connected layer, for feature extraction. The model is trained on labeled data and performs feature extraction. This reads on the claim limitation.]

randomly select, for a N-class classifier to be generated, N classes from a training set for each of a plurality of times; and for N classes selected each of the plurality of times: (Bin, [0100] "In this embodiment of the present invention, a small-sample batch task has two settings: 5-way-1-shot and 5-way-5-shot. '5-way' represents that each small-sample task includes five categories of images, '1-shot' represents that each category of training sample includes one image, and '5-shot' represents that each category of training sample includes five images. Either of the two settings may be selected." [0106]-[0108] "S2. Randomly sample training data from a data pool. Specifically, for a target task (a small-sample task), five categories are randomly sampled from a training set, then one or several images is/are randomly selected from each of the five categories as first target task training data, and then one or several images is/are randomly selected from each of the five categories as second target task training data. ... Specifically, for each target task of a small sample, the meta-SGD obtains a learner θi′ based on the first target task training feature data," [0016] "...
executing the process of obtaining target task training data and N categories of support task training data and repeatedly executing a training process until the foregoing condition is met." [0072] "The meta-learner generates a learner based on training data in the small-sample tasks, and finally the learner completes prediction for test data in the small-sample tasks.") [Examiner's Note: Bin teaches randomly selecting N classes from a training set (i.e., categories are randomly sampled from a training set) for each of a plurality of times (i.e., repeated execution of a training process) to generate an N-class classifier (i.e., obtain a target task learner).]

randomly select one or more images from each class of the N classes as training samples; (Bin, [0100] "In this embodiment of the present invention, a small-sample batch task ... '1-shot' represents that each category of training sample includes one image, and '5-shot' represents that each category of training sample includes five images." [0106] "Specifically, for a target task (a small-sample task), five categories are randomly sampled from a training set, then one or several images is/are randomly selected from each of the five categories as first target task training data, and then one or several images is/are randomly selected from each of the five categories as second target task training data.")

extract a feature vector from training samples of each class, by using the feature extractor; (Bin, [0107] "S3. Input all the data selected in the previous step into the memory model (ResNet50) for feature extraction. Specifically, each image in data corresponding to the target task is adjusted as a 224×224 input, to obtain a vector with a length of 2048, including first target task training feature data and second target task training feature data." Further see [0062].)

input a total of N feature vectors extracted into a classifier generator; (Bin, [0004] "In small-sample learning, training (meta-training) data in a small-sample task is usually used to generate a learner (learner)." [0031] "The first feature extraction module is configured with a memory model, and the training module is configured with a target task model and N support task models. ... the N categories of support task training data are in a one-to-one correspondence with the N support task models," [0061] "The target task training data corresponds to a target task model. Optionally, there may be one or more target task models. In this embodiment of the present invention, a quantity of target task models is not limited, and a type of the target task model is not limited, either. The N categories of support task training data are in a one-to-one correspondence with N support task models, ..." [0108] "Specifically, for each target task of a small sample, the meta-SGD obtains a learner θi′ based on the first target task training feature data, ...") [Examiner's Note: Bin uses feature data extracted by a memory model to obtain learners or target task models. The meta-SGD meta-learner, using the training module, reads on the classifier generator.]

sequentially perform a class information fusion and a parameter prediction for the N-class classifier by using the classifier generator, (Bin, [0065]-[0069] "In this embodiment of the present invention, obtained losses of both the target task model and the support task model are used to update the memory model, the target task model, and the N support task models.
The losses of the target task model and the N support task models are used for updating, so that the memory model, the target task model, and the N support task models are logically associated, and an abstract feature is stored in respective parameters. When the target task model trained by using the method provided in this embodiment of the present invention is used together with the memory model, performance is better, and a task processing result is more accurate. ... S1042. Update a first parameter of the memory model, a second parameter of the target task model, and respective third parameters of the N support task models based on the obtained target loss." [0105] "The memory model is a residual network, and a parameter ψ of the memory model is initialized. The multi-sample classifier (whose parameter is ϕ) is initialized in a general neural network initialization manner. It should be noted that if the model is trained, the parameter is directly loaded, or if the model is not trained, the model parameter is initialized in a general manner. ..." [0109] "S5. Update the meta-SGD (θ and α), the multi-sample classifier (ϕ), and a parameter ψ of the memory model based on a combined loss.")

... wherein the instructions, when executed by the processor, further cause the processor to: extract a plurality of feature vectors from a plurality of training samples of each class in response to extracting a plurality of images from each class as the training samples, (Bin, [0100] "... '5-shot' represents that each category of training sample includes five images." [0106] "Specifically, for a target task (a small-sample task), five categories are randomly sampled from a training set, then one or several images is/are randomly selected from each of the five categories as first target task training data, and then one or several images is/are randomly selected from each of the five categories as second target task training data." [0055] "Multi-sample data: In some support data sets or training data sets, a specific label corresponds to a plurality of pieces of labeled data, and the plurality of pieces of labeled data are collectively referred to multi-sample data." Further see [0078].) [Examiner's Note: the claim limitation is broadly interpreted as extracting feature vectors based on a plurality of images from each class.]

... wherein the N-class classifier comprises a neural network classification layer parameterized by classifier parameter values; (Bin, [0072] "For each task, the meta-learner obtains a learner through learning in the training set, and the learner performs prediction on the test set. The learner may be a neural network, may be a continuous regression function, or may be in another form. The meta-learner is a machine learning method for training the learner." [0102] "the meta-SGD, the multi-sample classifier, and the memory model are updated according to the following target function formula: ..." [0108] "the meta-SGD obtains a learner θi′ based on the first target task training feature data, and then the learner classifies second target task training feature data of each task, ...
Each training may include a plurality of target tasks (small-sample tasks).")

and wherein the processor trains the N-class classifier using additional labeled digital images of the N classes by: extracting, via the feature extractor, feature vectors of the additional labeled digital images; generating classification scores with the N-class classifier; and updating the classifier parameter values in accordance with a classification loss function associated with the N classes, thereby improving classification accuracy for the N classes without retraining the wide residual network. (Bin, [0095]-[0096] "S2030. Input the obtained target task feature data into a target task model to obtain a target task result. ... Then, in S1020, the inputting the target task training data into a memory model to obtain target task training feature data is specifically: inputting the first target task training data and the second target task training data into the memory model to obtain first target task training feature data and second target task training feature data, where correspondingly the target task training feature data includes the first target task training feature data and the second target task training feature data, the first target task training feature data corresponds to the first target task training data, and the second target task training feature data corresponds to the second target task training data." [0106] "For a support task, 64 multi-sample images are randomly selected from a support data pool of 200 categories of images, as support task training data." [0108] "the learner classifies second target task training feature data of each task, and obtain a loss ... based on a true label." [0109] "S5. Update the meta-SGD (θ and α), the multi-sample classifier (ϕ), and a parameter ψ of the memory model based on a combined loss." [0111]-[0113] "S01. The memory model (memory module) loads the trained parameter ψ. S02. Randomly select five categories from the test set, and randomly select one or several images from each of the five categories as target task labeled data, so that the trained meta-SGD obtains the learner. S03. Randomly select five categories from the test set, randomly select one or several images from each of the five categories as target task data, and input the target task data to obtain a learner, to obtain a prediction result." [0065] "When the target task model trained by using the method provided in this embodiment of the present invention is used together with the memory model, performance is better, and a task processing result is more accurate." Further see [0118].)
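The episode-style sampling that the Bin passages above are repeatedly mapped to (e.g., the 5-way-1-shot and 5-way-5-shot settings) can be summarized in a short illustrative sketch; the function and variable names here are hypothetical, not Bin's:

```python
# Illustrative N-way, K-shot episode sampling (hypothetical names).
import random

def sample_episode(training_set, n_way=5, k_shot=1):
    # Randomly select N classes from the training set, then randomly
    # select one or more images from each selected class as training
    # samples (cf. Bin's 5-way-1-shot and 5-way-5-shot settings).
    classes = random.sample(sorted(training_set), n_way)
    return {c: random.sample(training_set[c], k_shot) for c in classes}

# "for each of a plurality of times": episodes are sampled repeatedly,
# once per training round. training_set maps class label -> images.
# episodes = [sample_episode(training_set, 5, 5) for _ in range(1000)]
```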
Bin does not appear to explicitly teach the following: wherein the classifier generator comprises a class information fusion module and a classifier parameter prediction module; and wherein the instructions, when executed by the processor, further cause the processor to: stitch feature vectors for the N classes to form a matrix with N rows; input the matrix into the class information fusion module so as to obtain a fusion feature matrix, wherein each row of the fusion feature matrix indicates a class feature for a corresponding row of the matrix input; and input the fusion feature matrix into the classifier parameter prediction module, so as to predict a parameter of the N-class classifier, wherein the class information fusion module comprises one fully connected layer having N input dimensions and N output dimensions, wherein the classifier parameter prediction module comprises one fully connected layer having input and output dimensions same as dimensions of the feature vector of the image.

However, Bin in view of Gidaris teaches the following:

sequentially perform a class information fusion and a parameter prediction for the N-class classifier by using the classifier generator, wherein the classifier generator comprises a class information fusion module and a classifier parameter prediction module; (Gidaris, [P. 5, Section 3.2] "We enhance the above feature averaging mechanism with an attention based mechanism that composes novel classification weight vectors by 'looking' at a memory that contains the base classification weight vectors $W_{base} = \{w_b\}_{b=1}^{K_{base}}$. More specifically, an extra attention-based classification weight vector $w'_{att}$ is computed as: $w'_{att} = \frac{1}{N'} \sum_{i=1}^{N'} \sum_{b=1}^{K_{base}} \mathrm{Att}(\phi_q \bar{z}'_i, k_b) \cdot \bar{w}_b$ (Eq. 2), where $\phi_q \in \mathbb{R}^{d \times d}$ is a learnable weight matrix that transforms the feature vector $\bar{z}'_i$ to a query vector used for querying the memory, $\{k_b \in \mathbb{R}^d\}_{b=1}^{K_{base}}$ is a set of $K_{base}$ learnable keys (one per base category) used for indexing the memory, and $\mathrm{Att}(\cdot,\cdot)$ is an attention kernel implemented as a cosine similarity function followed by a softmax operation over the $K_{base}$ base categories. The final classification weight vector is computed as a weighted sum of the average-based classification vector $w'_{avg}$ and the attention-based classification vector $w'_{att}$: $w' = \phi_{avg} \odot w'_{avg} + \phi_{att} \odot w'_{att}$, where $\odot$ is the Hadamard product, and $\phi_{avg}, \phi_{att} \in \mathbb{R}^d$ are learnable weight vectors.")

and wherein the instructions, when executed by the processor, further cause the processor to: stitch feature vectors for the N classes to form a matrix with N rows; input the matrix into the class information fusion module so as to obtain a fusion feature matrix, wherein each row of the fusion feature matrix indicates a class feature for a corresponding row of the matrix input; (Gidaris, [p. 3, Section 2, Col. 1] "Our few-shot classification weight generator also includes a feature averaging mechanism." [p. 3, Section 3, Col. 2] "ConvNet-based recognition model. It consists of (a) a feature extractor $F(\cdot|\theta)$ (with learnable parameters $\theta$) that extracts a $d$-dimensional feature vector $z = F(x|\theta) \in \mathbb{R}^d$ from an input image $x$, and (b) a classifier
$C(\cdot|W^*)$, where $W^* = \{w_k^* \in \mathbb{R}^d\}_{k=1}^{K^*}$ is a set of $K^*$ classification weight vectors (one per object category), that takes as input the feature representation $z$ and returns a $K^*$-dimensional vector with the probability classification scores $p = C(z|W^*)$ of the $K^*$ categories. Note that in a typical convolutional neural network the feature extractor is the part of the network that starts from the first layer and ends at the last hidden layer, while the classifier is the last classification layer. During the single training phase of our algorithm, we learn the $\theta$ parameters and the classification weight vectors of the base categories $W_{base} = \{w_k\}_{k=1}^{K_{base}}$ such that by setting $W^* = W_{base}$ the ConvNet model will be able to recognize the base object categories. Few-shot classification weight generator. This comprises a meta-learning mechanism that, during test time, takes as input a set of $K_{novel}$ novel categories with few training examples per category, $D_{novel} = \bigcup_{n=1}^{K_{novel}} \{x'_{n,i}\}_{i=1}^{N'_n}$, where $N'_n$ is the number of training examples of the $n$-th novel category and $x'_{n,i}$ is its $i$-th training example, and is able to dynamically assimilate the novel categories into the repertoire of the above ConvNet model. More specifically, for each novel category $n \in [1, K_{novel}]$, the few-shot classification weight generator $G(\cdot,\cdot|\phi)$ gets as input the feature vectors $Z'_n = \{z'_{n,i}\}_{i=1}^{N'_n}$ of its $N'_n$ training examples, where $z'_{n,i} = F(x'_{n,i}|\theta)$, and the classification weight vectors of the base categories $W_{base}$, and generates a classification weight vector $w'_n = G(Z'_n, W_{base}|\phi)$ for that novel category. Note that $\phi$ are the learnable parameters of the few-shot weight generator, which are learned during the single training phase of our framework. Therefore, if $W_{novel} = \{w'_n\}_{n=1}^{K_{novel}}$ are the classification weight vectors of the novel categories inferred by the few-shot weight generator, then by setting $W^* = W_{base} \cup W_{novel}$ on the classifier $C(\cdot|W^*)$ we enable the ConvNet model to recognize both base and novel categories." [P. 5, Section 3.2] "The few-shot classification weight generator $G(\cdot,\cdot|\phi)$ gets as input the feature vectors $Z' = \{z'_i\}_{i=1}^{N'}$ of the $N'$ training examples of a novel category (typically $N' \leq 5$) and (optionally) the classification weight vectors of the base categories $W_{base}$. Based on them, it infers a classification weight vector $w' = G(Z', W_{base}|\phi)$ for that novel category. Here we explain how the above few-shot classification weight generator is constructed. ... the cosine similarity based classifier of the ConvNet model forces the feature extractor to learn feature vectors that form compact category-wise clusters and the classification weight vectors to learn to be representative feature vectors of those clusters, an obvious choice is to infer the classification weight vector $w'$ by averaging the feature vectors of the training examples (after they have been l2-normalized): $w'_{avg} = \frac{1}{N'} \sum_{i=1}^{N'} \bar{z}'_i$. The final classification weight vector in case we only use the feature averaging mechanism is: $w' = \phi_{avg} \odot w'_{avg}$, where $\odot$ is the Hadamard product, and $\phi_{avg} \in \mathbb{R}^d$ is a learnable weight vector. ...
We enhance the above feature averaging mechanism with an attention based mechanism that composes novel classification weight vectors by 'looking' at a memory that contains the base classification weight vectors $W_{base} = \{w_b\}_{b=1}^{K_{base}}$. More specifically, an extra attention-based classification weight vector $w'_{att}$ is computed as: $w'_{att} = \frac{1}{N'} \sum_{i=1}^{N'} \sum_{b=1}^{K_{base}} \mathrm{Att}(\phi_q \bar{z}'_i, k_b) \cdot \bar{w}_b$ (Eq. 2), where $\phi_q \in \mathbb{R}^{d \times d}$ is a learnable weight matrix that transforms the feature vector $\bar{z}'_i$ to a query vector used for querying the memory, $\{k_b \in \mathbb{R}^d\}_{b=1}^{K_{base}}$ is a set of $K_{base}$ learnable keys (one per base category) used for indexing the memory, and $\mathrm{Att}(\cdot,\cdot)$ is an attention kernel implemented as a cosine similarity function followed by a softmax operation over the $K_{base}$ base categories. The final classification weight vector is computed as a weighted sum of the average-based classification vector $w'_{avg}$ and the attention-based classification vector $w'_{att}$: $w' = \phi_{avg} \odot w'_{avg} + \phi_{att} \odot w'_{att}$, where $\odot$ is the Hadamard product, and $\phi_{avg}, \phi_{att} \in \mathbb{R}^d$ are learnable weight vectors.") [Examiner's Note: Gidaris teaches a few-shot classification weight generator that takes as input feature vectors extracted from training examples of each category (i.e., class) using feature extractor $F$, uses an attention mechanism for averaging feature vectors together to generate an averaged matrix, and generates a final classification weight vector for each novel category. Thus, Gidaris teaches a classifier generator (i.e., the few-shot classification weight generator) that includes a class information fusion module (i.e., an attention mechanism that takes feature vectors from each class and fuses information via a weighted combination) and a classifier parameter prediction module (i.e., a weight composition mechanism that takes the attention-weighted combination of features and predicts the final classification weight vector for the n-th category).]

Therefore, at the effective filing date, it would have been prima facie obvious to one of ordinary skill in the art to modify the system of Bin (for small-sample task machine learning model training) to incorporate the proposed few-shot recognition system as taught by Gidaris. One would have been motivated to make such a combination in order to be able both to learn to accurately recognize base categories and to learn to perform few-shot learning of novel categories in a dynamic manner, without forgetting the base ones. Doing so would improve few-shot recognition performance without sacrificing any accuracy (Gidaris [Abstract]).

While Bin in view of Gidaris teaches a classifier generator including class fusion and parameter prediction for N classes, Bin in view of Gidaris does not appear to explicitly teach: wherein the class information fusion module comprises one fully connected layer having N input dimensions and N output dimensions, wherein the classifier parameter prediction module comprises one fully connected layer having input and output dimensions same as dimensions of the feature vector of the image.

However, Sumbul, in combination with Bin in view of Gidaris, teaches:

wherein the classifier generator comprises a class information fusion module and a classifier parameter prediction module; (Sumbul teaches a multisource model comprising a concatenation module and a classifier prediction module; see Fig. 2.)
stitch feature vectors for the N classes to form a matrix with N rows; input the matrix into the class information fusion module so as to obtain a fusion feature matrix, wherein each row of the fusion feature matrix indicates a class feature for a corresponding row of the matrix input; and input the fusion feature matrix into the classifier parameter prediction module, so as to predict a parameter of the N-class classifier, (Sumbul, [Pp. 3-5, Section III] Multisource feature concatenation, Fig. 2(a), basic multisource model: "The feature representations independently obtained from each source are concatenated as the object representation. ... The final representation for each additional source is the weighted sum $\phi_m^{att}(x_m)$ of its proposal regions' representations, and the final representation $\phi^{att}(x)$ used for class prediction is obtained by concatenation. ..." Figure 3: "The last branch C calculates the class scores from the concatenation of the feature representations of all three sources $\phi^{att}$. It consists of four FC layers containing 128, 64, 32 and 40 neurons, the last one giving the class scores. Note that the feature map sizes and descriptive names are stated at the top of each layer." [P. 7, Section C] "Table II summarizes the results for multisource classification for both 18-class and 40-class settings. We used two versions of the basic multisource model in Figure 2(a). The version named basic CNN model uses the first, third, and fifth branches in Figure 3 as the feature extractor networks, concatenates the resulting feature representations, and uses an FC layer as the classifier. This model is also learned in an end-to-end fashion. The version named recurrent attention model uses a network that learns discriminative region selection and region based feature representation at multiple scales [32]. ...") [Note: In Figures 2 and 3, the concatenation of the attention-driven feature representations would correspond to the "fusion feature matrix," and the final class scores would correspond to the "predicted classifier parameter."]

wherein the class information fusion module comprises one fully connected layer having N input dimensions and N output dimensions, (Sumbul, [P. 4, Figure 3] "we define an architecture that is formed by the combination of five deep convolutional neural network branches and a block of fully-connected (FC) layers as shown in Figure 3.") [Note: Fig. 3 shows the network includes a fully connected layer with defined input and output dimensions.]

wherein the classifier parameter prediction module comprises one fully connected layer having input and output dimensions same as dimensions of the feature vector of the image, (Sumbul, [P. 4, Figure 3] "In place of the feature representation that is obtained from a single CNN that is trained on RGB data in [3], the multisource image embedding in this paper is obtained from the output of the first fully-connected layer in the classifier (last) branch of the network in Figure 3." [P. 7, Col. 2, Lines 5-10] "We use the two-scale architecture to train a feature extractor for each source, concatenate the resulting feature representations, and train an FC layer as the classifier.") [Note: Fig. 3 shows the network includes a fully connected layer.]

Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Bin, Gidaris, and Sumbul before them, to incorporate the multisource fine-grained object recognition methodology as taught by Sumbul. One would have been motivated to make such a combination in order to solve multisource classification problems and improve the feature concatenation approach from multiple sources (Sumbul [Abstract]).
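For context on the claim language at issue, the recited generator structure (one N-to-N fully connected fusion layer followed by one d-to-d prediction layer) can be sketched as follows. This is an illustration of the claim as written, under the assumption that the N-to-N layer mixes information across the class dimension; the class and variable names are hypothetical.

```python
# Illustrative sketch of the claimed classifier generator structure
# (hypothetical names; the across-class application of the N->N layer
# is an interpretive assumption, not a disclosed detail).
import torch
import torch.nn as nn

class ClassifierGenerator(nn.Module):
    def __init__(self, n_classes, feature_dim):
        super().__init__()
        # Class information fusion module: one FC layer, N in -> N out.
        self.fusion = nn.Linear(n_classes, n_classes)
        # Classifier parameter prediction module: one FC layer whose
        # input and output dimensions equal the feature vector's (d).
        self.prediction = nn.Linear(feature_dim, feature_dim)

    def forward(self, feature_matrix):            # (N, d), one row per class
        # The N->N layer is applied along the class dimension, so each
        # row of the result fuses information from all N classes.
        fused = self.fusion(feature_matrix.T).T   # fusion feature matrix, (N, d)
        # Predict the N-class classifier's parameter from the fused rows.
        return self.prediction(fused)             # (N, d) classifier weights

# Usage sketch: stitch N per-class feature vectors into an N-row matrix,
# generate weights, then score images by inner product with each row.
# gen = ClassifierGenerator(n_classes=5, feature_dim=640)
# weights = gen(torch.stack(per_class_feature_vectors))  # (5, 640)
# scores = image_features @ weights.T                    # (B, 5) scores
```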
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bin, Gidaris, and Sumbul as outlined above, and further in view of Wang et al. (NPL: "Large Margin Few-Shot Learning," 2018).

Regarding Previously Presented Claim 9, the combination of Bin, Gidaris, and Sumbul teaches the elements of claim 8 as outlined above, and further teaches:

wherein the instructions, when executed by the processor, further cause the processor to: select the set of labeled data, and divide the set of labeled data into the training set and a test set according to image classes, wherein the training set and the test set do not overlap each other; (Bin, [0072] "a small-sample task is divided into two stages: meta training and meta testing. As shown in FIG. 4, it should be noted that the meta training stage is a process of model training, and the meta testing stage is a process of using a model to execute a task, for example, predicting an image type. Meta-training and meta-testing have a same task structure, and both include a training set (training set) and a test set (testing set). For each task, the meta-learner obtains a learner through learning in the training set, and the learner performs prediction on the test set. The learner may be a neural network, may be a continuous regression function, or may be in another form." [0077] "Optionally, the second target task training data includes a plurality of target task test samples. Correspondingly, the second target task training feature data includes a plurality of target task test feature samples, and each target task test feature sample includes first target task feature information and a corresponding first target task label. In this case, the obtaining a first test result based on the target task feature information and the trained target task model is specifically: obtaining, based on first target task feature information respectively corresponding to the plurality of target task test feature samples and the trained target task model, first test results respectively corresponding to the plurality of target task test feature samples." [0099] "64 categories of images are selected as a training set, and 20 categories of images are selected as a test set. ... Optionally, Caltech-UCSD Birds-200-2011 (CUB-200) includes images of 200 different categories of birds, and there are a total of 11,788 color images. 140 categories of images are selected as a training set, and 40 categories of images are selected as a test set.")

train the wide residual network for a predetermined number of times by using the training set; and test the trained wide residual network by using the test set; wherein the wide residual network comprises a multi-layer convolutional neural network and a fully connected layer; and in the pre-training process, after each image is input into the wide residual network, an output of the fully connected layer at the end of the wide residual network indicates a classification score of the input image being classified into each class, (Bin, [0008] "In this possible implementation, the target task model is a small-sample learning model, and training data that is used to train the small-sample learning model includes a training set and a test set.
The training set corresponds to the first target task training data, and the test set corresponds to the second target task training data. The first loss used as an output is obtained by using data in the test set. In other words, the first loss is calculated by using the second target task training data. In terms of small-sample learning, this can better resolve the problem that an obtained model overfits training data and has poor performance in test data." [0096] "inputting the target task labeled data into the memory model to obtain target task labeled feature data; and training the target task model based on the target task labeled feature data. ... at a meta testing stage, a training set is equivalent to the target task labeled data. There are five categories of images, and the images are used to define the model in a training process. After a training set (an image whose category is unknown) is input, which of the five categories on the left side in the figure the image belongs to is determined by the meta-SGD." [0101] "The meta-SGD includes three fully connected layers. The support task model is a multi-sample classifier (Image Classifier), and includes only one fully connected layer (Fully Connected Layer, FC). The memory model uses a design the same as that of ResNet50, and a difference lies in that a last layer but one is used as an output layer." [0104] "A specific design of a system framework is shown in FIG. 8, and a specific procedure is as follows: S1. Initialize a memory model (ResNet50), meta-SGD, and a multi-sample classifier, where initializing the meta-SGD includes initializing a neural network parameter θ and a learning rate α. ... then the learner classifies second target task training feature data of each task, and obtain a loss $L_{test(\tau)}(\psi, \theta')$ based on a true label. The multi-sample classifier directly obtains a classification loss $l(c_\phi \circ f_\psi(x), y)$ (a cross entropy) based on 64 input features, directly predicts classification information of all training data, and compares a prediction result with a true category to obtain the loss. ... S01. The memory model (memory module) loads the trained parameter ψ. S02. Randomly select five categories from the test set, and randomly select one or several images from each of the five categories as target task labeled data, so that the trained meta-SGD obtains the learner. S03. Randomly select five categories from the test set, randomly select one or several images from each of the five categories as target task data, and input the target task data to obtain a learner, to obtain a prediction result.")

Bin in view of Gidaris further teaches the limitations:

select the set of labeled data, and divide the set of labeled data into the training set and a test set according to image classes, wherein the training set and the test set do not overlap each other; (Gidaris, [P. 5, Section 3.3] "3.3. Training procedure. In order to learn the ConvNet-based recognition model (i.e., the feature extractor $F(\cdot|\theta)$ as well as the classifier $C(\cdot|W^*)$) and the few-shot classification weight generator $G(\cdot,\cdot|\phi)$, we use as the sole input a training set $D_{train} = \bigcup_{b=1}^{K_{base}} \{x_{b,i}\}_{i=1}^{N_b}$ of $K_{base}$ base categories. We split the training procedure into 2 stages and at each stage we minimize a different cross-entropy loss." [P. 6, Section 4.1] "4.1. Mini-ImageNet experiments. Evaluation setting for recognition of novel categories.
We evaluate our few-shot object recognition system on the Mini-ImageNet dataset [25] that includes 100 different categories with 600 images per category, each of size 84×84. For our experiments we used the splits by Ravi and Laroche [16] that include 64 categories for training, 16 categories for validation, and 20 categories for testing. The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then during test time to use the validation (or the test) categories in order to form few-shot tasks on which the trained model is evaluated. Those few-shot tasks are formed by first sampling $K_{novel}$ categories and one or five training examples per category (1-shot and 5-shot settings respectively), which the trained model uses for meta-learning those categories, and then evaluating it on some test examples that come from the same novel categories but do not overlap with the training examples.")

train the wide residual network for a predetermined number of times by using the training set; and test the trained wide residual network by using the test set; wherein the wide residual network comprises a multi-layer convolutional neural network and a fully connected layer; and in the pre-training process, after each image is input into the wide residual network, an output of the fully connected layer at the end of the wide residual network indicates a classification score of the input image being classified into each class, (Gidaris, [Pp. 5-6, Section 3.3] "3.3. Training procedure. In order to learn the ConvNet-based recognition model (i.e., the feature extractor $F(\cdot|\theta)$ as well as the classifier $C(\cdot|W^*)$) and the few-shot classification weight generator $G(\cdot,\cdot|\phi)$, we use as the sole input a training set $D_{train} = \bigcup_{b=1}^{K_{base}} \{x_{b,i}\}_{i=1}^{N_b}$ of $K_{base}$ base categories. We split the training procedure into 2 stages and at each stage we minimize a different cross-entropy loss of the following form: ... 1st training stage: During this stage we only learn the ConvNet recognition model without the few-shot classification weight generator. Specifically, at this stage we learn the parameters $\theta$ of the feature extractor $F(\cdot|\theta)$ and the base classification weight vectors $W_{base} = \{w_b\}_{b=1}^{K_{base}}$. This is done in exactly the same way as for any other standard recognition model. In this case $W^*$ is equal to the base classification weight vectors $W_{base}$. 2nd training stage: During this stage we train the learnable parameters $\phi$ of the few-shot classification weight generator while we continue training the base classification weight vectors $W_{base}$ (in our experiments during that training stage we freezed the feature extractor). In order to train the few-shot classification weight generator, in each batch we randomly pick $K_{novel}$ 'fake' novel categories from the base categories and we treat them in the same way as we will treat the actual novel categories after training. Specifically, instead of using the classification weight vectors in $W_{base}$ for those 'fake' novel categories, we sample $N'$ training examples (typically $N' \leq 5$) for each of them, compute their feature vectors $Z' = \{z'_i\}_{i=1}^{N'}$, and give those feature vectors to the few-shot classification weight generator $G(\cdot,\cdot|\phi)$ in order to compute novel classification weight vectors. The inferred classification weight vectors are used for recognizing the 'fake' novel categories. Everything is trained end-to-end.
Note that we take care to exclude from the base classification weight vectors that are given as a second argument to the few-shot weight generator G(., .|φ) those classification vectors that correspond to the “fake” novel categories. In this case W∗ is the union of the “fake” novel classification weight vectors generated by G(., .|φ) and the classification weight vectors of the remaining base categories. More implementation details of this training stage are provided in §2 of supplementary material.” [P.6, Section: 4.1] “Evaluation setting for the recognition of the base categories. When we evaluate our model w.r.t. few-shot recognition task on the validation / test categories, we consider as base categories the 64 training categories on which we trained the model. Since the proposed few-shot object recognition system has the ability to not forget the base categories, we would like to also evaluate the recognition performance of our model on those base categories.” [P. 6, Section: 4.1.1] “The feature extractor used in all cases is a ConvNet model that has 4 convolutional modules, with 3 × 3 convolutions, followed by batch normalization, ReLU nonlinearity, and 2 × 2 max-pooling.” [p. 3, Section: 3, Col. 2] “ConvNet-based recognition model. It consists of (a) a feature extractor F(.|θ) (with learnable parameters θ) that extracts a d-dimensional feature vector $z = F(x|\theta) \in \mathbb{R}^d$ from an input image x, and (b) a classifier C(.|W∗), where $W^* = \{w^*_k \in \mathbb{R}^d\}_{k=1}^{K^*}$ are a set of $K^*$ classification weight vectors - one per object category, that takes as input the feature representation z and returns a $K^*$-dimensional vector with the probability classification scores $p = C(z|W^*)$ of the $K^*$ categories.”) [Examiner’s Note: the ResNet represents a wide residual network and the last classification layer would represent a fully connected layer.] While the combination of Bin, Gidaris, and Sumbul teaches the cross-entropy loss function for pre-training the residual network (ResNet), e.g., Eq. (3), including the classification output provided by the output layer, the combination of Bin, Gidaris, and Sumbul does not appear to explicitly teach: wherein, in the pre-training process, a loss function is defined as: $L = \sum_i \left(-S_{i,y} + \log \sum_{y'} e^{S_{i,y'}}\right)$, wherein $S_{i,y}$ indicates a classification score of an $i$th image to be classified being classified into a true class $y$ in each batch training, and $S_{i,y'}$ indicates a classification score of the $i$th image being classified into the other class $y'$. However, Wang, in combination with Bin, Gidaris, and Sumbul, teaches the limitation: wherein, in the pre-training process, a loss function is defined as: $L = \sum_i \left(-S_{i,y} + \log \sum_{y'} e^{S_{i,y'}}\right)$, wherein $S_{i,y}$ indicates a classification score of an $i$th image to be classified being classified into a true class $y$ in each batch training, and $S_{i,y'}$ indicates a classification score of the $i$th image being classified into the other class $y'$. (Wang, [Pp. 5-6, Section 3.1] “For all episodes in a mini-batch, the softmax loss is: $L_{softmax} = -\sum_k y_{S+1,k} \log P(Y^* = y_{S+1,k} \mid x_{S+1,k})$, where $Y^*$ is the predicted label for the query in the k-th episode, $x_{S+1,k}$ is the feature of the query, and $y_{S+1,k}$ is the ground truth label.”) [Examiner’s Notes: the claimed loss function is interpreted as the classification loss function based on softmax and cross-entropy loss for training the classifier as described by Wang.]
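As an illustrative aid only (not part of the Office Action record), the claimed pre-training loss, read the way the examiner reads it, reduces to standard softmax cross-entropy over per-class scores. The minimal sketch below assumes the inner sum runs over exponentiated scores, consistent with the softmax interpretation above; the function and variable names are hypothetical and come from neither the application nor the cited references.

```python
import numpy as np

def claimed_pretraining_loss(scores: np.ndarray, labels: np.ndarray) -> float:
    """L = sum_i ( -S[i, y_i] + log sum_{y'} exp(S[i, y']) ).

    scores: (batch, num_classes) array of classification scores S[i, y]
    labels: (batch,) array of true class indices y_i
    """
    # Numerically stable log-sum-exp over all classes y', per image.
    m = scores.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    # Score of each image under its true class y_i.
    true_scores = scores[np.arange(scores.shape[0]), labels]
    return float(np.sum(-true_scores + lse))

# Example: 4 images, 5 classes, random scores.
rng = np.random.default_rng(0)
S = rng.normal(size=(4, 5))
y = np.array([0, 2, 1, 4])
print(claimed_pretraining_loss(S, y))  # summed cross-entropy over the batch
```

In this form, $-S_{i,y} + \log\sum_{y'} e^{S_{i,y'}} = -\log \mathrm{softmax}(S_i)_y$, which is why the claimed function maps onto Wang's softmax/cross-entropy loss.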
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Bin, Gidaris, Sumbul, and Wang before them, to incorporate the classification loss function as taught by Wang. One would have been motivated to make such a combination in order to improve the performance of existing models substantially with very little computational overhead (Wang [Abstract]). Claim(s) 14 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bin, Gidaris, and Sumbul as outlined above, and further in view of Sun et al. (NPL: “Meta-Transfer Learning for Few-Shot Learning,” (2018)). Regarding Original Claim 14, the combination of Bin, Gidaris, and Sumbul teaches the elements of claim 8 as outlined above. While Bin, in combination with Gidaris and Sumbul, teaches the process of training meta learners for a small-sample task of n categories using test samples, by extracting information from the second target task training data (i.e., the test set), so that the learner completes prediction for test data in the small-sample tasks (obtains a classification loss, directly predicts classification information of all training data, and compares a prediction result with a true category to obtain the loss) and updates the learner’s parameters based on the loss, the combination of Bin, Gidaris, and Sumbul does not appear to explicitly teach: updating the parameter of the N-class classifier according to a result of the prediction, wherein a loss function used in the training process of the N-class classifier is the same as that used in the pre-training process of the wide residual network except for a number of image classes involved. However, Sun, in combination with Bin, Gidaris, and Sumbul, teaches the limitations: train the N-class classifier after the N-class classifier is obtained, (Sun, [P. 3, Section: 3] “Meta-learning consists of two phases: meta-train and meta-test. A meta-training example is a classification task T sampled from a distribution p(T). T is called episode, including a training split $T^{(tr)}$ to optimize the base-learner, and a test split $T^{(te)}$ to optimize the meta-learner. In particular, meta-training aims to learn from a number of episodes {T} sampled from p(T). An unseen task $T_{unseen}$ in meta-test will start from that experience of the meta-learner and adapt the base-learner. The final evaluation is done by testing a set of unseen datapoints $T^{(te)}_{unseen}$.” [P. 3, Section: 3] “In each episode, meta-training has a two-stage optimization. Stage-1 is called base-learning, where the cross-entropy loss is used to optimize the parameters of the base-learner. Stage-2 contains a feed-forward test on episode test datapoints. The test loss is used to optimize the parameters of the meta-learner. Specifically, given an episode T ∈ p(T), the base-learner $\theta_T$ is learned from episode training data $T^{(tr)}$ and its corresponding loss $L_T(\theta_T, T^{(tr)})$. After optimizing this loss, the base-learner has parameters $\tilde{\theta}_T$. ... Meta-test phase. This phase aims to test the performance of the trained meta-learner for fast adaptation to unseen task. Given $T_{unseen}$, the meta-learner $\tilde{\theta}_T$ teaches the base-learner $\theta_{T_{unseen}}$ to adapt to the objective of $T_{unseen}$ by some means, e.g. through initialization [7]. Then, the test result on $T^{(te)}_{unseen}$ is used to evaluate the meta-learning approach.
If there are multiple unseen tasks $\{T_{unseen}\}$, the average result on $\{T^{(te)}_{unseen}\}$ will be the final evaluation.”) select a number of images from each class of the N classes as images to be tested; (Sun, [Algorithm 2] “1 Sample training datapoints $T^{(tr)}$ and test datapoints $T^{(te)}$ from T;” [P. 6, Section: 5.1] “Specifically, 1) we consider the 5-class classification and 2) we sample 5-class, 1-shot (5-shot or 10-shot) episodes to contain 1 (5 or 10) samples for train episode, and 15 (uniform) samples for episode test. Note that in the state-of-the-art work [30], 32 and 64 samples are respectively used in 5-shot and 10-shot settings for episode test. In total, we sample 8k tasks for meta-training (same for w/ or w/o HT meta-batch), and respectively sample 600 random tasks for meta-validation and meta-test.”) extract feature vectors of the images to be tested by using the feature extractor; (Sun, [Algorithm 2] “Input: Task T, learning rates β and γ, feature extractor Θ, base learner θ, Scaling and Shifting parameters $\Phi_{S_{\{1,2\}}}$. Output: Base learner θ, Scaling and Shifting parameters $\Phi_{S_{\{1,2\}}}$, the worst classified class-m in T.” See Fig. 2 (c) meta-test.) input the feature vectors extracted directly into the N-class classifier, so as to predict classification scores of the images to be tested being classified into each class; (Sun, [P. 3, Section: 3] “Meta-test phase. This phase aims to test the performance of the trained meta-learner for fast adaptation to unseen task. Given $T_{unseen}$, the meta-learner $\tilde{\theta}_T$ teaches the base-learner $\theta_{T_{unseen}}$ to adapt to the objective of $T_{unseen}$ by some means, e.g. through initialization [7]. Then, the test result on $T^{(te)}_{unseen}$ is used to evaluate the meta-learning approach. If there are multiple unseen tasks $\{T_{unseen}\}$, the average result on $\{T^{(te)}_{unseen}\}$ will be the final evaluation.” [P. 4, Section: 4.2] “In the following, we detail the SS operations. Given a task T, the loss of $T^{(tr)}$ is used to optimize the current base-learner (classifier) θ′ by gradient descent: ... $\Phi_{S_1}$ is initialized by ones and $\Phi_{S_2}$ by zeros. Then, they are optimized by the test loss of $T^{(te)}$ as follows, ...” [P. 5, Algorithm 2, Line 6] “Optimize $\Phi_{S_{\{1,2\}}}$ and θ by Eq. 4 and Eq. 5 (using $T^{(te)}$);”) and updating the parameter of the N-class classifier according to a result of the prediction, wherein a loss function used in the training process of the N-class classifier is the same as that used in the pre-training process of the wide residual network except for a number of image classes involved. (Sun, [P. 3, Section: 3] “Meta-training phase. This phase aims to learn a meta-learner from multiple episodes. In each episode, meta-training has a two-stage optimization. Stage-1 is called base-learning, where the cross-entropy loss is used to optimize the parameters of the base-learner. Stage-2 contains a feed-forward test on episode test datapoints. The test loss is used to optimize the parameters of the meta-learner. …, After optimizing this loss, the base-learner has parameters $\tilde{\theta}_T$. Then, the meta-learner is updated using test loss $L_T(\tilde{\theta}_T, T^{(te)})$. After meta-training on all episodes, the meta-learner is optimized by test losses $\{L_T(\tilde{\theta}_T, T^{(te)})\}_{T \in p(T)}$. Therefore, the number of meta-learner updates equals the number of episodes.” [P. 4, Section: 4.1] “This phase is similar to the classic pre-training stage as, e.g., pre-training on Imagenet for object recognition [35].
…, Specifically, for a particular few-shot dataset, we merge all-class data D for pre-training. For instance, for miniImageNet [45], there are totally 64 classes in the training split D and each class contains 600 samples, which we use to pre-train a 64-class classifier. We first randomly initialize a feature extractor Θ (e.g. CONV layers in ResNets [14]) and a classifier θ (e.g. the last FC layer in ResNets [14]), and then optimize them by gradient descent as follows, ….., e.g. cross-entropy loss, and α denotes the learning rate.” [P. 4, Section: 4.2] “4.2. Meta-transfer learning (MTL): In the following, we detail the SS operations. Given a task T, the loss of $T^{(tr)}$ is used to optimize the current base-learner (classifier) θ′ by gradient descent: … (3), which is different to Eq. 1, as we do not update Θ. Note that here θ is different to the one from the previous phase, the large-scale classifier θ in Eq. 1. The new θ concerns only a few classes, e.g. 5 classes in miniImageNet [45], to classify in a novel few-shot setting.”) [Note: Sun describes the use of the cross-entropy loss function for both the pre-training phase (on a 64-class classifier using large-scale data) and the meta-transfer learning phase (e.g., 5-class) in few-shot learning. The examiner interprets this as corresponding to the claim limitation.] Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Bin, Gidaris, Sumbul, and Sun before them, to incorporate the meta-transfer learning (MTL), which learns to adapt a deep NN for few-shot learning tasks, as taught by Sun. One would have been motivated to make such a combination in order to enable deep neural nets to converge faster while reducing the probability of overfitting when using only a few labeled training data (Sun [Intro]). Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to SADIK ALSHAHARI whose telephone number is (703) 756-4749. The examiner can normally be reached Monday - Friday, 9 a.m. - 6 p.m. ET. Examiner interviews are available via telephone and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /S.A.A./Examiner, Art Unit 2121 /Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121
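To make the two-stage pattern relied on for claim 14 concrete, here is a minimal sketch, assuming a frozen feature extractor and a plain linear N-class classifier trained by gradient descent on the same softmax cross-entropy used in pre-training; the function names, shapes, and hyperparameters are hypothetical and are not drawn from the record or from Sun's code.

```python
import numpy as np

def softmax_cross_entropy(scores, labels):
    """Mean of -S[i, y_i] + log sum_{y'} exp(S[i, y']) over the batch."""
    m = scores.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    return np.mean(-scores[np.arange(len(labels)), labels] + lse)

def train_n_class_classifier(features, labels, n_classes, lr=0.1, steps=200):
    """Stage 2: fit only a linear N-class classifier on frozen features.

    The loss is the same cross-entropy form as Stage-1 pre-training
    (e.g. a 64-class classifier); only the number of classes differs.
    """
    d = features.shape[1]
    W = np.zeros((d, n_classes))
    for _ in range(steps):
        scores = features @ W                       # per-class scores
        p = np.exp(scores - scores.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
        p[np.arange(len(labels)), labels] -= 1.0    # grad of CE w.r.t. scores
        W -= lr * (features.T @ p) / len(labels)    # gradient-descent update
    return W

# Example: 25 support images of 5 novel classes, 64-dim frozen features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(25, 64))
ys = np.repeat(np.arange(5), 5)
W = train_n_class_classifier(feats, ys, n_classes=5)
print(softmax_cross_entropy(feats @ W, ys))  # loss after adaptation
```

The only point of the sketch is the mapping the rejection draws: the Stage-2 classifier update reuses the Stage-1 loss function with a smaller class count, which is how the "same loss function except for the number of image classes" limitation is read onto Sun.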

Prosecution Timeline

Aug 11, 2021
Application Filed
Dec 02, 2024
Non-Final Rejection — §103, §112
Feb 19, 2025
Response Filed
Apr 17, 2025
Final Rejection — §103, §112
Jul 17, 2025
Request for Continued Examination
Jul 22, 2025
Response after Non-Final Action
Jan 09, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596930
SENSOR COMPENSATION USING BACKPROPAGATION
2y 5m to grant Granted Apr 07, 2026
Patent 12493786
Visual Analytics System to Assess, Understand, and Improve Deep Neural Networks
2y 5m to grant Granted Dec 09, 2025
Patent 12462199
ADAPTIVE FILTER BASED LEARNING MODEL FOR TIME SERIES SENSOR SIGNAL CLASSIFICATION ON EDGE DEVICES
2y 5m to grant Granted Nov 04, 2025
Patent 12437199
Activation Compression Method for Deep Learning Acceleration
2y 5m to grant Granted Oct 07, 2025
Patent 12430552
Processing Data Batches in a Multi-Layer Network
2y 5m to grant Granted Sep 30, 2025
Based on this examiner's 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
35%
Grant Probability
82%
With Interview (+47.1%)
4y 5m
Median Time to Grant
High
PTA Risk
Based on 34 resolved cases by this examiner. Grant probability derived from career allow rate.
