Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is in response to the application filed 13 January 2026. Claims 1 and 11 are amended. Claims 1-20 are pending and have been examined.
Response to Arguments
Applicant's arguments, see pages 8-10, filed 13 January 2026, with respect to the rejection of Claims 1-20 under 35 U.S.C. 101 have been fully considered but they are not persuasive.
APPLICANT'S ARGUMENT: Applicant argues (page 9, paragraph 4) that "the independent claims now clearly recite a specific technological improvement, namely enabling the UE to perform continual few-shot learning locally rather than offloading image data or model parameters to a cloud server or other external node. This provides several recognized technical benefits, including reduced network communication, lower latency, improved privacy, and enhanced performance in low-connectivity environments."
Applicant argues (page 9, paragraph 5) that "By explicitly requiring on-device continual learning without transmitting sample images or model parameters externally, the amended claims recite subject matter that is not well-understood, routine, or conventional."
EXAMINER'S RESPONSE: Examiner respectfully disagrees. In applying the steps of the Alice/Mayo test to the limitations of amended Claim 1 and their combinations, the claim appears to be directed to the recited abstract ideas. The steps of generating classification weights, applying linear transformations, and performing classification appear to be mental process steps that could be performed in the human mind at the claimed level of generality. The additional elements recited by the claim, including the processing equipment, receiving data, and updating models, appear to pertain to the computing machinery itself, insignificant extra-solution activity, and/or to well-understood techniques of the art. The additional elements pertaining to computer vision and image data appear merely to link the mental process steps to a particular field of use incidentally, rather than providing a practical application for the mental process or providing significantly more than the mental process steps. Since Claim 1 has been found to be directed to the mental process steps, it is therefore directed to ineligible subject matter.
Applicant's arguments, see pages 10-12, filed 13 January 2026, with respect to the rejection of Claims 1-6, 8, 11-16, and 18 under 35 U.S.C. 102(a)(1) and Claims 7, 9, 10, 17, 19, and 20 under 35 U.S.C. 103 have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
APPLICANT'S ARGUMENT: Applicant argues (page 10, paragraph 5) that "Gidaris does not employ 'attention scores computed between the query and key representations' when generating the second classification weights, and does not employ 'a weighted combination of the value representations' when generating the second classification weights."
Applicant argues (page 10, paragraph 6) that "Gidaris does not disclose 'attention scores computed between the query and key representations.' Claim 1 requires (i) computing query representations, (ii) computing key representations, and (iii) computing attention scores between the query and key representations."
Applicant argues (page 10, paragraph 7) that "Gidaris does not ... compute attention scores based on query and key representations. Instead, Gidaris generates an 'attention-based classification vector' by applying predetermined learned weighting parameters to two fixed classification representations, w'_avg and w'_att."
Applicant argues (page 11, paragraph 2) that "Gidaris does not disclose 'a weighted combination of the value representations.' ... Gidaris does not generate value representations, nor does it apply attention-score-derived weights to any such value representations. ... ¶ Therefore, Gidaris fails to disclose 'generate the second classification weights based on attention scores computed between the query and key representations, and a weighted combination of the value representations,' as recited in Claims 1 and 11. Thus, Claims 1 and 11 are patentable over Gidaris."
EXAMINER'S RESPONSE: Examiner notes that Applicant's arguments are moot. Amended Claim 1 is now rejected under 35 U.S.C. 103 as obvious over Gidaris in view of Liu, and further in view of Kim.
In the rejection of amended Claim 1 below, Gidaris is relied on to teach computing query, key, and value representations from first classification weights. Liu is relied on to teach applying linear transformations to compute query, key, and value representations from feature vectors of the second classes and first classification weights.
Examiner further notes that features upon which applicant relies (i.e., learned linear transformations) are not recited in the rejected claim. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
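For context regarding the disputed limitation, the computation recited in amended Claims 1 and 11 (query, key, and value representations, attention scores between query and key, and a weighted combination of the value representations) follows the familiar scaled dot-product attention pattern. The following sketch is illustrative only; it is not taken from the instant specification or from any applied reference, and the function name, array shapes, and the projection matrices Wq, Wk, and Wv are assumptions made for illustration.

```python
import numpy as np

def generate_second_classification_weights(novel_features, prior_weights, Wq, Wk, Wv):
    """Illustrative sketch of attention-based classification-weight generation.

    novel_features: (n_novel, d) feature vectors of the second (novel) classes
    prior_weights:  (n_prior, d) base and first classification weights
    Wq, Wk, Wv:     (d, d) learned linear-transformation matrices (assumed)
    """
    q = novel_features @ Wq   # query representations, one per novel class
    k = prior_weights @ Wk    # key representations from the prior weights
    v = prior_weights @ Wv    # value representations from the prior weights

    # attention scores computed between the query and key representations
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = scores / scores.sum(axis=-1, keepdims=True)  # softmax over prior classes

    # second classification weights as a weighted combination of the values
    return attn @ v  # shape (n_novel, d)
```

With, for example, five novel classes, twelve prior classification weight vectors, and d = 8, the result is one d-dimensional classification weight vector per novel class.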
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1
Step 1
Claim 1 recites a method, and thus the claimed process falls within a statutory category of invention.
Step 2A Prong 1
The claim recites generating ... a few-shot learning model for a base task with base classification weights for base classes of the base task, which is a mental process. The claim recites generating ... first classification weights for first classes of the first task based on the base classification weights, which is a mental process. The claim recites generating ... second classification weights for second classes of the second task ... wherein generating the second classification weights ... are ... to enable continual learning, which is a mental process. The claim recites apply linear transformations to compute query, key and value representations from feature vectors of the second classes and the first classification weights, which is a mental process. The claim recites generate the second classification weights based on attention scores computed between the query and key representations, and a weighted combination of the value representations, which is a mental process. The claim recites classifying ... a first set ... of the second task into the second classes ... wherein ... classifying the first set ... of the second task are ... to enable continual learning, which is a mental process. The claim recites determining ... patterns in the classified first set ... to make inferences, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2
The additional elements by a processor of a user equipment (UE), by a weight generator of the processor, using the few-shot learning model, performed at the UE, and at the UE without transmitting sample images or model parameters to an external device invoke a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a first task amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element updating ... the few-shot learning model with the first classification weights for sample ... classification into the first classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a second task amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element receive, as input, the base classification weights and the first classification weights amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element updating ... the few-shot learning model with the second classification weights for sample ... classification into the first classes ... wherein ... updating the few-shot learning model with the second classification weights ... are ... to enable continual learning invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional elements sample images, image classification, and in a computer vision application do not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment").
Step 2B
The additional elements by a processor of a user equipment (UE), by a weight generator of the processor, using the few-shot learning model, performed at the UE, and at the UE without transmitting sample images or model parameters to an external device invoke a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a first task is well-understood, routine, conventional activity (see MPEP 2106.05(d), "receiving or transmitting data over a network"). The additional element updating ... the few-shot learning model with the first classification weights for sample ... classification into the first classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a second task is well-understood, routine, conventional activity (see MPEP 2106.05(d), "receiving or transmitting data over a network"). The additional element receive, as input, the base classification weights and the first classification weights is well-understood, routine, conventional activity (see MPEP 2106.05(d), "receiving or transmitting data over a network"). The additional element updating ... the few-shot learning model with the second classification weights for sample ... classification into the first classes ... wherein ... updating the few-shot learning model with the second classification weights ... are ... to enable continual learning invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional elements sample images, image classification, and in a computer vision application do not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment").
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 2
Step 1
Regarding Claim 2, the rejection of Claim 1 is incorporated.
Step 2A Prong 1
Claim 2 recites the abstract ideas recited by parent Claim 1.
Step 2A Prong 2, Step 2B
The additional element training the weight generator using a random number of the base classes and a fake task of fake classes selected from the classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element training the weight generator using a fixed number of fake tasks of the fake classes selected from the base classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it").
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 3
Step 1
Regarding Claim 3, the rejection of Claim 2 is incorporated.
Step 2A Prong 1
The claim recites determining an average cross-entropy loss using randomly selected sample images from classes used to train the weight generator, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
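Claims 2 and 3 recite training the weight generator on "fake" tasks of fake classes sampled from the base classes and determining an average cross-entropy loss over randomly selected sample images. A minimal sketch of those two operations follows; it is illustrative only, and the function names, class counts, and data layout are assumptions rather than limitations taken from the claims or the applied references.

```python
import math
import random

def sample_fake_task(base_classes, n_fake):
    """Pick a 'fake' novel task of n_fake distinct classes from the base classes."""
    return random.sample(base_classes, n_fake)

def average_cross_entropy(probs, labels):
    """Average cross-entropy over a batch of sample images.

    probs:  per-image probability distributions over the classes
    labels: per-image ground-truth class indices
    """
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

In an episodic training loop of this kind, each batch would draw a fresh fake task (a random or fixed number of classes, per Claim 2) and update the weight generator to minimize the average cross-entropy loss (per Claim 3).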
Regarding Claim 4
Step 1
Regarding Claim 4, the rejection of Claim 1 is incorporated.
Step 2A Prong 1
Claim 4 recites the abstract ideas recited by parent Claim 1.
Step 2A Prong 2, Step 2B
The additional element wherein the few-shot learning model comprises a feature extractor invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it").
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 5
Step 1
Regarding Claim 5, the rejection of Claim 1 is incorporated.
Step 2A Prong 1
The claim recites wherein the second task further comprises a second set of sample images that are classified into classes, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 6
Step 1
Regarding Claim 6, the rejection of Claim 5 is incorporated.
Step 2A Prong 1
The claim recites wherein generating the second classification weights comprises: extracting features from the second set of sample images of the second task, which is a mental process. The claim recites generating the second classification weights by the weight generator using the extracted features, the base classification weights, and the first classification weights, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 7
Step 1
Regarding Claim 7, the rejection of Claim 6 is incorporated.
Step 2A Prong 1
The claim recites generating, by the weight generator, third classification weights for third classes of the third task based on the base classification weights, the first classification weights, and the second classification weights, which is a mental process. The claim recites classifying, by the processor, a third set of sample images of the third task into the third classes using the few-shot learning model, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2
The additional element receiving, by the processor, a third task amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element updating, by the processor, the few-shot learning model with the third classification weights for sample image classification into the third classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it").
Step 2B
The additional element receiving, by the processor, a third task is well-understood, routine, conventional activity (see MPEP 2106.05(d), "receiving or transmitting data over a network"). The additional element updating, by the processor, the few-shot learning model with the third classification weights for sample image classification into the third classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it").
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 8
Step 1
Regarding Claim 8, the rejection of Claim 5 is incorporated.
Step 2A Prong 1
The claim recites wherein generating the second classification weights comprises: extracting features from the second set of sample images of the second task, which is a mental process. The claim recites generating the second classification weights by the weight generator using the extracted features and classification weights of classes selected from the base classes and the first classes of the first task, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 9
Step 1
Regarding Claim 9, the rejection of Claim 8 is incorporated.
Step 2A Prong 1
The claim recites wherein a random number of the classes is selected for the classification weights that are used to generate the second classification weights, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Regarding Claim 10
Step 1
Regarding Claim 10, the rejection of Claim 1 is incorporated.
Step 2A Prong 1
Claim 10 recites the abstract ideas recited by parent Claim 1.
Step 2A Prong 2, Step 2B
The additional element wherein the weight generator is a bi-attention weight generator invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element applying a first linear transformation weight for a query of a bi-attention module to extracted features from the first set of sample images invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element applying a second linear transformation weight for a key of the bi-attention module to the base classification weights and the first classification weights invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element applying a third linear transformation weight for a value of the bi-attention module to the base classification weights and the first classification weights invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it").
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
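Claim 10 recites three distinct linear transformation weights in the bi-attention module: a first applied to extracted features to form the query, and a second and third applied to the base and first classification weights to form the key and value. The sketch below shows only those three projections; it is illustrative, and the function name, shapes, and matrices Wq, Wk, Wv are assumptions, not drawn from the instant specification.

```python
import numpy as np

def bi_attention_projections(extracted_features, base_weights, first_weights, Wq, Wk, Wv):
    """Illustrative sketch of the three linear transformations recited in Claim 10."""
    # The key and value transformations are applied to the base classification
    # weights and the first classification weights together.
    prior = np.concatenate([base_weights, first_weights], axis=0)

    query = extracted_features @ Wq  # first linear transformation weight: query
    key = prior @ Wk                 # second linear transformation weight: key
    value = prior @ Wv               # third linear transformation weight: value
    return query, key, value
```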
Regarding Claim 11
Step 1
Claim 11 recites a user equipment (UE), and thus the claimed manufacture falls within a statutory category of invention.
Step 2A Prong 1
The claim recites generate ... a few-shot learning model for a base task with base classification weights for base classes of the base task, which is a mental process. The claim recites generate ... first classification weights for first classes of the first task based on the base classification weights, which is a mental process. The claim recites generate ... second classification weights for second classes of the second task ... wherein generating the second classification weights ... are ... to enable continual learning, which is a mental process. The claim recites apply linear transformations to compute query, key and value representations from feature vectors of the second classes and the first classification weights, which is a mental process. The claim recites generate the second classification weights based on attention scores computed between the query and key representations, and a weighted combination of the value representations, which is a mental process. The claim recites classify ... a first set ... of the second task into the second classes ... wherein ... classifying the first set ... of the second task are ... to enable continual learning, which is a mental process. The claim recites determine ... patterns in the classified first set ... to make inferences, which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2
The additional elements a processor; and a non-transitory computer readable storage medium storing instructions invoke a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional elements by a processor of a user equipment (UE), by a weight generator of the processor, using the few-shot learning model, performed at the UE, and at the UE without transmitting sample images or model parameters to an external device invoke a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a first task amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element updating ... the few-shot learning model with the first classification weights for sample ... classification into the first classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a second task amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element receive, as input, the base classification weights and the first classification weights amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element updating ... the few-shot learning model with the second classification weights for sample ... classification into the first classes ... wherein ... updating the few-shot learning model with the second classification weights ... are ... to enable continual learning invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). 
The additional elements sample images, image classification, and in a computer vision application do not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment").
Step 2B
The additional elements a processor; and a non-transitory computer readable storage medium storing instructions invoke a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional elements by a processor of a user equipment (UE), by a weight generator of the processor, using the few-shot learning model, performed at the UE, and at the UE without transmitting sample images or model parameters to an external device invoke a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a first task is well-understood, routine, conventional activity (see MPEP 2106.05(d), "receiving or transmitting data over a network"). The additional element updating ... the few-shot learning model with the first classification weights for sample ... classification into the first classes invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element receiving ... a second task is well-understood, routine, conventional activity (see MPEP 2106.05(d), "receiving or transmitting data over a network"). The additional element receive, as input, the base classification weights and the first classification weights is well-understood, routine, conventional activity (see MPEP 2106.05(d), "receiving or transmitting data over a network"). The additional element updating ... the few-shot learning model with the second classification weights for sample ... classification into the first classes ... wherein ... updating the few-shot learning model with the second classification weights ... are ... to enable continual learning invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). 
The additional elements sample images, image classification, and in a computer vision application do not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment").
The claim lacks additional elements that integrate it into a practical application or provide significantly more, so it is directed to an abstract idea and is ineligible.
Claims 12-20, dependent on Claim 11, incorporate the rejection of Claim 11. Claims 12-20 incorporate substantively all the limitations of Claims 2-10, respectively, and are rejected under the same rationale.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8, 11-16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gidaris et al., "Dynamic few-shot visual learning without forgetting" (hereinafter "Gidaris"), in view of Liu et al., "A Universal Representation Transformer Layer for Few-Shot Image Classification" (hereinafter "Liu"), and further in view of Kim et al. (US 2020/0242476 A1, hereinafter "Kim").
Regarding Claim 1, Gidaris teaches:
a method (Gidaris, p. 2, column 2, "we describe our few-shot object learning methodology in §3") comprising:
generating ... a few-shot learning model (Gidaris, p. 6, 3.3. Training procedure: "In order to learn the ConvNet-based recognition model (i.e. the feature extractor ... as well as the classifier) ... and the few-shot classification weight generator ..., we use as the sole input a training set ... of ... base categories," where Gidaris's feature extractor, classifier, and weight generator correspond to the instant few-shot learning model, as depicted in the dotted box of Gidaris, p. 4, Figure 1) for a base task with base classification weights for base classes of the base task (Gidaris, p. 7, 3.3. Training procedure, 1st training stage: "During this stage we only learn the ConvNet recognition model without the few-shot classification weight generator. Specifically, at this stage we learn the parameters ... of the feature extractor ... and the base classification weight vectors," where Gidaris's model trained during the 1st training stage corresponds to the instant model for a base task), by a processor of a user equipment (UE) (Gidaris, p. 1, column 1, Abstract, "The code and models of our paper will be published on: https://github.com/gidariss/FewShotWithoutForgetting," indicating implementation using a computer, in which a processor is inherent);
receiving, by the processor, a first task (Gidaris, p. 7, 3.3. Training procedure, 2nd training stage: "During this stage we train the learnable parameters ... of the few-shot classification weight generator while we continue training the base classification weight vectors.... [I]n each batch we randomly pick ...'fake' novel categories from the base categories and we treat them in the same way as we will treat the actual novel categories after training," where Gidaris's 2nd training stage for a fake category corresponds to the instant first task);
generating, by a weight generator of the processor, first classification weights for first classes of the first task (Gidaris, p. 7, 3.3. Training procedure, 2nd training stage: "In order to train the few-show [sic] classification weight generator, in each batch we randomly pick ... 'fake' novel categories from the base categories and we treat them in the same way as we will treat the actual novel categories after training. Specifically, instead of using the classification weight vectors in W_base [i.e., the classification weight vectors of the base categories] for those 'fake' novel categories, we sample ... training examples ..., compute their feature vectors ..., and give those feature vectors to the few-shot classification weight generator ... in order to compute novel classification weight generators," where Gidaris's fake novel categories correspond to the instant first classes and where Gidaris's computed novel classification weight generators correspond to the instant generated first classification weights) based on the base classification weights (Gidaris, p. 7, 3.3. Training procedure, 2nd training stage: "In this case W* [i.e., the classification weight vectors parameterizing the classifier] is the union of the 'fake' novel classification weight vectors generated by G(., .|ϕ) [i.e., the few-shot classification weight generator] and the classification weight vectors of the remaining base categories");
updating, by the processor, the few-shot learning model with the first classification weights based on the base classification weights (Gidaris, p. 7, 3.3. Training procedure, 2nd training stage: "Note that we take care to exclude from the base classification weight vectors that are given as a second argument to the few-shot weight generator ... those classification vectors that correspond to the 'fake' novel categories. In this case W* [i.e., the classification weight vectors parameterizing the classifier] is the union of the 'fake' novel classification weight vectors generated by G(., .|ϕ) [i.e., the few-shot classification weight generator] and the classification weight vectors of the remaining base categories," where Gidaris's union of the classification weight vectors of the 'fake' novel categories and the remaining base categories corresponds to the instant updating) for sample image classification into the first classes (Gidaris, p. 11, Appendix A. Implementation details of training procedure followed during the 2nd training stage: "we use the union of 'fake' novel classification weight vectors and the classification weight vectors of the remaining base categories in order to learn to classify the ... test image examples");
receiving, by the processor, a second task (Gidaris, p. 7, column 1: "We evaluate our few-shot object recognition system on the Mini-ImageNet dataset.... For our experiments we used the splits by Ravi and Laroche [17] that include 64 categories for training, 16 categories for validation, and 20 categories for testing," where Gidaris's object recognition for a testing category corresponds to the instant second task);
generating, by the weight generator, second classification weights for second classes of the second task (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then during test time to use the validation (or the test) categories in order to form few-shot tasks on which the trained model is evaluated. Those few-shot tasks are formed by first sampling ... categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories, and then evaluating it on some test examples that come from the same novel categories but do not overlap with the training examples," where Gidaris's metalearning test categories corresponds to the instant generating second classification weights, as in p. 3, 3. Methodology, Few-shot classification weight generator: "for each novel category ..., the few-shot classification weight generator ... gets as input the feature vectors ... of its ... training examples, ... and the classification weight vectors of the base categories ... and generates a classification weight vector ... for that novel category"), wherein the weight generator is configured to:
receive, as input, the base classification weights (Gidaris, p. 6, 3.2. Few-shot classification weight generator: "The few-shot classification weight generator ... gets as input the feature vectors ... of the ... training examples of a novel category ... and (optionally) the classification weight vectors of the base categories W_base") and the first classification weights (Gidaris, p. 7, 3.3. Training procedure: "The meaning of W* is different on each of the training stages, as we explain below. ... 1st training stage: ... In this case W* is equal to the base classification weight vectors W_base. ... 2nd training stage: ... In this case W* is the union of the 'fake' novel classification weight vectors generated by G(·, ·|ϕ) and the classification weight vectors of the remaining base categories," where Gidaris's base-category classification weights handled as 'fake' weight vectors corresponds to the instant first classification weights);
... compute query, key and value representations (Gidaris, p. 6, 3.2. Few-shot classification weight generator, Attention-based weight inference: "an extra attention-based classification weight vector w'_att is computed as:

w'_att = (1/N') ∑_{i=1}^{N'} ∑_{b=1}^{K_base} Att(ϕ_q z̄'_i, k_b) · w̄_b   (7)

where ϕ_q ∈ R^(d×d) is a learnable weight matrix that transforms the feature vector z̄'_i to a query vector used for querying the memory, {k_b ∈ R^d}_{b=1}^{K_base} is a set of K_base learnable keys (one per base category) used for indexing the memory," where Gidaris's ϕ_q z̄'_i term, k_b term, and w̄_b term correspond to the instant query, key, and value representations) from ... the first classification weights (Gidaris, p. 6, 3.2. Few-shot classification weight generator, Attention-based weight inference, Eq. 7, above, where w̄_b corresponds to the instant base/first classification weights), and
generate the second classification weights based on attention scores computed between the query and key representations, and a weighted combination of the value representations (Gidaris, p. 6, 3.2. Few-shot classification weight generator, Attention-based weight inference: "The final classification weight vector is computed as a weighted sum of the average based classification vector w'_avg and the attention based classification vector w'_att:

w' = ϕ_avg ⊙ w'_avg + ϕ_att ⊙ w'_att   (8)

where ⊙ is the Hadamard product, and ϕ_avg, ϕ_att ∈ R^d are learnable weight vectors," where Gidaris's w' term, w'_att term, and ϕ_att ⊙ w'_att term correspond to the instant second classification weights based on attention scores, query and key representation, and weighted combination, respectively);
updating, by the processor, the few-shot learning model (Gidaris, p. 3, 3. Methodology: "if W_novel ... are the classification weight vectors of the novel categories inferred by the few-shot weight generator, then by setting W* = W_base ∪ W_novel on the classifier ... we enable the ConvNet model to recognize both base and novel categories," where Gidaris's weight union operation corresponds to the instant updating) with the second classification weights (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then during test time to use the validation (or the test) categories in order to form few-shot tasks on which the trained model is evaluated. Those few-shot tasks are formed by first sampling ... categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories, and then evaluating it on some test examples that come from the same novel categories but do not overlap with the training examples," where Gidaris's metalearning test categories corresponds to the instant updating with second classification weights) for sample image classification into the second classes (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "We evaluate our few-shot object recognition system on the Mini-ImageNet dataset ... with 600 images per category... For our experiments we used ... 20 categories for testing," where Gidaris's 20 test categories corresponds to the instant second classes);
classifying, by the processor, a first set of sample images of the second task (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "We evaluate our few-shot object recognition system on the Mini-ImageNet dataset ... with 600 images per category... For our experiments we used ... 20 categories for testing," where Gidaris's evaluating a few-shot object recognition system corresponds to the instant classifying, where Gidaris's 20 test categories of images corresponds to the instant second task, and where Gidaris's sampled set of novel categories corresponds to the instant first set, as in p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "Those few-shot tasks are formed by first sampling K_novel categories and one or five training example per category") into the second classes using the few-shot learning model (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "Those few-shot tasks are formed by first sampling K_novel categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories, and then evaluating it on some test examples that come from the same novel categories," where Gidaris's novel categories corresponds to the instant second classes); and
determining, by the processor, patterns in the classified first set of sample images (Gidaris, p. 2, 1. Introduction, Few-shot classification-weight generator based on attention: "A typical ConvNet based recognition model, in order to classify an image, first extracts a high level feature representation from it," where Gidaris's feature extraction corresponds to the instant determining patterns) to make inferences in a computer vision application (Gidaris, p. 2, 1. Introduction, Few-shot classification-weight generator based on attention: "the first technical novelty of our work is that we enhance a typical object recognition system with an extra component, called few-shot classification weight generator that accepts as input a few training examples of a novel category," where Gidaris's object recognition for an image classification system corresponds to the instant inferences in a computer vision application),
wherein generating the second classification weights, updating the few-shot learning model with the second classification weights, and classifying the first set of sample images of the second task are performed at the UE to enable continual learning at the UE ... (Gidaris, p. 1, 1. Introduction: "the human visual system exhibits the remarkably ability to be able to effortlessly learn novel concepts from only one or a few examples and reliably recognize them later on. ... Mimicking that behavior on artificial vision systems is an interesting and very challenging research problem with many practical advantages, such as developing real-time interactive vision applications for portable devices (e.g., cell-phones). ¶ ... [M]ost prior methods neglect to fulfill two very important requirements for a good few-shot learning system: (a) the learning of the novel categories needs to be fast, and (b) to not sacrifice any recognition accuracy on the initial categories that the ConvNet was trained on," where Gidaris's learning for real-time interactive vision applications for cell phones reasonably suggests the instant continual learning at the UE).
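For ease of reference, the attention-based weight inference quoted above from Gidaris (Eqs. 7 and 8) can be sketched in code. The following is an illustrative reconstruction only, not code from the reference: a standard softmax dot-product attention stands in for Gidaris's Att(·,·) operator (the reference uses a cosine-similarity-based attention kernel), and all function and array names are assumed for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate_novel_weight(z_novel, W_base, keys, phi_q, phi_avg, phi_att):
    """Sketch of Gidaris's few-shot weight generation for one novel category.

    z_novel: (N', d) feature vectors of the N' training examples.
    W_base:  (K_base, d) base classification weight vectors (the "values").
    keys:    (K_base, d) learnable keys, one per base category.
    phi_q:   (d, d) learnable query transformation.
    phi_avg, phi_att: (d,) learnable mixing vectors (Eq. 8).
    """
    # Eq. 7: attention-based weight vector, averaged over the N' examples.
    w_att = np.zeros(W_base.shape[1])
    for z in z_novel:
        query = phi_q @ z               # transform the feature to a query
        scores = softmax(keys @ query)  # attention over the base categories
        w_att += scores @ W_base        # weighted combination of the values
    w_att /= len(z_novel)

    # Average-based weight vector, then Eq. 8: Hadamard-weighted sum.
    w_avg = z_novel.mean(axis=0)
    return phi_avg * w_avg + phi_att * w_att
```

The sketch illustrates the claim mapping: keys @ query yields the attention scores between query and key representations, and scores @ W_base is the weighted combination of the value (base weight) vectors.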
Gidaris teaches computing query, key and value representations from the first classification weights.
Gidaris may not explicitly teach apply linear transformations to compute query, key and value representations from feature vectors of the second classes and ... first classification weights.
However, Liu explicitly teaches:
apply linear transformations to compute query (Liu, p. 4, 3.1 Single-Head URT Layer: "Queries q_c: For each class c, we obtain a query through q_c = W_q r(S_c) + b_q, where we have a learnable query linear transformation represented by matrix W_q and bias b_q"), key (Liu, p. 5, 3.1 Single-Head URT Layer: "Keys k_{i,c}: For each domain i and class c, we define keys as k_{i,c} = W_k r_i(S_c) + b_k, using a learnable linear transformation W_k and b_k and where r_i(S_c) = (1/|S_c|) ∑_{x∈S_c} r_i(x), using a similar notation as for r(S_c)") and value representations (Liu, p. 5, 3.1 Single-Head URT Layer: "Attention scores α: as for regular Transformer layers, we use scaled dot-product attention ... [Eq. 4] ... where l is the dimensionality of the keys and queries. ... Equipped with these attention scores, the URT layer can now produce an adapted representation for the task (for the support and query set examples) by computing ϕ(x) = ∑_i α_i r_i(x) (6)," where α corresponds to the key and query values and r_i corresponds to the instant value) from feature vectors of the second classes and ... first classification weights (Liu, query term r(S_c), key term r_i(S_c), and value term r_i(x) above, where r(x) corresponds to the calculated second-class feature vectors, as in p. 4, 3.1 Single-Head URT Layer: "Let r_i(x) be the output vector of the backbone for domain i," where S_c, the support set of a few-shot class, corresponds to the instant second classes, and where the parameters of Liu's pre-trained backbone neural networks correspond to the instant first weights).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gidaris regarding computing query, key and value representations from the first classification weights with those of Liu regarding apply linear transformations to compute query, key and value representations from feature vectors of the second classes and ... first classification weights.
The motivation to do so would be to facilitate training and inference of multi-domain models using multi-domain backbones (Liu, p. 3, 2.2 Background and Related Work, Transformer Networks: "Our model structure is inspired by the structure of the dot-product self-attention in the Transformer, which we adapted here to multidomain few-shot learning by designing appropriate parametrizations for queries, keys and values. Self-attention was explored in the single-domain training regime ... where each representation of individual examples in a task support set is influenced by all other examples. ... Rather than using self-attention between individual examples in the support set, our model uses self-attention to select between different domain-specific backbones" and p. 2, 1 Introduction, Present work: "we propose the Universal Representation Transformer (URT) layer, which can effectively learn to transform a universal representation into task-adapted representations. The URT layer is inspired from Transformer networks [23] and uses an attention mechanism to learn to retrieve or blend the appropriate backbones to use for each task. By training this layer across few-shot tasks from many domains, it can support transfer across these tasks").
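For context, Liu's single-head URT layer quoted above (a query and keys obtained by linear transformations, scaled dot-product attention scores, and the adapted representation of Eq. 6) can be sketched as follows. This is an illustrative simplification, not code from the reference: the query is built here from a mean-pooled support representation rather than Liu's concatenated universal representation r(S_c), and all identifiers are assumed for illustration.

```python
import numpy as np

def urt_single_head(r_support, r_x, W_q, b_q, W_k, b_k):
    """Sketch of one URT head: score M domain-specific backbones for a class
    and blend the domain representations of one example x.

    r_support: (M, d) per-domain mean support representations r_i(S_c).
    r_x:       (M, d) per-domain representations r_i(x) of one example.
    """
    d = r_support.shape[1]
    # Query via a learnable linear transformation (simplified: mean-pooled
    # support representation stands in for Liu's universal r(S_c)).
    q = W_q @ r_support.mean(axis=0) + b_q
    # Keys k_{i,c} = W_k r_i(S_c) + b_k, one per domain backbone.
    k = r_support @ W_k.T + b_k
    # Scaled dot-product attention scores over the M domains (cf. Liu, Eq. 4).
    logits = (k @ q) / np.sqrt(d)
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()
    # Adapted representation phi(x) = sum_i alpha_i r_i(x) (cf. Liu, Eq. 6).
    return alpha @ r_x
```

The design point relevant to the rejection is visible in the sketch: the attention selects between domain backbones rather than between individual support examples.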
The Gidaris/Liu combination teaches generating the second classification weights, updating the few-shot learning model with the second classification weights, and classifying the first set of sample images of the second task are performed at the UE to enable continual learning at the UE.
The Gidaris/Liu combination may not explicitly teach enable continual learning at the UE without transmitting sample images or model parameters to an external device.
However, Kim teaches:
enable continual learning at the UE without transmitting sample images or model parameters to an external device (Kim, Claim 1: "A method for on-device continual learning of a neural network which analyzes input data" and [0106]-[0108]: "The present disclosure has an effect of enabling on-device learning on a local device without storing past training data by efficiently generating the past training data used for previous learning. ¶ The present disclosure has another effect of preventing catastrophic forgetting of the neural network when learning, by on-device learning of the neural network using new training data and the past training data generated by the data generator network. ¶ The present disclosure has still another effect of saving resources such as storage, and securing privacy," where the instant specification positively describes offloading of processing to an external device at [00079], depicted by the UE as electronic device 501 of Fig. 5 of the instant specification).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Gidaris/Liu combination regarding generating the second classification weights, updating the few-shot learning model with the second classification weights, and classifying the first set of sample images of the second task are performed at the UE to enable continual learning at the UE with those of Kim regarding enable continual learning at the UE without transmitting sample images or model parameters to an external device.
The motivation to do so would be to facilitate continual learning in scenarios where privacy concerns or operational constraints do not permit communication with a training server (Kim, [0007]: "it is impossible to perform learning on the servers in a personal mobile device environment where personal data cannot be transmitted to the servers for learning purposes due to privacy concerns, or in environments of a military, a drone, or a ship where the device is often out of the communication network" and Abstract: "A method for on-device continual learning of a neural network which analyzes input data is provided to be used for smartphones, drones, vessels, or a military purpose").
Regarding Claim 11, Gidaris teaches:
a user equipment (UE) comprising: a processor; and a non-transitory computer readable storage medium storing instructions that, when executed, cause the processor to perform precisely the method of Claim 1. As Gidaris implemented their method in code (Gidaris, p. 1, column 1, Abstract: "The code and models of our paper will be published on: https://github.com/gidariss/FewShotWithoutForgetting"), they therefore implemented the method on a computer, in which a processor and a computer-readable storage medium are inherent. Thus, Claim 11 is rejected for the reasons set forth in the rejection of Claim 1.
Regarding Claim 2, the rejection of Claim 1 is incorporated. The Gidaris/Liu/Kim combination teaches:
training the weight generator using a random number of the base classes and a fake task of fake classes selected from the classes; or training the weight generator using a fixed number of fake tasks of the fake classes selected from the base classes (Gidaris, p. 7, 3.3. Training procedure, 2nd training stage: "In order to train the few-show [sic] classification weight generator, in each batch we randomly pick K_novel 'fake' novel categories from the base categories and we treat them in the same way as we will treat the actual novel categories after training").
Regarding Claim 3, the rejection of Claim 2 is incorporated. The Gidaris/Liu/Kim combination teaches:
wherein training the weight generator comprises determining an average cross-entropy loss (Gidaris, p. 6, 3.3. Training procedure: "In order to learn the ConvNet-based recognition model ... and the few-shot classification weight generator ... we use as the sole input a training set ... of base categories. We split the training procedure into 2 stages and at each stage we minimize a different cross-entropy loss of the following form:

(1/K_base) ∑_{b=1}^{K_base} (1/N_b) ∑_{i=1}^{N_b} loss(x_{b,i}, b)   (9)

," where Gidaris's Eq. 9 determines an average loss over the N_b examples in each category) using ... sample images (Gidaris, p. 2, 1. Introduction, Few-shot classification-weight generator based on attention: "A typical ConvNet based recognition model, in order to classify an image, first extracts a high level feature representation from it.... [T]he first technical novelty of our work is that we enhance a typical object recognition system with an extra component, called few-shot classification weight generator that accepts as input a few training examples of a novel category") randomly selected ... from classes used to train the weight generator (Gidaris, p. 7, 3.3. Training procedure, 2nd training stage: "During this stage we train the ... the few-shot classification weight generator.... [I]n each batch we randomly pick K_novel 'fake' novel categories from the base categories and we treat them in the same way as we will treat the actual novel categories after training").
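The averaging structure of Gidaris's Eq. 9 quoted above, a per-category mean over the N_b examples followed by a mean over the K_base categories, can be sketched as follows (illustrative only; function and argument names are assumed):

```python
import numpy as np

def average_cross_entropy(logits_by_class, class_indices):
    """Average cross-entropy in the shape of Gidaris's Eq. 9: mean the
    per-example loss within each category, then mean across categories."""
    per_class = []
    for logits, b in zip(logits_by_class, class_indices):
        # Numerically stable log-softmax of each example's logits.
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        # (1/N_b) * sum_i loss(x_{b,i}, b): negative log-probability of the
        # true category b, averaged over that category's N_b examples.
        per_class.append(-log_probs[:, b].mean())
    # (1/K_base) * sum_b (...): average over the K_base categories.
    return float(np.mean(per_class))
```

As a sanity check, uniform logits over K categories yield a loss of log K regardless of the number of examples per category.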
Regarding Claim 4, the rejection of Claim 1 is incorporated. The Gidaris/Liu/Kim combination teaches:
wherein the few-shot learning model comprises a feature extractor (Gidaris, p. 3, Figure 1, which depicts the model "Dynamic Few-Shot Learning without Forgetting" comprising the component "Feature Extractor," and Fig. 1 caption: "Overview of our system. It consists of: (a) a ConvNet based recognition model (that includes a feature extractor and a classifier)").
Regarding Claim 5, the rejection of Claim 1 is incorporated. The Gidaris/Liu/Kim combination teaches:
wherein the second task further comprises a second set of sample images that are classified into classes (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "We evaluate our few-shot object recognition system on the Mini-ImageNet dataset [26] that includes ... 600 images per category... For our experiments we used ... 20 categories for testing" and "Those few-shot tasks are formed by first sampling K_novel categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories, and then evaluating it on some test examples that come from the same novel categories but do not overlap with the training examples," where Gidaris's second set of K_novel sampled images from the second of multiple tasks corresponds to the instant second set, where K_novel is less than the 20 categories used for testing, as in p. 8, Table 1, where K_novel = 5).
Regarding Claim 6, the rejection of Claim 5 is incorporated. The Gidaris/Liu/Kim combination teaches:
wherein generating the second classification weights comprises: extracting features (Gidaris, p. 3, 3. Methodology, ConvNet-based recognition model: "It consists of (a) a feature extractor ... that extracts a d-dimensional feature vector... from an input image") from the second set of sample images of the second task (Gidaris, p. 7, column 2: "Those few-shot tasks are formed by first sampling K_novel categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories"); and
generating the second classification weights by the weight generator (Gidaris, p. 3, 3. Methodology: "if W_novel ... are the classification weight vectors of the novel categories inferred by the few-shot weight generator, then by setting W* = W_base ∪ W_novel on the classifier ... we enable the ConvNet model to recognize both base and novel categories") using the extracted features, the base classification weights, and the first classification weights (Gidaris, p. 7, columns 1-2: "The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then during test time to use the validation (or the test) categories," where Gidaris learns the base classification weights and the first classification weights during the 1st training stage and the 2nd training stage, respectively).
Regarding Claim 8, the rejection of Claim 5 is incorporated. The Gidaris/Liu/Kim combination teaches:
wherein generating the second classification weights comprises: extracting features (Gidaris, p. 3, 3. Methodology, ConvNet-based recognition model: "It consists of (a) a feature extractor ... that extracts a d-dimensional feature vector... from an input image") from the second set of sample images of the second task (Gidaris, p. 7, column 2: "Those few-shot tasks are formed by first sampling K_novel categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories"); and
generating the second classification weights by the weight generator (Gidaris, p. 3, 3. Methodology: "if W_novel ... are the classification weight vectors of the novel categories inferred by the few-shot weight generator, then by setting W* = W_base ∪ W_novel on the classifier ... we enable the ConvNet model to recognize both base and novel categories") using the extracted features and classification weights of classes selected from the base classes and the first classes of the first task (Gidaris, p. 7, columns 1-2: "The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then during test time to use the validation (or the test) categories," where Gidaris learns the base classification weights and the first classification weights during the 1st training stage and the 2nd training stage, respectively).
Claims 12-16 and 18 recite a processor to perform precisely the methods of Claims 2-6 and 8, respectively, and are rejected for the reasons recited above.
Claims 7, 9, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gidaris et al., "Dynamic few-shot visual learning without forgetting" (hereinafter "Gidaris") in view of Liu et al., "A Universal Representation Transformer Layer for Few-Shot Image Classification" (hereinafter "Liu") in view of Kim et al. (US 2020/0242476 A1, hereinafter "Kim") in view of Castro et al., "End-to-End Incremental Learning" (hereinafter "Castro").
Regarding Claim 7, the rejection of Claim 6 is incorporated. The Gidaris/Liu/Kim combination teaches:
receiving, by the processor, a third task (Gidaris, p. 7, column 1: "We evaluate our few-shot object recognition system on the Mini-ImageNet dataset.... For our experiments we used the splits by Ravi and Laroche [17] that include 64 categories for training, 16 categories for validation, and 20 categories for testing," where the instant third task corresponds to Gidaris's validation);
generating, by the weight generator, third classification weights for third classes of the third task (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then during test time to use the validation (or the test) categories in order to form few-shot tasks on which the trained model is evaluated. Those few-shot tasks are formed by first sampling ... categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories, and then evaluating it on some test examples that come from the same novel categories but do not overlap with the training examples," where Gidaris's metalearning test categories corresponds to the instant generating third classification weights, as in p. 3, 3. Methodology, Few-shot classification weight generator: "for each novel category ..., the few-shot classification weight generator ... gets as input the feature vectors ... of its ... training examples, ... and the classification weight vectors of the base categories ... and generates a classification weight vector ... for that novel category," and where Gidaris's second evaluation task of K_novel categories of 60 total test categories corresponds to the instant third task) based on the base classification weights, the first classification weights, ... (Gidaris, p. 3, 3. Methodology, Few-shot classification weight generator: "Therefore, if W_novel ... are the classification weight vectors of the novel categories inferred by the few-shot weight generator, then by setting W* = W_base ∪ W_novel on the classifier C(·|W*) we enable the ConvNet model to recognize both base and novel categories," where Gidaris's W_base comprises base and first/'fake' classification weights learned during the 1st and 2nd training stages);
updating, by the processor, the few-shot learning model (Gidaris, p. 3, 3. Methodology: "if W_novel ... are the classification weight vectors of the novel categories inferred by the few-shot weight generator, then by setting W* = W_base ∪ W_novel on the classifier ... we enable the ConvNet model to recognize both base and novel categories," where Gidaris's weight union operation corresponds to the instant updating) with the third classification weights (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then during test time to use the validation (or the test) categories in order to form few-shot tasks on which the trained model is evaluated. Those few-shot tasks are formed by first sampling ... categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories, and then evaluating it on some test examples that come from the same novel categories but do not overlap with the training examples," where Gidaris's metalearning test categories corresponds to the instant updating with third classification weights) for sample image classification into the third classes (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "We evaluate our few-shot object recognition system on the Mini-ImageNet dataset ... with 600 images per category... For our experiments we used ... 20 categories for testing," where Gidaris's second K_novel categories of 60 total test categories corresponds to the instant third classes); and
classifying, by the processor, a third set of sample images of the third task (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "We evaluate our few-shot object recognition system on the Mini-ImageNet dataset ... with 600 images per category... For our experiments we used ... 20 categories for testing," where Gidaris's evaluating a few-shot object recognition system corresponds to the instant classifying, where Gidaris's second K_novel categories of 60 total test categories of images corresponds to the instant third task, and where Gidaris's sampled set of novel categories corresponds to the instant third set, as in p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "Those few-shot tasks are formed by first sampling K_novel categories and one or five training example per category") into the third classes using the few-shot learning model (Gidaris, p. 7, 4.1. Mini-ImageNet experiments, Evaluation setting for recognition of novel categories: "Those few-shot tasks are formed by first sampling K_novel categories and one or five training example per category (1-shot and 5-shot settings respectively), which the trained model uses for metalearning those categories, and then evaluating it on some test examples that come from the same novel categories," where Gidaris's novel categories corresponds to the instant third classes).
Gidaris may not explicitly teach third classification weights for third classes of the third task based on the base classification weights, the first classification weights, and the second classification weights.
However, Castro teaches:
third classification weights (Castro, 4 Incremental Learning, Training process: "during training, all the weights of the model are updated. Thus, for any sample, features obtained from the feature extractor are likely to change between successive incremental steps, and the classification layers should adapt their weights to deal with these new features," where Castro's classification weights for an incremental step corresponds to the instant third classification weights) for third classes of the third task based on the base classification weights, the first classification weights, and the second classification weights (Castro, p. 10, Fig. 3: "Accuracy on CIFAR-100. Average and standard deviation of 5 executions with (a) 2 ... classes per incremental step," where Fig. 3(a) depicts classification accuracy on fifty 2-way tasks in succession, including task three, and p. 9, 6 Evaluation on CIFAR-100, Dataset: "CIFAR-100 dataset [15] is composed of 60k 32 × 32 RGB images of 100 classes, with 600 images per class. Every class has 500 images for training and 100 images for testing. We divide the 100 classes into splits of 2, 5, 10, 20, and 50 classes with a random order. Thus, we will have 50, 20, 10, 5, and 2 incremental training steps respectively. After each incremental step, the resulting model is evaluated on the test data composed of all the trained classes, i.e., old and new ones," where Castro's all the trained classes comprises the third classes of the third task).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gidaris regarding generating third classification weights, updating the model, and classifying third samples using the third weights with those of Castro regarding third classification weights based on base, first, and second classification weights.
The motivation to do so would be to facilitate supporting processing scenarios where data representing new classes as well as previously learned classes must be handled (Castro, p. 1, 1 Introduction: "Traditional models require all the samples (corresponding to the old and the new classes) to be available at training time, and are not equipped to consider only the new data, with a small selection of the old data. In an ideal system, the new classes should be integrated into the existing model, sharing the previously learned parameters. ... The main contribution of this paper is addressing this challenge with our end-to-end approach designed specifically for incremental learning. ... The model is learned by minimizing the cross-distilled loss, a combination of two loss functions: cross-entropy to learn the new classes and distillation to retain the previous knowledge corresponding to the old classes").
Claim 17 recites a processor to perform precisely the method of Claim 7 and is rejected for the reasons recited above.
Regarding Claim 9, the rejection of Claim 8 is incorporated.
Gidaris may not explicitly teach a random number of the classes is selected for the classification weights that are used to generate the second classification weights.
However, Castro teaches a random number of the classes is selected for the classification weights that are used to generate the second classification weights (Castro, p. 9, 6 Evaluation on CIFAR-100, Dataset: "CIFAR-100 dataset [15] is composed of 60k 32 × 32 RGB images of 100 classes, with 600 images per class. Every class has 500 images for training and 100 images for testing. We divide the 100 classes into splits of 2, 5, 10, 20, and 50 classes with a random order. ... After each incremental step, the resulting model is evaluated on the test data composed of all the trained classes, i.e., old and new ones," where Castro's random order of classes corresponds to the instant "a random number of the classes," and where Castro's randomly selected ordinal for classes is depicted as the x-axis of Fig. 3(a), including those for the second classes and the second weights).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gidaris regarding generating the second classification weights using classification weights of classes selected from the base classes and the first classes of the first task with those of Castro regarding a random number of the classes being selected for the classification weights that are used to generate the second classification weights. The motivation to do so would be to facilitate evaluating the accuracy of the incremental learning methodology against alternative methodologies using standard evaluation protocols (Castro, p. 1, Abstract: "We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance" and p. 9, 6 Evaluation on CIFAR-100: "We perform three types of experiments on the CIFAR-100 dataset. In the first one (Sec. 6.1), we set the maximum storage capacity of our representative memory unit, following the experimental protocol in [23]" and p. 9, 6 Evaluation on CIFAR-100, Dataset: "Our evaluation metric at each incremental step is the standard multi-class accuracy. We execute the experiments five times with different random class orders, reporting the average accuracy and standard deviation").
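Castro's protocol of shuffling the class order and partitioning into incremental steps, as quoted above, can be sketched as follows; the function name and seed parameter are hypothetical conveniences of this sketch.

```python
import random

def incremental_splits(n_classes=100, step=2, seed=0):
    """Shuffle class ids and partition them into equal incremental steps,
    mirroring the random class orders Castro describes for CIFAR-100."""
    order = list(range(n_classes))
    random.Random(seed).shuffle(order)  # a random order of the 100 classes
    return [order[i:i + step] for i in range(0, n_classes, step)]
```

With step = 2 this yields the fifty 2-way incremental tasks of Fig. 3(a); steps of 5, 10, 20, and 50 yield 20, 10, 5, and 2 tasks respectively.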
Claim 19 recites a processor to perform precisely the method of Claim 9 and is rejected for the reasons recited above.
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gidaris et al., "Dynamic few-shot visual learning without forgetting" (hereinafter "Gidaris"), in view of Liu et al., "A Universal Representation Transformer Layer for Few-Shot Image Classification" (hereinafter "Liu"), in view of Kim et al. (US 2020/0242476 A1, hereinafter "Kim"), and further in view of Ke et al., "Compare Learning: Bi-Attention Network for Few-Shot Learning" (hereinafter "Ke").
Regarding Claim 10, the rejection of Claim 1 is incorporated. The Gidaris/Liu/Kim combination teaches:
wherein the weight generator is a bi-attention weight generator (Gidaris, p. 6, 3.2. Fewshot classification weight generator, Attention-based weight inference: "We enhance the above feature averaging mechanism with an attention based mechanism that composes novel classification weight vectors by 'looking' at a memory that contains the base classification weight vectors.... [A]n extra attention-based classification weight vector w'_att is computed as: w'_att = (1/N') Σ_{i=1}^{N'} Σ_{b=1}^{K_base} Att(φ_q · z̄'_i, k_b) · w̄_b (7), where ... Att(·, ·) is an attention kernel implemented as a cosine similarity function.... The final classification weight vector is computed as a weighted sum of the average based classification vector w'_avg and the attention based classification vector w'_att: w' = φ_avg ⊙ w'_avg + φ_att ⊙ w'_att (8), where ⊙ is the Hadamard product, and φ_avg, φ_att ∈ R^d are learnable weight vectors," where Gidaris's w' of Eq. 8 corresponds to the instant weight generator based on N' novel training examples, and where the attention kernel Att(·, ·) over novel features and base class keys of Eq. 7 corresponds to the instant bi-attention), and generating the second classification weights comprises:
applying a first linear transformation weight for a query of a bi-attention module to extracted features from the first set of sample images (Gidaris, p. 6, 3.2. Fewshot classification weight generator, Attention-based weight inference: "an extra attention-based classification weight vector w'_att is computed as: [Eq. 7] where φ_q ∈ R^{d×d} is a learnable weight matrix that transforms the feature vector z̄'_i to query vector used for querying the memory," where Gidaris's learnable weight matrix φ_q, feature vector z̄'_i, and query vector φ_q · z̄'_i correspond to the instant linear transformation weight, extracted features, and query, respectively);
applying ... a key of the bi-attention module to the base classification weights and the first classification weights (Gidaris, p. 6, column 1: "an extra attention-based classification weight vector w'_att is computed as: [Eq. 7] where ... {k_b ∈ R^d}_{b=1}^{K_base} is a set of K_base learnable keys (one per base category) used for indexing the memory and Att(·, ·) is an attention kernel," where Gidaris's learnable key k_b and normalized classification weight w̄_b correspond to the instant key and base and first classification weights, respectively); and
applying a third linear transformation weight for a value of the bi-attention module ... (Gidaris, p. 6, column 2: "The final classification weight vector is computed as a weighted sum of the average based classification vector w'_avg and the attention based classification vector w'_att: [Eq. 8] where ⊙ is the Hadamard product, and φ_avg, φ_att ∈ R^d are learnable weight vectors," where Gidaris's weight vector φ_att corresponds to the instant third transformation weight).
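Gidaris's attention-based weight inference (Eqs. 7 and 8) can be sketched as follows. The L2 normalizations and the softmax over base keys are assumptions of this sketch, as are all function and parameter names; it is not a claim mapping.

```python
import numpy as np

def l2norm(x, axis=-1):
    # Normalize vectors so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def attention_weight_generator(z, w_base, keys, phi_q, phi_avg, phi_att):
    """Sketch of Gidaris Eqs. 7-8 (hypothetical shapes and names).

    z:       (N', d) features of the novel-class training examples
    w_base:  (K_base, d) base classification weight vectors
    keys:    (K_base, d) learnable keys k_b, one per base category
    phi_q:   (d, d) query transformation; phi_avg, phi_att: (d,) vectors
    """
    w_avg = l2norm(z).mean(axis=0)                   # feature-averaging branch
    q = z @ phi_q                                    # queries phi_q * z_i (Eq. 7)
    att = l2norm(q) @ l2norm(keys).T                 # cosine-similarity kernel Att
    att = np.exp(att) / np.exp(att).sum(axis=1, keepdims=True)  # assumed softmax
    w_att = (att @ l2norm(w_base)).mean(axis=0)      # attention branch (Eq. 7)
    return phi_avg * w_avg + phi_att * w_att         # Hadamard combination (Eq. 8)
```

The two branches correspond to w'_avg and w'_att in the quotation, combined element-wise by the learnable vectors φ_avg and φ_att.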
Gidaris may not explicitly teach applying a second linear transformation weight for a key of the bi-attention module and applying a third linear transformation weight for a value of the bi-attention module.
However, Ke teaches:
applying a second linear transformation weight for a key of the bi-attention module ... ; and applying a third linear transformation weight for a value of the bi-attention module (Ke, p. 2235, 2.3. Bi-attention: "In our approach, we apply a Multi-Head Bi-Attention structure to enhance the compare ability of model. head_i = BiAttn(Q · W_i^Q, C · W_i^C) (4); H = Concat(head_1, head_2, …, head_h) · W^O (5), Where linearly mapping parameter matrices: W_i^Q ∈ R^{d_h×d_q}, W_i^C ∈ R^{d_h×d_c}, W^O ∈ R^{hd_c×d_h}. h is the number of head," where Ke's W_i^C and W^O correspond to the instant second and third linear transformation weights, respectively).
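Ke's multi-head structure (Eqs. 4 and 5) can be sketched as follows. The inner BiAttn operator is replaced here by a generic scaled dot-product attention as a placeholder, since Ke's exact bi-attention kernel is not reproduced; all names and shapes are assumptions of this sketch.

```python
import numpy as np

def bi_attn(q, c):
    """Placeholder attention head: scaled dot-product scores between the
    query embedding q and compared embedding c, then a weighted sum of c.
    Ke's actual element-wise bi-attention operator differs."""
    scores = q @ c.T / np.sqrt(q.shape[-1])
    a = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return a @ c

def multi_head_bi_attention(Q, C, W_Q, W_C, W_O):
    """Sketch of Ke Eqs. 4-5: per-head linear maps W_i^Q and W_i^C,
    concatenation of the h heads, then the output map W^O."""
    heads = [bi_attn(Q @ wq, C @ wc) for wq, wc in zip(W_Q, W_C)]  # Eq. 4
    return np.concatenate(heads, axis=-1) @ W_O                    # Eq. 5
```

The per-head matrices W_i^Q and W_i^C and the output matrix W^O are the linear transformation weights the rejection maps to the instant query, key, and value transformations.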
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gidaris regarding applying a key of the bi-attention module to the base classification weights and the first classification weights and applying a third linear transformation weight for a value of the bi-attention module with those of Ke regarding applying a second linear transformation weight for a key of the bi-attention module and applying a third linear transformation weight for a value of the bi-attention module. The motivation to do so would be to support fine-grained matching of unlabeled input against labeled samples (Ke, p. 2233, 1. Introduction: "Unlike self-attention, we reform it into bi-attention to calculate the relationship score between the unlabeled query and a small number of labeled samples. Compared to RelationNets [8] we benefit from an element-wise and position-independent attention strategy, thus all elements from the two embeddings will be compared to each other entirely with the finest granularity").
Claim 20 recites a processor to perform precisely the method of Claim 10 and is rejected for the reasons recited above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT N DAY whose telephone number is (703)756-1519. The examiner can normally be reached M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/R.N.D./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122