DETAILED ACTION
This action is responsive to the Claims filed on 1/26/2026. Claims 1, 4-13, 15-19 are pending in the case. Claims 1, 13 and 19 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments/amendments filed 1/26/2026, with respect to the 112 rejections, have been fully considered. These rejections have been withdrawn accordingly.
Applicant's remaining arguments filed 1/26/2026 have been fully considered but they are not persuasive.
With respect to the 101 rejections:
Applicant argues that the claims reflect an improvement to technology, citing sections of the specification.
Examiner notes that while the specification may very well describe an improvement to technology, such improvements must be reflected in additional elements recited in the claims. As further elaborated below, such additional elements are not present in the claims.
Applicant argues that, as supported by the August 4, 2025 memo, claim limitations that encompass AI in a way that cannot be practically performed in the human mind do not fall within the abstract idea groupings. Applicant cites the amended claim and compares it to Example 39, which recites a training limitation.
Examiner highlights that the claim does not recite training. Training of a neural network generally encompasses AI in a way that cannot be performed in the mind; as such, a training limitation would be evaluated under Step 2A Prong Two and Step 2B. However, "extracting…features by inputting…data to a first neural network module" is not analogous to training. A "neural network module" does not invoke any particular neural network technology, but is rather a label for a generalized "module". Extracting features using such a module broadly includes feature extraction according to a rule, for example extracting all features with values satisfying a constraint (e.g., extracting whole numbers). Such extraction can indeed be practically performed in the mind. To avoid this interpretation, Applicant should further specify how the extraction is performed and what technologically confined functions (i.e., non-abstract steps) the particular technology performs such that the limitation is an additional element providing an improvement.
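For purposes of illustration only, the following minimal sketch (the function name and constraint are Examiner's hypothetical illustration, not Applicant's disclosure) shows how broadly "extracting features" by a rule-applying "module" reads under the current claim language; each step could equally be performed mentally or with pen and paper:

# Examiner's hypothetical illustration: a generalized "module" that
# "extracts features" by applying a simple rule (keep whole numbers).
def feature_extraction_module(data):
    """Select the values that satisfy a fixed constraint."""
    return [x for x in data if float(x).is_integer()]

print(feature_extraction_module([1.0, 2.5, 3.0, 4.75]))  # [1.0, 3.0]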
The rejection of claim 13 is maintained for the same reasons.
With respect to claim 19, updating a parameter is a decision made in the mind; as such, the limitation recites a judicial exception. Limiting the updating to be "of a neural network model" merely describes the association of the abstract data with a particular field of use.
With respect to the art rejections:
Applicant argues, with respect to claim 1, that the reading of "the predicted probability which is then normalized" is extrapolated from the reference.
Examiner disagrees. The immediately preceding sentence of the cited paragraph describes a "predicted probability"; the following sentence reads "a softmax layer is then used to normalize this probability". It is quite clear from the reference that "this probability" refers to the "predicted probability" introduced in the prior sentence. The interpretation relied upon for the rejection is not an extrapolation.
Applicant notes, without providing any additional reasoning, that the reference Liu says nothing about 1) "calculating a task execution result," 2) "the synthesis feature," or 3) "slot probability as a weight," let alone the claimed combination 4) "calculating a task execution result from the synthesis features and the extracted concept by applying the slot probability as a weight."
Examiner notes that 1) the reference describes a system for classification via calculation by multiple neural network modules; the resulting classification is a calculated task execution result. 2) As noted in the rejection, the system synthesizes centroid features and concatenated features, which can both be considered synthesis features as claimed. 3) The "predicted probability" is weighted by the softmax layer, which is used to generate the classification results. This probability is used to influence, or weight, the output classification, as it is part of the training process used to generate the classifications; this predicted probability is understood to be a slot probability as claimed. 4) As noted above, the final classification by the system results from applying and weighing the plurality of claimed features, including the nominal slot probability, the synthesis feature, and the extracted concept. As such, the output classification, i.e., the task result, is a product of the synthesis features, the extracted concept, and the applied slot probability.
Finally, Applicant notes that the rejections of claims 13 and 19 are not supported, for the same reasons. Examiner disagrees for the same reasons presented above.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 4-13, 15-19 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea without significantly more.
Regarding Claim 1
Under Step 1, the claim is directed to a series of steps in a method for few-shot learning, which is a process, one of the statutory categories.
Under Step 2A Prong 1, the claim recites the following limitations which are considered mental evaluations and/or mathematical calculations:
estimating a task embedding corresponding to a task to be executed from support data that is a first amount of learning data;
calculating a slot probability of a concept memory necessary for a task based on the task embedding;
extracting features of query data that is test data, and of the support data;
comparing local features for the extracted features with slots of a concept memory to extract a concept,
and generating synthesis features to have maximum similarity to the extracted features through the slots of the concept memory;
and calculating a task execution result from the synthesis feature and the extracted concept by applying the slot probability as a weight.
wherein the estimating the task embedding corresponding to the task to be executed from the support data includes:
extracting digitized vector-type task features from the support data;
and estimating the task embedding based on context information of the extracted task features;
and wherein the extracting the digitized vector-type task features from the support data includes
extracting task features… by inputting the support data to a first neural network module;
and extracting task features including context information …by inputting the extracted task features to a second neural network module
Each of these limitations only generally describes a different kind of mental evaluation or calculation. Generation, estimation, and calculation of features amount to selections performed in the mind, while calculations based on the claimed features are abstract evaluations that are also considered mathematical calculations.
Therefore, the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements: "executed by a computer" and "by inputting the support data to a first neural network module… by inputting the extracted task features to a second neural network module." These limitations amount to mere instructions to apply computer technology to an abstract idea. For clarity, Examiner notes that a "neural network module" does not necessarily describe "computer technology"; as such, inputting data into a module may very well describe performance of the abstract idea and not be considered an additional element. Nevertheless, even if one were to assert that the claimed neural network module cannot be considered part of the abstract idea, using such generic technology to perform the abstract idea amounts to using generic computer technology to perform the recited abstract idea, see MPEP 2106.05(f).
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 4
The claim is dependent upon claim 1. The claim recites the limitation "estimating the task embedding by connecting the task features including the context information," which further describes the abstract ideas recited in the parent claim under Step 2A Prong 1; in particular, the limitation describes a mental evaluation.
The claim recites the following additional element, in addition to those already identified in the parent claim: "inputting the task features including the context information to a third neural network module," which is a mere instruction to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f).
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 5
The claim is dependent upon claim 4.
The claim recites the following additional element, in addition to those already identified in the parent claim: "wherein the first to third neural network modules are learned based on a second amount, larger than the first amount, of already prepared base data," which generally links the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 6
The claim is dependent upon claim 1.
Each of the limitations described in the claim, under Step 2A Prong 1, only serve to describe the abstract ideas addressed in the independent claim, in particular the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claim does not recite additional elements to consider other than those considered in the independent claim.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claims 7-8
These claims are dependent upon claim 6.
Each of the limitations described in these claims, under Step 2A Prong 1, only serve to describe the abstract ideas addressed in the parent claim, in particular the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claims do not recite additional elements to consider other than those considered in the parent claim.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claims 9-11
These claims are dependent upon claim 1.
Each of the limitations described in these claims, under Step 2A Prong 1, only serve to describe the abstract ideas addressed in the parent claim, in particular the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claims do not recite additional elements to consider other than those considered in the independent claim.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 12
This claim is dependent upon claim 11.
Each of the limitations described in the claim, under Step 2A Prong 1, only serves to describe the abstract ideas addressed in the parent claims; in particular, the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claim does not recite additional elements to consider other than those considered in the parent claim.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 13
Under Step 1, the claim is directed to a concept based few-shot learning apparatus, which is a machine, one of the statutory categories.
Under Step 2A Prong 1, the claim recites the following limitations which are considered mental evaluations and/or mathematical calculations:
a concept memory for storing a concept feature extracted through learning from base data;
a task estimation unit for extracting digitized task features from support data, which is a first amount of learning data, and for estimating task embedding based on context information of extracted tasks;
a concept attention focusing unit for calculating a slot probability of a concept memory necessary for a task based on the task embedding;
a feature extraction unit for extracting features of query data that is test data, and of the support data;
a concept extraction and synthesis feature generation unit for comparing a local feature for the extracted features with slots of a concept memory to extract a concept, and for generating a synthesis feature having maximum similarity with the extracted features;
and a task execution unit for calculating a task execution result from the synthesis feature and the extracted concept by applying the slot probability as a weight.
wherein the task extraction unit extracts task features by… extracts task features including context information by… and estimates the task embedding by connecting the task features including the context information
Therefore, the claim recites an abstract idea. The portions elided from the wherein clause above recite: "inputting the support data to a first neural network module," "inputting the extracted task features to a second neural network module," and "inputting the task features including the context information to a third neural network module"; these are addressed as additional elements below.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites the following additional elements: "comprising a computer configured to execute program code to implement" and "inputting the support data to a first neural network module, …inputting the extracted task features to a second neural network module…inputting the task features including the context information to a third neural network module." These limitations amount to mere instructions to apply computer technology to an abstract idea. For clarity, Examiner notes that a "neural network module" does not necessarily describe "computer technology"; as such, inputting data into a module may very well describe performance of the abstract idea and not be considered an additional element. Nevertheless, even if one were to assert that the claimed neural network module cannot be considered part of the abstract idea, using such generic technology to perform the abstract idea amounts to using generic computer technology to perform the recited abstract idea, see MPEP 2106.05(f).
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 15
The claim is dependent upon claim 13.
Each of the limitations described in the claim, under Step 2A Prong 1, only serve to describe the abstract ideas addressed in the independent claim, in particular the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claim does not recite additional elements to consider other than those considered in the independent claim.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 16
This claim is dependent upon claim 15.
Each of the limitations described in the claim, under Step 2A Prong 1, only serves to describe the abstract ideas addressed in the parent claims; in particular, the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claim does not recite additional elements to consider other than those considered in the parent claims.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 17
This claim is dependent upon claim 16.
Each of the limitations described in the claim, under Step 2A Prong 1, only serves to describe the abstract ideas addressed in the parent claims; in particular, the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claim does not recite additional elements to consider other than those considered in the parent claims.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 18
This claim is dependent upon claim 13.
Each of the limitations described in the claim, under Step 2A Prong 1, only serves to describe the abstract ideas addressed in the independent claim; in particular, the limitations describe mental evaluations.
Furthermore, under Step 2A Prong Two and Step 2B, the claim does not recite additional elements to consider other than those considered in the independent claim.
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Regarding Claim 19
Under Step 1, the claim is directed to a learning method for concept based few-shot learning executed by a computer, which is a process, one of the statutory categories.
Under Step 2A Prong 1, the claim recites the following limitations which are considered mental evaluations and/or mathematical calculations:
batch-sampling a task from base data, and generating an episode constructed with support data and query data in each sampled task;
extracting features for the generated episode;
generating a synthesis feature and a concept for the extracted features;
calculating a task execution result from the synthesis feature and the extracted concept by applying the slot probability of the concept memory as a weight;
calculating a task loss based on a difference between a correct answer and the task execution result, and calculating a synthesis loss based on a distance between the extracted features and the synthesis feature;
and updating a model parameter of a neural network model such that a total loss obtained by adding the synthesis loss to the task loss is minimized.
Each of these limitations only generally describes a different kind of mental evaluation or calculation. Generation and estimation of features amount to selections performed in the mind, while calculations based on the claimed features are abstract evaluations that are also considered mathematical calculations. Further, Examiner notes that the "concept memory" is not understood to describe any specific hardware or computer function, but is a mental abstraction for encoding/storing information. Paragraph 0034 describes storing conceptual features as digitized vectors; a digitized vector is a vector of digits (1's and 0's) that encodes the stored concept.
Therefore, the claim recites an abstract idea.
Furthermore, under Step 2A Prong Two and Step 2B, the claim does not recite additional elements to consider beyond the computer recited in the preamble, which amounts to mere instructions to apply the abstract idea on a computer, see MPEP 2106.05(f).
The recited additional elements, when considered alone or in combination, neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 4-7, 9-13, 15-16, and 18-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liu, "Learn from Concepts: Towards the Purified Memory for Few-shot Learning."
Regarding Claim 1
Liu teaches, A concept based few-shot learning method executed by a computer, the concept based few-shot learning method comprising: (Section 2 pg 2 “This paper aims to address the problem of few-shot classification… and the goal is to learn the concepts” )
estimating a task embedding corresponding to a task to be executed from support data that is a first amount of learning data; (pg 2 “We first extract the features of support and query samples as task-relevant embeddings V t” and figure 1
[Figure 1 of Liu: media_image1.png]
Examiner notes that features extracted from the encoder based on the input are also understood to be estimated task embeddings.)
calculating a slot probability of a concept memory necessary for a task based on the task embedding; (pg 2 section 2.2 “In the context of FSL, the episodic sampling makes the feature extractor rapidly learn new concept with very few samples” Section 2.4 pg 4 “When the optimization is complicated, the predicted probability of a node vi belonging to Ck can be denoted as… A softmax layer is then used to normalize this probability” the calculated probability of each node based on the associated task embeddings and concept is the slot probability.)
extracting features of query data that is test data, and of the support data; (pg 2 “We first extract the features of support and query samples”)
comparing local features for the extracted features with slots of a concept memory to extract a concept, ( pg 3 “In particular, we use a graph augmentation module (GAM) to capture the relationship between a specific task context and relevant concepts…Their similarities are then propagated through a graph neural network” pg 4 “Thus the meta-knowledge is augmented to existing inference task and allow the model to adapt to new task by taking advantage of learned concept” the similarity between the specific task and relevant concepts is a comparison performed in the GAM shown in figure 1. The comparison is based on learned concepts, i.e., slots of a concept memory, and specific task embeddings, i.e., local extracted features)
and generating synthesis features to have maximum similarity to the extracted features through the slots of the concept memory; ( pg 2 and 3 “To alleviate the above issues, we propose to refine the memory via learning an optimal prototype for each category…To progressively purify semantic information from labels, we firstly conduct category-wise averaging to f_sup^l to obtain the centroids f_cen, each of which is then concatenated with the prototype f_p…In practice, the following constraint is enforced to purify the discriminative information and further refine the memory
[Equation reproduced from Liu: media_image2.png]
” pg 4 Section 2.4 “During the meta-training stage, our model is optimized by minimizing the binary cross-entropy loss…Finally, the total loss L can be defined as…
[Equation reproduced from Liu: media_image3.png]
” the loss terms are minimized; the aggregated extracted features f_cat are compared to the synthesis features f_p in the KL divergence. The model is trained such that the KL divergence, a measure of dissimilarity, is minimized, i.e., the dissimilarity is minimized. Alternatively stated, the similarity is maximized.)
and calculating a task execution result from the synthesis feature and the extracted concept by applying the slot probability as a weight. (pg 4 Section 2.4 “When the optimization is complicated, the predicted probability of a node vi belonging to Ck can be denoted as…A softmax layer is then used to normalize this probability” pg 4 Section 3.1 “For evaluation, all the results are obtained under standard few-shot classification protocol: 5-way 1-shot and 5-shot task.” the predicted probability, which is then normalized, applies the predicted probability as a weight in determining the predicted classification) wherein the estimating the task embedding corresponding to the task to be executed from the support data includes…extracting digitized vector-type task features from the support data; (pg 2 “We first extract the features of support and query samples as task-relevant embeddings V t”) and estimating the task embedding based on context information of the extracted task features. ( pg 2 “To progressively purify semantic information from labels, we firstly conduct category-wise averaging to f_sup to obtain the centroids f_cen, each of which is then concatenated with the prototype…Here we propose to use the information bottleneck principle to purify the concept” pg 3 “In the view of above, we propose to refine the memory bank by momentum update” as also shown in figure 1, the embeddings are purified based on the context information provided by the mutual information and centroids of extracted task features.) extracting task features by inputting the support data to a first neural network module; and extracting task features including context information by inputting the extracted task features to a second neural network module. (abstract “On its basis, a Graph Augmentation Module (GAM) is introduced to aggregate these concepts and knowledge learned from new tasks via a graph neural network” pg 3 “Here, θ and φ denote the parameters of the encoder and the FC layer” figure 1, the encoders are a first neural network module inputting support data, whose output is provided to the second module as annotated.
[Examiner-annotated Figure 1 of Liu: media_image4.png])
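To clarify the weighting interpretation applied above, the following is a minimal sketch (Examiner's illustration only; the logit values are hypothetical and not taken from Liu) of a predicted probability being normalized by a softmax layer and applied as a weight in selecting the output classification:

import numpy as np

# Hypothetical unnormalized predicted probabilities (logits) over 5 classes,
# standing in for Liu's per-node predicted probability.
logits = np.array([2.0, 0.5, 1.0, -1.0, 0.0])

# A softmax layer normalizes the predicted probability (cf. Liu Section 2.4).
weights = np.exp(logits - logits.max())
weights /= weights.sum()

# The normalized probability weights the final classification decision.
predicted_class = int(np.argmax(weights))
print(weights.round(3), predicted_class)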
Regarding Claim 4
Liu teaches claim 1.
Liu teaches, estimating the task embedding by connecting the task features including the context information and inputting the task features including the context information to a third neural network module. (figure 1, the context information and task features are combined in the KNN/GAM part of the model, which is considered the third neural network module, as annotated.
[Examiner-annotated Figure 1 of Liu: media_image5.png])
Regarding Claim 5
Liu teaches claim 4
Liu teaches, wherein the first to third neural network modules are learned based on a second amount, larger than the first amount, of already prepared base data. ( pg 4 “In the pre-training stage, the baseline… is trained from scratch with a batch size of 128 by minimizing the standard cross entropy loss on base classes. After that, we randomly select 40 episodes per iteration for training the ConvNet in the meta-train stage… We train 50,000 epochs in total, and the encoder are frozen for the first 25000 iterations” the modules are trained with a 128-item batch size and in the meta-train stage shown in the figure. The complete base data set is larger than the first amount used for a single estimation as claimed in claim 1.)
Regarding Claim 6
Liu teaches claim 1
Liu teaches, calculating a slot probability of the concept memory necessary for that task by applying an attention focusing technique to the concept memory and the task embedding. (pg 3-4 “In order to perform the aggregation, we use an attention coefficient calculated by the centroid f_cen and selected embeddings m...
[Equations reproduced from Liu, including equations (4) and (8): media_image6.png, media_image7.png, media_image8.png, media_image9.png]
” the probability P for the task is computed in equation (8) as a result of applying the attention computation of prior equation (4), i.e., attention focusing, to the task embedding and the concept memory.)
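As a minimal sketch of the attention-focusing reading above (Examiner's illustration; the dot-product attention and dimensions are assumptions and do not reproduce Liu's exact equations (4)-(8)):

import numpy as np

def slot_probability(task_embedding, concept_memory):
    """Attention between a task embedding and concept-memory slots,
    normalized into a probability over slots."""
    scores = concept_memory @ task_embedding   # attention scores per slot
    exp = np.exp(scores - scores.max())        # numerically stable softmax
    return exp / exp.sum()                     # slot probabilities sum to 1

memory = np.random.rand(10, 64)    # 10 hypothetical concept slots
embedding = np.random.rand(64)     # hypothetical task embedding
print(slot_probability(embedding, memory).sum())  # 1.0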
Regarding Claim 7
Liu teaches claim 6
Liu teaches, calculating a slot probability of a concept memory necessary for that task by applying a cosine similarity function and a softmax function after applying each of matrices learned from base data to a slot of the concept memory and the task embedding (pg 3 “The refining of M is, in essence, iteratively aggregating discriminative information and diluting task-irrelevant nuisances” “For each class centroid f_cen of the l-th episode, we first compute the cosine similarities between f_cen and each prototype in the memory M …
[Equation (4) reproduced from Liu: media_image10.png]
” as noted previously, the slot probability depends on equation (4). The equation applies cosine similarity after learning/refining the memory matrix M. The function in equation (4) is understood by PHOSITA to be a softmax function, whose formal definition is:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
)
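For clarity, the following is a minimal sketch of the claim 7 reading (Examiner's illustration; the learned matrices W_s and W_t are hypothetical stand-ins for matrices learned from base data, and the dimensions are arbitrary):

import numpy as np

def cosine_softmax_slot_probability(slots, task_embedding, W_s, W_t):
    """Apply learned matrices to the memory slots and the task embedding,
    then a cosine similarity function, then a softmax function."""
    s = slots @ W_s                       # learned matrix applied to slots
    t = W_t @ task_embedding              # learned matrix applied to embedding
    cos = (s @ t) / (np.linalg.norm(s, axis=1) * np.linalg.norm(t) + 1e-9)
    exp = np.exp(cos - cos.max())         # softmax over the similarities
    return exp / exp.sum()

slots = np.random.rand(10, 64)
emb = np.random.rand(32)
W_s = np.random.rand(64, 16)              # hypothetical learned matrices
W_t = np.random.rand(16, 32)
print(cosine_softmax_slot_probability(slots, emb, W_s, W_t).sum())  # 1.0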
Regarding Claim 9
Liu teaches claim 1
Liu teaches, calculating a prototype for an l-th category of the support data as an average of the concept of the support data; ( pg 2 Section 2.2 “To progressively purify semantic information from labels, we firstly conduct category-wise averaging to f_sup^l to obtain the centroids” the centroid is the prototype of the l-th category of the support data, denoted by the subscript sup) and calculating a task execution result in which a distance between the prototype and query data is minimized by applying the slot probability as a weight to a difference between the concept of the query data and the calculated prototype. (pg 3 “In practice, the following constraint is enforced to purify the discriminative information and further refine the memory… represents the KL-divergence, y denotes the label… Note that both p(y|f_cat^l) and p(y|f_p^l) denote conditional distribution…” pg 4 “When the optimization is complicated, the predicted probability… A softmax layer is then used to normalize this probability…During the meta-training stage, our model is optimized by minimizing the binary cross-entropy loss… Finally, the total loss L can be defined as
[Equation reproduced from Liu: media_image12.png]
” the total loss function that is minimized includes a distance between the prototype and the query via the KL divergence. The slot probability is part of the consideration in minimizing the total loss function, thus the slot probability is applied as a weight to a difference.)
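As a minimal sketch of the prototype-and-weighted-distance reading above (Examiner's illustration; the placement of the slot probability as a per-dimension weight is one plausible reading, and all array shapes are hypothetical):

import numpy as np

def prototypes(support_features, support_labels, n_classes):
    """Category-wise averaging: the prototype of the l-th category is the
    mean of that category's support features (cf. Liu's centroid f_cen)."""
    return np.stack([support_features[support_labels == l].mean(axis=0)
                     for l in range(n_classes)])

def weighted_task_result(query_feature, protos, slot_prob):
    """Score each class by a slot-probability-weighted squared difference
    between the query and the prototype; the closest class is the result."""
    diff = protos - query_feature
    distances = np.sum(slot_prob * diff ** 2, axis=1)
    return int(np.argmin(distances))      # task execution result

feats = np.random.rand(10, 8)             # hypothetical support features
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
p = prototypes(feats, labels, 5)
print(weighted_task_result(np.random.rand(8), p, np.full(8, 1 / 8)))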
Regarding Claim 10
Liu teaches claim 1
Liu teaches, calculating a prototype for an l-th category of the support data as an average of the synthesis feature of the support data; ( pg 2 Section 2.2 “To progressively purify semantic information from labels, we firstly conduct category-wise averaging to f_sup^l to obtain the centroids” the centroid is the prototype of the l-th category of the support data, denoted by the subscript sup) and calculating a task execution result in which a distance between the prototype and query data is minimized by applying the slot probability as a weight to a difference between the synthesis feature of the query data and the calculated prototype (pg 3 “In practice, the following constraint is enforced to purify the discriminative information and further refine the memory… represents the KL-divergence, y denotes the label… Note that both p(y|f_cat^l) and p(y|f_p^l) denote conditional distribution…” pg 4 “When the optimization is complicated, the predicted probability… A softmax layer is then used to normalize this probability…During the meta-training stage, our model is optimized by minimizing the binary cross-entropy loss… Finally, the total loss L can be defined as
[Equation reproduced from Liu: media_image12.png]
” the total loss function that is minimized includes a distance between the prototype and the query via the KL divergence. The slot probability is part of the consideration in minimizing the total loss function, thus the slot probability is applied as a weight to a difference.)
Regarding Claim 11
Liu teaches claim 1
Liu teaches, batch-sampling tasks from base data, generating an episode constructed with support data and query data in each task, (pg 2 “In particular, for a N-way K-shot task, a support set … and a query set …are sampled…In the meta-test, a test task is also sampled with the same sized episode from unseen categories C novel. The aim is to classify T unlabeled samples in query set into N classes correctly” pg 4 “In the pre-training stage, the baseline… is trained from scratch with a batch size of 128”) and learning a model parameter by applying few-shot learning to the generated episode. (pg 2 “This paper aims to address the problem of few-shot classification… In this framework, the samples in meta-training and meta-testing are not samples but episodes” pg 3 “In practice, the following constraint is enforced to purify the discriminative information and further refine the memory…
[Equation reproduced from Liu: media_image13.png]
…Here, θ and φ denote the parameters of the encoder and the FC layer” pg 4 “is trained from scratch with a batch size of 128 by minimizing the standard cross entropy loss on base classes” learning according to a loss function amounts to learning the model parameters that optimize that loss function)
Regarding Claim 12
Liu teaches claim 11
Liu teaches, wherein the learning the model parameter includes: (Figure 1 pg 3 depicts the flow of data during both pre-training and meta-training, which is the learning of model parameters.) extracting features for the generated episode (pg 3 “For a N-way K-Shot task, given the features extracted from the encoder” as shown in the figure, the encoder extracts from the support and query set in the episode) generating a synthesis feature and a concept for the extracted features; (pg 2 “To progressively purify semantic information from labels, we firstly conduct category-wise averaging to f_sup to obtain the centroids f_cen, each of which is then concatenated with the prototype” the information purification described in Section 2.2 is the features and concept synthesis.) calculating a task execution result from the synthesis feature and the extracted concept by applying the slot probability of the concept memory as a weight (pg 4 Section 2.4 “When the optimization is complicated, the predicted probability of a node vi belonging to Ck can be denoted as…A softmax layer is then used to normalize this probability” pg 4 Section 3.1 “For evaluation, all the results are obtained under standard few-shot classification protocol: 5-way 1-shot and 5-shot task.” the predicted probability, which is then normalized, applies the predicted probability as a weight in determining the predicted classification) calculating a task loss based on a difference between a correct answer and the task execution result ( pg 4 “we also introduce another binary cross-entropy loss (BCE) Lm to estimate the discrepancy between the ground-truth and the predictions of meta-knowledge nodes edge-label”) and calculating a synthesis loss based on a distance between the extracted features and the synthesis feature ( pg 3 “In practice, the following constraint is enforced to purify the discriminative information and further refine the memory…
[Equation reproduced from Liu: media_image14.png]
…represents the KL-divergence, y denotes the label. Note that both p(y|f_cat^l) and p(y|f_p^l) denote conditional distribution,” the KL divergence between the distributions represents the distance between the extracted and synthetic features; this loss is the synthesis loss) and updating a model parameter such that a total loss obtained by adding the synthesis loss to the task loss is minimized. ( pg 4 “our model is optimized by minimizing the binary cross-entropy loss (BCE)…Finally, the total loss L can be defined as…
[Equation reproduced from Liu: media_image15.png]
” the losses are added together; one of ordinary skill would understand that loss minimization involves iterative updates of the associated model parameters.)
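For clarity, the following is a minimal sketch of minimizing a total loss formed by adding a synthesis loss to a task loss and iteratively updating model parameters (Examiner's illustration; the model, dimensions, and loss choices are hypothetical stand-ins, not Liu's implementation):

import torch

model = torch.nn.Linear(64, 5)                       # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

features = torch.randn(20, 64)                       # extracted features
labels = torch.randint(0, 5, (20,))                  # correct answers
synthesis = features + 0.1 * torch.randn(20, 64)     # synthesis features

task_loss = torch.nn.functional.cross_entropy(model(features), labels)
synthesis_loss = torch.nn.functional.mse_loss(synthesis, features)
total_loss = task_loss + synthesis_loss              # add the two losses

optimizer.zero_grad()
total_loss.backward()       # gradients of the total loss
optimizer.step()            # iterative update of model parameters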
Regarding Claim 13
Liu teaches, A concept based few-shot learning apparatus comprising a computer configured to execute program code to implement (Section 2 pg 2 “This paper aims to address the problem of few-shot classification… and the goal is to learn the concepts” pg 4 “We train 50,000 epochs in total, and the encoder are frozen for the first 25000 iterations” the paper describes a neural network system requiring a computer to process large amounts of data) a concept memory for storing a concept feature extracted through learning from base data; (pg 1 “In this paper, we propose a purified memory framework to tackle these two problems. Our basic idea is simply that simulated the recognition process of human beings. To keep stable and consistent concepts, we hold a memory bank during episodic training”) a task estimation unit for extracting digitized task features from support data, which is a first amount of learning data, (pg 2 “We first extract the features of support and query samples as task-relevant embeddings V t” and figure 1
[Figure 1 of Liu: media_image1.png]
.)
and for estimating task embedding based on context information of extracted tasks; ( pg 2 “We first extract the features of support and query samples as task-relevant embeddings V t…Further, the purified memory is incorporated with a graph augmentation module for robust prediction (introduced in Sec. 2.3). In this module, we mine the relevant prototypes V m” as shown in the figure, the task embeddings are based on the query and encoder mean, and thus based on context information.) a concept attention focusing unit for calculating a slot probability of a concept memory necessary for a task based on the task embedding; (pg 3-4 “In order to perform the aggregation, we use an attention coefficient calculated by the centroid f_cen and selected embeddings m...
[Equations reproduced from Liu, including equations (4) and (8): media_image6.png, media_image7.png, media_image8.png, media_image9.png]
” the probability P for the task is computed in equation (8) as a result of applying the attention computation of prior equation (4), i.e., attention focusing, to the task embedding and the concept memory.) a feature extraction unit for extracting features of query data that is test data, and of the support data (pg 2 “We first extract the features of support and query samples”) a concept extraction and synthesis feature generation unit for comparing a local feature for the extracted features with slots of a concept memory to extract a concept, ( pg 3 “In particular, we use a graph augmentation module (GAM) to capture the relationship between a specific task context and relevant concepts…Their similarities are then propagated through a graph neural network” pg 4 “Thus the meta-knowledge is augmented to existing inference task and allow the model to adapt to new task by taking advantage of learned concept” the similarity between the specific task and relevant concepts is a comparison performed in the GAM shown in figure 1. The comparison is based on learned concepts, i.e., slots of a concept memory, and specific task embeddings, i.e., local extracted features) and for generating a synthesis feature having maximum similarity with the extracted features; ( pg 2 and 3 “To alleviate the above issues, we propose to refine the memory via learning an optimal prototype for each category…To progressively purify semantic information from labels, we firstly conduct category-wise averaging to f_sup^l to obtain the centroids f_cen, each of which is then concatenated with the prototype f_p…In practice, the following constraint is enforced to purify the discriminative information and further refine the memory
[Equation reproduced from Liu: media_image2.png]
” pg 4 Section 2.4 “During the meta-training stage, our model is optimized by minimizing the binary cross-entropy loss…Finally, the total loss L can be defined as…
[Equation reproduced from Liu: media_image3.png]
” the loss terms are minimized; the aggregated extracted features f_cat are compared to the synthesis features f_p in the KL divergence. The model is trained such that the KL divergence, a measure of dissimilarity, is minimized, i.e., the dissimilarity is minimized. Alternatively stated, the similarity is maximized.) and a task execution unit for calculating a task execution result from the synthesis feature and the extracted concept by applying the slot probability as a weight. (pg 4 Section 2.4 “When the optimization is complicated, the predicted probability of a node vi belonging to Ck can be denoted as…A softmax layer is then used to normalize this probability” pg 4 Section 3.1 “For evaluation, all the results are obtained under standard few-shot classification protocol: 5-way 1-shot and 5-shot task.” the predicted probability, which is then normalized, applies the predicted probability as a weight in determining the predicted classification) wherein the task extraction unit extracts task features by inputting the support data to a first neural network module, extracts task features including context information by inputting the extracted task features to a second neural network module, and estimates the task embedding by connecting the task features including the context information and inputting the task features including the context information to a third neural network module. (figure 1, the encoders, corresponding to the first neural network module, extract task features from input support data. The second module annotated below extracts task features which include context information by passing the extracted task features through a second-stage classifier that extracts momentum features, i.e., context information. The KNN embeds these two types of features and connects them into a new set of features to be input into the third module annotated below.
[Examiner-annotated Figure 1 of Liu: media_image16.png]
)
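As a minimal sketch of the three-module reading annotated above (Examiner's illustration; the linear layers are hypothetical stand-ins for Liu's encoder, second-stage context/momentum module, and KNN/GAM stage, and the dimensions are arbitrary):

import torch
import torch.nn as nn

first = nn.Linear(32, 64)     # stand-in for the encoder (first module)
second = nn.Linear(64, 64)    # stand-in for the context-feature stage
third = nn.Linear(128, 64)    # stand-in for the KNN/GAM stage

support = torch.randn(5, 32)                       # support data
task_features = first(support)                     # first module
context_features = second(task_features)           # second module
connected = torch.cat([task_features, context_features], dim=-1)
task_embedding = third(connected)                  # third module
print(task_embedding.shape)                        # torch.Size([5, 64])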
Regarding Claim 15
Claim 15 is rejected for the reasons set forth in the rejection of claim 6 in connection with claim 13.
Regarding Claim 16
Claim 16 is rejected for the reasons set forth in the rejection of claim 7 in connection with claim 13.
Regarding Claim 18
Claim 18 is rejected for the reasons set forth in the rejection of claim 10 in connection with claim 13.
Regarding Claim 19
Liu teaches, A learning method for concept based few-shot learning executed by a computer, the concept based few-shot learning method comprising: (Section 2 pg 2 “This paper aims to address the problem of few-shot classification… and the goal is to learn the concepts”) batch-sampling a task from base data, and generating an episode constructed with support data and query data in each sampled task (pg 2 “In particular, for a N-way K-shot task, a support set … and a query set …are sampled…In the meta-test, a test task is also sampled with the same sized episode from unseen categories C novel. The aim is to classify T unlabeled samples in query set into N classes correctly” pg 4 “In the pre-training stage, the baseline… is trained from scratch with a batch size of 128”) extracting features for the generated episode (see figure 1) generating a synthesis feature and a concept for the extracted features; and calculating a synthesis loss based on a distance between the extracted features and the synthesis feature; ( pg 2 and 3 “To alleviate the above issues, we propose to refine the memory via learning an optimal prototype for each category…To progressively purify semantic information from labels, we firstly conduct category-wise averaging to f_sup^l to obtain the centroids f_cen, each of which is then concatenated with the prototype f_p…In practice, the following constraint is enforced to purify the discriminative information and further refine the memory
[Equation reproduced from Liu: media_image2.png]
” pg 4 Section 2.4 “During the meta-training stage, our model is optimized by minimizing the binary cross-entropy loss…Finally, the total loss L can be defined as…
[Equation reproduced from Liu: media_image3.png]
” the loss terms are minimized; the aggregated extracted features f_cat are compared to the synthesis features f_p in the KL divergence. The model is trained such that the KL divergence, a measure of dissimilarity, is minimized, i.e., the dissimilarity is minimized. Alternatively stated, the similarity is maximized.) calculating a task execution result from the synthesis feature and the extracted concept by applying the slot probability of the concept memory as a weight; (pg 4 Section 2.4 “When the optimization is complicated, the predicted probability of a node vi belonging to Ck can be denoted as…A softmax layer is then used to normalize this probability” pg 4 Section 3.1 “For evaluation, all the results are obtained under standard few-shot classification protocol: 5-way 1-shot and 5-shot task.” the predicted probability, which is then normalized, applies the predicted probability as a weight in determining the predicted classification) calculating a task loss based on a difference between a correct answer and the task execution result, ( pg 4 “we also introduce another binary cross-entropy loss (BCE) Lm to estimate the discrepancy between the ground-truth and the predictions of meta-knowledge nodes edge-label”) and updating a model parameter of a neural network model such that a total loss obtained by adding the synthesis loss to the task loss is minimized. ( pg 4 “our model is optimized by minimizing the binary cross-entropy loss (BCE)…Finally, the total loss L can be defined as…
[Equation reproduced from Liu: media_image15.png]
” the losses are added together; one of ordinary skill would understand that loss minimization involves iterative updates of the associated model parameters. The optimization of the “model” amounts to updating parameters of a neural network model as claimed.)
Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 8 and 17 are rejected under 35 U.S.C. § 103 as being unpatentable over Liu, “Learn from Concepts: Towards the Purified Memory for Few-shot Learning,” in view of Karunaratne (US PG PUB 2022/0180167 A1).
Regarding Claim 8
Liu teaches claim 6
Liu teaches, calculating a similarity between the slot of the concept memory and the task embedding based on a cosine similarity function… and calculating slot probability by applying the same weight to a slot of concept memory (pg 3 “The refining of M is, in essence, iteratively aggregating discriminative information and diluting task-irrelevant nuisances” “For each class centroid f_cen of the l-th episode, we first compute the cosine similarities between f_cen and each prototype in the memory M …
[Equation (4) reproduced from Liu: media_image10.png]
” as noted previously, the slot probability depends on equation (4). The equation applies cosine similarity after learning/refining the memory matrix M. The function in equation (4) is understood by PHOSITA to be a softmax function, whose formal definition is:
softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
)
Liu does not explicitly teach, comparing the similarity with a preset threshold,
[calculating a slot probability] whose similarity exceeds the threshold as a result of the comparison.
Karunaratne, however, when addressing a sharpening function for the attention vectors in few-shot learning, teaches comparing the similarity with a preset threshold,
[calculating a slot probability] whose similarity exceeds the threshold as a result of the comparison. (paragraph 0052-0054 “A similarity score of the query information element with the support set may be determined in step 305… where α(x,y)=1 means x and y are perfectly similar or correlated, α(x,y)=0 means they are perfectly orthogonal or uncorrelated, and α(x,y)=−1…The set of similarity scores may be transformed in step 307 using a sharpening function ε. From the point of view of attention, two nearly dissimilar (i.e., uncorrelated) hypervectors should lead to a focus close to 0. Therefore, the sharpening function ε may satisfy the following condition: ε(α(x,y))≈0 when α(x,y)≈0. (Eq1). The equation Eq1 may ensure that there is no focus between a query hypervector and a dissimilar support hypervector. The sharpening function ε may also satisfy the following inequalities
[Equations reproduced from Karunaratne: media_image17.png]
… Equation Eq2 implies non-negative weights in the attention vectors” paragraph 0056 “where p=w·V is the output probability distribution which is the weighted sum of one-hot labels” equation Eq2 forces non-negative weights; thus the function compares the similarity with a threshold such that attention is applied only where the similarity is greater than 0, i.e., calculating a slot probability when the similarity exceeds the threshold as a result of the comparison.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu with the sharpening function of Karunaratne for classifying a query element based on similarity to a set of support elements. One would have been motivated to make such a combination because as noted by Karunaratne “The sharpening function may be advantageous for the following reasons… the error function may need to be differentiable… The sharpening function may also advantageously be used to learn the directions of the hypervectors… The sharpening function may be a correlation-preserving sharpening function” (Karunaratne paragraph 0021)
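As a minimal sketch of the combined teaching (Examiner's illustration; a zero threshold and simple renormalization are assumptions standing in for Karunaratne's sharpening function):

import numpy as np

def sharpened_slot_probability(similarities, threshold=0.0):
    """Compare each similarity with a preset threshold and compute the
    slot probability only where the similarity exceeds the threshold."""
    kept = np.where(similarities > threshold, similarities, 0.0)
    total = kept.sum()
    return kept / total if total > 0 else kept

print(sharpened_slot_probability(np.array([0.9, -0.2, 0.4, 0.0])))
# -> non-zero weights only for the similarities above the threshold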
Regarding Claim 17
Claim 17 is rejected for the reasons set forth in the rejection of claim 8 in connection with claim 13.
Conclusion
Prior art not relied upon:
Basu et al., “Semi-Supervised Few-Shot Intent Classification and Slot Filling,” describes few-shot learning using both intent and slot prototypes to inform a log-likelihood loss, as described in equations (9) and (10) of the instant application.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.R.G./
Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122