DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The present application is being examined based on the claims filed 01/16/2026.
Claims 1, 3-12, and 14-19 are pending.
Response to Amendment
This Office Action is in response to Applicant’s communication filed 01/16/2026 in response to the Office Action mailed 10/16/2026. The Applicant’s remarks and any amendments to the claims or specification have been considered with the results that follow.
Response to Arguments
Regarding Objections and 35 U.S.C. 112(b)
In Remarks page 8, Argument 1
(Examiner summarizes Applicant’s arguments) Applicant argues that the claims have been amended and the rejections under 35 U.S.C. 112(b) and claim objections should be withdrawn.
Examiner’s response to Argument 1
Examiner agrees that the 112(b) rejections and claim objections have been overcome by Applicant’s amendments and they are thus withdrawn. However, new objections are issued for claim 1 as necessitated by amendment.
Regarding 35 U.S.C. 101
In Remarks page 8, Argument 2
(Examiner summarizes Applicant’s arguments) Applicant argues that Examiner’s rejections under 35 U.S.C. 101 amount to mere speculation that the claims are directed to abstract ideas, citing to the office action.
Examiner’s response to Argument 2
Examiner disagrees. Examiner’s rejections do not merely speculate about a rejection, but instead provide ample details and explanation for why the claims are directed to abstract ideas without integrating them into a practical application or amounting to significantly more. Moreover, Applicant’s citation to the office action appears to be erroneous, as the quotations do not correspond to any actual portions of the office action.
In Remarks page 9, Argument 3
(Examiner summarizes Applicant’s argument) Applicant argues that, per the August 2025 memorandum, operations on multi-dimensional matrices and vectors in neural networks exceed human capacity, i.e., a human mind cannot perform the limitation of “derive a feature…” in the claim as amended. Applicant further argues that it is technically impossible for a human to perform matrix multiplications and feature extraction by passing images through a neural network.
Examiner’s response to Argument 3
Applicant’s interpretation of the claim limitations is overly narrow. Examiner must consider the broadest reasonable interpretation of the claims and avoid unnecessarily limiting a claim limitation by importing limitations from the specification. The limitation does not recite the details of performing matrix multiplication nor passing data through a deep neural network. Rather, the claim recites:
wherein the pre-trained model selecting unit derives a feature of the transfer learning data from an output of a first part or a second part of the plurality of stored pre-trained models and selects a pre-trained model based on clustering for the transfer learning data using the feature of the transfer learning data.
Deriving a feature from a model output could be performed by, for example, observing an output from a model displayed on a screen and evaluating the observed output to describe the features visually seen (e.g. “there is a dog and a bird in this picture”). Selecting a pre-trained model amounts to evaluating clustered data and the features to pick the best model from a list of options. These operations are not specific computer operations, but instead broadly recited limitations that definitively recite abstract ideas.
Moreover, examiner notes that even if the matrix multiplications and vectors were explicitly recited in the claims, these are mathematical concepts, another grouping of abstract ideas (see MPEP 2106.04(a)(2) I.).
In Remarks page 9-10, Argument 4
(Examiner summarizes Applicant’s arguments) Applicant argues that the claims do not merely recite the result of selecting a model, but specific technological means of achieving the result. Applicant argues that the system as claimed recites clustering on features from internal model layers, optimizing based on computational and storage constraints, and using math for specific technical artifacts (feature vectors extracted from neural network layers to drive a control signal). Applicant further argues that the invention solves the problem that “it is not easy to determine domain similarity” and selection requires “expertise in the deep neural network model, and requires a lot of time and effort.”
Examiner’s response to Argument 4
Examiner disagrees. The claimed invention merely recites the neural network details generically without details of how the neural network operates to provide any benefits to the invention. For example, the amended portion of the claim recites “wherein the pre-trained model selecting unit derives a feature of the transfer learning data from an output of a first part or a second part of the plurality of stored pre-trained models”, only mentioning that features are derived from an output without any details about how the output is derived by the neural network. This broadly recited portion of the limitation amounts to performing the limitation in a computer environment. MPEP 2106.04(a)(2) III C. recites “Claims can recite a mental process even if they are claimed as being performed on a computer. […] For instance, examiners should review the specification to determine if the claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. In these situations, the claim is considered to recite a mental process.” Therefore, since the claim limitation does no more than recite a mental process performed in a computer environment, it still recites a mental process.
Moreover, regarding claim 7, examiner maintains that the claim is directed to the abstract idea of a mathematical process. Consider claim 2 of Example 47 from the 2024 subject matter eligibility examples. Claim 2 recites the limitation (page 4)
(c) training, by the computer, the ANN based on the input data and a selected training algorithm to generate a trained ANN, wherein the selected training algorithm includes a backpropagation algorithm and a gradient descent algorithm;
Though the claim limitation uses math for an artificial neural network, this limitation was still considered a mathematical concept because the broadest reasonable interpretation of the limitation encompasses performing math:
(page 7) “Step (c) requires specific mathematical calculations (a backpropagation algorithm and a gradient descent algorithm) to perform the training of the ANN and therefore encompasses mathematical concepts.”
Therefore, the explicitly recited mathematical formulas in claim 7 are similarly directed to mathematical concepts.
Regarding 35 U.S.C. 102/103
In Remarks page 10-11, Argument 5
Among other features, claim 1 requires a pre-trained model selecting unit that, "derives a feature of the transfer learning data from an output of a first part or a second part of the plurality of stored pre-trained models, performs clustering on the transfer learning data using the derived feature ... and selects the pre-trained model based on a result of the clustering."
Vu fails to disclose this technical mechanism. Vu relies on Task Embeddings (metadata/labels). As shown in Figure 1 of Vu, Vu computes a "task embedding" for a target task and compares it to source task embeddings using cosine similarity. That is, Vu does not feed the transfer learning data through the candidate pre-trained models to extract features for clustering.
The claimed invention empirically tests the actual transfer learning data by passing it through the internal layers ("first part or second part") of the stored models to see if the model's learned features can separate (cluster) the new data classes. Vu, on the other hand, semantically matches a "Target Task" label to a "Source Task" label. Vu does not evaluate the internal layer outputs of the candidate models against the transfer learning data during the selection phase.
Because Vu lacks the specific step of deriving features from the outputs of the stored models and performing clustering on those features to drive selection, Vu does not anticipate claims 1 and 12.
Examiner’s response to Argument 5
Examiner disagrees. Applicant omits the details of the operation of Vu and interprets the claim limitations overly narrowly. While Vu indeed performs transfer learning by computing embeddings and comparing target and source task embeddings using cosine similarity, Applicant ignores the details of how Vu accomplishes this result, which is substantially the same as what is claimed in the independent claims.
The broadest reasonable interpretation of “a feature of the transfer learning data” can include any data or descriptor related to the transfer learning data. Vu teaches forming an embedding (a feature of the data) by (Vu page 6 col 1) “feed[ing] the entire training dataset into the model”. Additionally, the Fisher matrix is derived from the actual training labels (Vu page 6 col 1): “In our experiments, we compute the empirical Fisher, which uses the training labels instead of sampling from Pθ(x; y)”. Thus, the empirical Fisher is calculated using the log likelihood that the model will predict a given output yi given an input xi. See the equation on Vu page 6 column 2. Thus, the embedding taught by Vu is unequivocally a feature of the transfer learning data derived from an output of a first part or a second part of the plurality of models as claimed because it is derived directly from passing the transfer learning data into the model.
Furthermore, the broadest reasonable interpretation of “clustering” includes mapping data such that points closer together are more likely to be related than points far apart. Vu’s embedding is a form of clustering as illustrated by figure 3. Datasets that are more related have embeddings that are clustered closer together and datasets that are less related are further apart after performing the embedding operation. This clustering is exactly why embedding similarity is a good metric for task similarity. Vu explicitly describes that the task embedding has this property:
(page 8 column 2) “TASKEMB captures domain information to some extent (Figure 3, bottom), but it also encodes task similarity: for example, POS-PTB is closer to POS-EWT, another part-of-speech tagging task that uses a different data source.”
Therefore, Vu selects a target task based on a similarity between embeddings of source tasks and target tasks which is a selection based on clustering for the transfer learning data using the feature of the transfer learning data.
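For illustration only (this sketch is not part of the record and the embedding values are hypothetical), the embedding-similarity selection described above for Vu — picking the source task whose task embedding is most cosine-similar to the target's — can be sketched as:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_source_task(target_emb, source_embs):
    # Pick the source task whose embedding is closest to the target's.
    # source_embs maps hypothetical task names to embedding vectors.
    return max(source_embs,
               key=lambda name: cosine_similarity(target_emb, source_embs[name]))
```

Tasks whose embeddings cluster near the target embedding score highest, which is the sense in which embedding similarity acts as the claimed clustering-based selection.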
In Remarks page 11, Argument 6
(Examiner summarizes Applicant’s arguments) Applicant argues that claims depending from 1 and 12 are allowable for the same reasons, and Lu does not cure the deficiencies of Vu.
Examiner’s response to Argument 6
Examiner disagrees because the 35 U.S.C. 102 rejections of the independent claims are maintained.
In Remarks page 11, Argument 7
Claim 17 is patentably distinct over Vu in view of Mustafa. Claim 17 recites a specific iterative algorithm for configuring an ensemble. Specifically, the claim requires: "sequentially adding a transfer learning model that contributes the most to the improvement of the accuracy of the pre-configured ensemble to the ensemble until a preset accuracy is satisfied." The Office Action relies on Mustafa to teach these limitations (previously recited in dependent claim 20). Specifically, regarding the limitation of adding models "until a preset accuracy is satisfied," the Examiner cites Mustafa page 4, Section 2.3.1, stating: "we keep all pre-trained models within some threshold percentage τ% of the top kNN accuracy." The passage cited by the Examiner describes Mustafa's method for selecting the initial pool of candidate models to be considered. Mustafa filters out weak models that are not within a certain percentage of the best model before the ensemble construction begins. This is a static filter applied to the candidate list, not a dynamic termination condition for the ensemble generation loop.
Claim 17 requires the ensemble construction loop to continue adding models "until a preset accuracy is satisfied." In contrast, Mustafa's actual greedy ensemble algorithm (described in Section 2.3.2) adds models to minimize validation cross-entropy. Mustafa does not teach terminating this loop based on satisfying a specific preset accuracy target. Typically, such greedy algorithms run for a fixed number of iterations or until the validation error plateaus/increases. Therefore, Mustafa fails to teach the specific algorithmic control logic of determining when to stop adding models based on a satisfied accuracy target. The Office Action's proposed combination improperly conflates Mustafa's static pre-processing filter with the claimed dynamic feedback loop, and thus fails to render the specific algorithmic steps of claim 17 obvious.
Claim 17 is allowable. Claims 18 and 19 depend from claim 17 and are allowable at least for this reason.
Examiner’s response to Argument 7
Examiner disagrees. Applicant’s interpretation of claim 17 is overly narrow and selectively focuses on a portion of the limitation without regard for the limitation as a whole. The claim limitation reads:
sequentially adding a transfer learning model that contributes the most to the improvement of the accuracy of the pre-configured ensemble to the ensemble until a preset accuracy is satisfied
When read in context, it is clear that the broadest reasonable interpretation of the limitation includes adding any particular transfer learning model, from a sequence, which contributes the most improvement to the ensemble. Furthermore, the broadest reasonable interpretation of “until a preset accuracy is satisfied” should not be narrowly construed to refer only to a termination condition of an iterative loop (in fact, iterations, loops, termination conditions, and the addition of more than individual transfer learning models are not mentioned in the claim). The broadest reasonable interpretation of the limitation includes adding a model to an ensemble from a sequence which improves the accuracy and satisfies any preset accuracy condition.
Turning to prior art reference Mustafa, Mustafa teaches:
(page 4 first paragraph) “We use the greedy algorithm introduced by Caruana et al. (2004). At each step, we greedily pick the next model which minimises cross-entropy on the validation set when it is ensembled with already chosen models.”
(page 4 section 2.3.1) “We aim to dynamically set this balance per-dataset using a heuristic based on the kNN accuracies; namely, we keep all pre-trained models within some threshold percentage τ% of the top kNN accuracy, up to a maximum of K = 15. Ideally, this would adaptively discard experts poorly suited to a given task, whose inclusion would likely harm ensemble performance.”
Thus Mustafa teaches adding a model from a sequence which satisfies a preset accuracy threshold, which accomplishes the same result as the claimed invention of adding the best models from a sequence and obtaining an accuracy within a desired accuracy tolerance.
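For illustration only (not part of the record), the Caruana-style greedy construction quoted from Mustafa, combined with the preset-accuracy stopping condition recited in claim 17, can be sketched as follows. The `accuracy_fn` scoring an ensemble is a hypothetical placeholder:

```python
def greedy_ensemble(models, accuracy_fn, preset_accuracy):
    """Greedily add the candidate giving the largest accuracy gain,
    stopping once the ensemble satisfies a preset accuracy.
    accuracy_fn(ensemble) -> float is assumed, not from the record."""
    ensemble = []
    remaining = list(models)
    while remaining and accuracy_fn(ensemble) < preset_accuracy:
        # Pick the model that most improves the current ensemble.
        best = max(remaining, key=lambda m: accuracy_fn(ensemble + [m]))
        ensemble.append(best)
        remaining.remove(best)
    return ensemble
```

Mustafa's Section 2.3.2 greedy step corresponds to the `max(...)` selection; the `preset_accuracy` check reflects the claimed stopping condition.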
Examiner notes that even if the claims were further narrowed to include an iteration loop and a termination condition to reflect what Applicant suggests, further search revealed new art that would teach this narrower form of the claim. Consider Hackett (PGPUB no. US20170344903A1) paragraphs 32-33 and figure 6:
(paragraph 32-33) “The error threshold may be predetermined based on the desired classification accuracy for the ensemble 104. In response to a positive determination at block 614, the training of the ensemble 104 may be determined to be complete. That is, the ensemble 104 may be determined as having suitable accuracy to be used to classify a production dataset 112 to generate production data classification results 114. On the other hand, in response to a negative determination at block 614 indicating that the classification error is not within the error threshold, the method 600 may proceed to block 616 where the ensemble build engine may determine whether it is possible to add new ensemble members to the ensemble 104.”
[Hackett, Figure 6: media_image1.png, 499 × 379, greyscale]
Therefore, the rejections of claim 17 and claims depending from claim 17 are maintained.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) limitations:
a pre-trained model storage unit configured to store a plurality of pre-trained models (claim 1)
[0028] Moreover, each of terms such as "...unit", "...apparatus" and "module" described in specification denotes an element for performing at least one function or operation, and may be implemented in hardware, software or the combination of hardware and software.
[0041] In an embodiment, the pre-trained model storage unit 110 may pre-learn a pre-trained model generated by training multiple deep neural networks with one dataset. For example, the pre-trained model storage unit 110 may learn different types of models, such as ResNet or DenseNet, for the same data set. Alternatively, the pre-trained model storage unit 110 may learn a plurality of ResNets each having 10, 20, or 30 layers for the same data.
[0042] As another embodiment, a plurality of pre-trained models may be generated by applying several different datasets of the same or heterogeneous domains to one deep neural network.
[0043] The pre-trained model storage unit 110 may store characteristics of the pre-trained model together. For example, the pre-trained model storage unit 110 may store a size of the pre-trained model (a file size or the number of parameters) or the amount of calculation (FLOPs, Multi-Adds) of the pre-trained model. In addition, the characteristics of the pre-training data may be stored together. In the case of an image-based classification problem, whether it is to classify the type of car or the type of flower may be specified. Alternatively, whether it is to classify a general image such as ImageNet (ILSVRC2012) data in the pre-trained model storage unit 110 may be specified.
a transfer learning data input unit configured to receive transfer learning data (claim 1)
[0028] Moreover, each of terms such as "...unit", "...apparatus" and "module" described in specification denotes an element for performing at least one function or operation, and may be implemented in hardware, software or the combination of hardware and software.
[0045] The transfer learning data input unit 120 receives transfer learning data and transmits the transfer learning data to the pre-trained model selecting unit 140 and the transfer learning unit 150. The transfer learning data input unit 120 preferably includes a pre-processing function for the transfer learning data. For example, when the sizes of images of the pre-training data and the transfer learning data are different from each other, the transfer learning data input unit 120 may change the size of the image of the transfer learning data to the same image size as that of the pre-training data.
a pre-trained model selecting unit configured to select a pre-trained model (claim 1) wherein the pre-trained model selecting unit derives feature of the transfer learning data (claim 1, claim 4) the pre-trained model selecting unit performs performance evaluation (claim 5) the pre-trained model selecting unit calculates the performance evaluation score by purity (claim 6) the pre-trained model selecting unit calculates the performance evaluation score based on normalized mutual information (NMI) (claim 7)
[0028] Moreover, each of terms such as "...unit", "...apparatus" and "module" described in specification denotes an element for performing at least one function or operation, and may be implemented in hardware, software or the combination of hardware and software.
[0054] As previously described with reference to FIG. 2, the CNN 10 includes the first part 11 including convolutions and pooling operations and the second part 12 including a fully connected layer or a dense layer. Referring to FIG. 3, when transfer learning data is input through the transfer learning data input unit 120 (S210), the pre-trained model selecting unit 140 may regard an output of the first part or the second part of the plurality of stored pre-trained models as a feature of the transfer learning data or derive the feature of the transfer learning data using the output of the first part 11 or the second part 12. At this time, when the pre-trained model selecting unit 140 uses the user requirement, the output of the first or second part of the pre-trained model(s) corresponding to the user requirement is used as a feature of the transfer learning data (S220).
[0055] As described above, the first or second part of the pre-trained model using the deep neural network refines by stages to extract the main information of the input data, so the output of the first or the second part of the pre-trained model may be used as a feature of the transfer learning data.
[0056] In another embodiment, the pre-trained model selecting unit 140 may use the output of an intermediate layer between the input layer and a last layer of the first part as a feature of the transfer learning data, but preferably use an output of the last layer of the first part as a feature of the transfer learning data.
[0057] In another embodiment, the pre-trained model selecting unit 140 may use the output of the intermediate layer between the input layer and the last layer of the second part as a feature of the transfer learning data, but preferably use an output of the last layer of the second part as a feature of the transfer learning data.
[0058] After generating the output of the first part or the second part of the pre-trained model as a feature of the transfer learning data (S220), the pre-trained model selecting unit 140 performs clustering on the transfer learning data using the feature of the transfer learning data (S230). As examples of algorithms that may be used as a clustering module in the pre-trained model selecting unit 140, algorithms such as K-means, Fuzzy K-Means, K-Medoids, hierarchical clustering, density-based clustering (DBScan), hierarchical density-based clustering (HDBScan), etc. may be used, and the clustering algorithm that may be used by the pre-trained model selecting unit 140 of the transfer learning system 100 for a deep neural network according to an embodiment of the present invention is not limited to the aforementioned clustering algorithms.
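For illustration only (not part of the specification), the clustering step of [0058] with K-means, one of the algorithms listed above, can be sketched as follows. The feature vectors stand in for outputs of a pre-trained model's first or second part; the data values are hypothetical:

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(features, k, iters=20, seed=0):
    """Minimal K-means over feature vectors (lists of floats).
    An illustrative sketch, not the clustering module of the record."""
    rng = random.Random(seed)
    centers = rng.sample(features, k)
    labels = [0] * len(features)
    for _ in range(iters):
        # Assign each feature vector to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(f, centers[c]))
                  for f in features]
        # Recompute each center as the mean of its assigned vectors.
        for c in range(k):
            members = [f for f, l in zip(features, labels) if l == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels
```

The resulting cluster labels are what the selecting unit would evaluate (e.g., by purity or NMI per claims 6-7) to score each candidate pre-trained model.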
a transfer learning unit configured to generate one or more transfer learning models (claim 1)
[0028] Moreover, each of terms such as "...unit", "...apparatus" and "module" described in specification denotes an element for performing at least one function or operation, and may be implemented in hardware, software or the combination of hardware and software.
[0089] The transfer learning unit 150 generates a plurality of transfer learning models by performing transfer learning using the pre-trained model selected by the pre-trained model selecting unit 140 and the transfer learning data input from the transfer learning data input unit 120. The transfer learning unit 150 generates P transfer learning models by performing transfer learning using the P selected pre-trained models and the transfer learning data. The transfer learning of the transfer learning unit 150 uses a known transfer learning method and is not limited to a specific learning method.
a user requirement input unit for inputting a user requirement (claim 3)
[0046] The user requirement input unit 130 receives a user requirement. In an embodiment, the user requirement input unit 130 may input a user requirement received through an input device such as a keyboard, mouse, or touch screen provided in a computer or mobile terminal used by the user.
[0047] Here, the user requirement may be a requirement for the pre-trained model, such as the size of the pre-trained model (the file size or the number of parameters) or the amount of calculation (FLOPs, Multi-Adds) of the pre-trained model. For example, when the user prefers a small deep neural network model, the user requirement input unit 130 may input an upper limit of the number of parameters of the pre-trained model. Alternatively, the user requirement input unit 130 may input an upper limit of the amount of calculation (FLOPs, Multi-Adds) of the pre-trained model.
transfer learning model output unit configured to calculate classification performance accuracy (claim 8) the transfer learning model output unit generates and outputs a final transfer learning model (claim 9) wherein the transfer learning model output unit first selects a first transfer learning model having the highest classification performance accuracy, and selects a second transfer learning model having the greatest accuracy improvement (claim 10) wherein the transfer learning model output unit configures the ensemble (claim 11)
[0028] Moreover, each of terms such as "...unit", "...apparatus" and "module" described in specification denotes an element for performing at least one function or operation, and may be implemented in hardware, software or the combination of hardware and software.
[0090] In an embodiment, it is preferable to use some of the transfer learning data as validation data. The transfer learning unit 150 evaluates the classification performance of the P transfer learning models by using the validation data. Classification performance evaluation may be calculated by Top-1 accuracy or area under the receiver operating characteristic curve (ROC-AUC) score.
[0091] The transfer learning model output unit 160 may calculate classification performance accuracy for the plurality of transfer learning models generated by the transfer learning unit 150, and select and output one or more transfer learning models in order, starting from a transfer learning model having the highest classification performance accuracy, among the plurality of transfer learning models generated by the transfer learning unit 150. In an embodiment, the transfer learning model output unit 160 may select T (T≤P) transfer learning models from among the P transfer learning models generated by the transfer learning unit 150. In this case, the transfer learning models may be selected based on classification performance calculated by the transfer learning unit 150 as a selection criterion.
[0092] When two or more transfer learning models are selected, the transfer learning model output unit 160 may configure a final transfer learning model in an ensemble form. In an embodiment, the transfer learning model output unit 160 may generate and output a final transfer learning model y(x) by configuring an ensemble by Equation 13 below with respect to one or more selected transfer learning models
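Equation 13 itself is not reproduced in this excerpt. Purely as a generic illustration of an ensemble-form output y(x) (an assumption, which may differ from Equation 13, e.g. by using weighted rather than uniform averaging), combining the selected models could look like:

```python
def ensemble_predict(models, x):
    """Uniformly average the output vectors of the selected transfer
    learning models. Generic illustration only; Equation 13 of the
    specification is not reproduced here and may differ."""
    outputs = [m(x) for m in models]  # each model maps x to a score vector
    n = len(outputs)
    return [sum(vals) / n for vals in zip(*outputs)]
```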
a pre-trained model storing step of storing a plurality of pre-trained models (claim 12)
[0028] Moreover, each of terms such as "...unit", "...apparatus" and "module" described in specification denotes an element for performing at least one function or operation, and may be implemented in hardware, software or the combination of hardware and software.
[0041] In an embodiment, the pre-trained model storage unit 110 may pre-learn a pre-trained model generated by training multiple deep neural networks with one dataset. For example, the pre-trained model storage unit 110 may learn different types of models, such as ResNet or DenseNet, for the same data set. Alternatively, the pre-trained model storage unit 110 may learn a plurality of ResNets each having 10, 20, or 30 layers for the same data.
[0042] As another embodiment, a plurality of pre-trained models may be generated by applying several different datasets of the same or heterogeneous domains to one deep neural network.
[0043] The pre-trained model storage unit 110 may store characteristics of the pre-trained model together. For example, the pre-trained model storage unit 110 may store a size of the pre-trained model (a file size or the number of parameters) or the amount of calculation (FLOPs, Multi-Adds) of the pre-trained model. In addition, the characteristics of the pre-training data may be stored together. In the case of an image-based classification problem, whether it is to classify the type of car or the type of flower may be specified. Alternatively, whether it is to classify a general image such as ImageNet (ILSVRC2012) data in the pre-trained model storage unit 110 may be specified.
transfer learning data input step of inputting transfer learning data (claim 12)
[0045] The transfer learning data input unit 120 receives transfer learning data and transmits the transfer learning data to the pre-trained model selecting unit 140 and the transfer learning unit 150. The transfer learning data input unit 120 preferably includes a pre-processing function for the transfer learning data. For example, when the sizes of images of the pre-training data and the transfer learning data are different from each other, the transfer learning data input unit 120 may change the size of the image of the transfer learning data to the same image size as that of the pre-training data.
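The size-matching pre-processing of [0045] can be illustrated with a minimal nearest-neighbor resize; a real system would use an image library, and the 4x4 target size here is an assumption:

```python
# Toy nearest-neighbor resize on a nested-list "image": when the transfer
# learning images differ in size from the pre-training images, rescale
# them to the pre-training image size, as paragraph [0045] describes.
def resize_nearest(img, out_h, out_w):
    in_h, in_w = len(img), len(img[0])
    # Map each output pixel back to the nearest input pixel.
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

img = [[1, 2], [3, 4]]               # 2x2 transfer-learning image
resized = resize_nearest(img, 4, 4)  # match a hypothetical 4x4 pre-training size
```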
a pre-trained model selecting step of selecting a pre-trained model (claim 12, 17) wherein the pre-trained model selecting step includes: generating an output […] and selecting a pre-trained model (claim 12, 15) wherein the pre-trained model selecting step includes selecting a pre-trained model (claim 14)
[0054] As previously described with reference to FIG. 2, the CNN 10 includes the first part 11 including convolutions and pooling operations and the second part 12 including a fully connected layer or a dense layer. Referring to FIG. 3, when transfer learning data is input through the transfer learning data input unit 120 (S210), the pre-trained model selecting unit 140 may regard an output of the first part or the second part of the plurality of stored pre-trained models as a feature of the transfer learning data or derive the feature of the transfer learning data using the output of the first part 11 or the second part 12. At this time, when the pre-trained model selecting unit 140 uses the user requirement, the output of the first or second part of the pre-trained model(s) corresponding to the user requirement is used as a feature of the transfer learning data (S220).
[0055] As described above, the first or second part of the pre-trained model using the deep neural network refines the input data by stages to extract its main information, so the output of the first or the second part of the pre-trained model may be used as a feature of the transfer learning data.
[0056] In another embodiment, the pre-trained model selecting unit 140 may use the output of an intermediate layer between the input layer and a last layer of the first part as a feature of the transfer learning data, but preferably use an output of the last layer of the first part as a feature of the transfer learning data.
[0057] In another embodiment, the pre-trained model selecting unit 140 may use the output of the intermediate layer between the input layer and the last layer of the second part as a feature of the transfer learning data, but preferably use an output of the last layer of the second part as a feature of the transfer learning data.
[0058] After generating the output of the first part or the second part of the pre-trained model as a feature of the transfer learning data (S220), the pre-trained model selecting unit 140 performs clustering on the transfer learning data using the feature of the transfer learning data (S230). Examples of algorithms that may be used as a clustering module in the pre-trained model selecting unit 140 include K-means, Fuzzy K-Means, K-Medoids, hierarchical clustering, density-based clustering (DBScan), and hierarchical density-based clustering (HDBScan); however, the clustering algorithm that may be used by the pre-trained model selecting unit 140 of the transfer learning system 100 for a deep neural network according to an embodiment of the present invention is not limited to the aforementioned clustering algorithms.
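A toy sketch of steps S220-S230, with a stand-in feature extractor in place of the first part of a CNN and a minimal K-means in the role of the clustering module (all names and data are hypothetical):

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fake_first_part(x):
    # Hypothetical stand-in for the convolutional first part of a CNN:
    # a fixed linear map of the input serving as the feature (S220).
    return [x[0] + x[1], x[0] - x[1]]

def kmeans(points, k, iters=20, seed=0):
    # Minimal K-means as one possible clustering module (S230).
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

data = [[0, 0], [0, 1], [10, 10], [10, 11]]       # toy transfer learning data
features = [fake_first_part(x) for x in data]     # S220
labels = kmeans(features, k=2)                    # S230
```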
a transfer learning step of generating a plurality of transfer learning models (claim 12, 17)
[0089] The transfer learning unit 150 generates a plurality of transfer learning models by performing transfer learning using the pre-trained model selected by the pre-trained model selecting unit 140 and the transfer learning data input from the transfer learning data input unit 120. The transfer learning unit 150 generates P transfer learning models by performing transfer learning using the P selected pre-trained models and the transfer learning data. The transfer learning of the transfer learning unit 150 uses a known transfer learning method and is not limited to a specific learning method.
a user requirement input step of inputting a user requirement (claim 14)
[0046] The user requirement input unit 130 receives a user requirement. In an embodiment, the user requirement input unit 130 may input a user requirement received through an input device such as a keyboard, mouse, or touch screen provided in a computer or mobile terminal used by the user.
[0047] Here, the user requirement may be a requirement for the pre-trained model, such as the size of the pre-trained model (the file size or the number of parameters) or the amount of calculation (FLOPs, Multi-Adds) of the pre-trained model. For example, when the user prefers a small deep neural network model, the user requirement input unit 130 may input an upper limit of the number of parameters of the pre-trained model. Alternatively, the user requirement input unit 130 may input an upper limit of the amount of calculation (FLOPs, Multi-Adds) of the pre-trained model.
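The upper-limit requirement of [0047] amounts to a simple filter over the stored model characteristics; a hypothetical sketch:

```python
# Illustrative model records; the names and parameter counts are made up.
models = [
    {"name": "resnet10", "num_params": 5_000_000},
    {"name": "resnet20", "num_params": 11_000_000},
    {"name": "densenet", "num_params": 8_000_000},
]

def meets_requirement(model, max_params):
    # User requirement from [0047]: an upper limit on the number of
    # parameters (the same pattern applies to a FLOPs limit).
    return model["num_params"] <= max_params

eligible = [m["name"] for m in models if meets_requirement(m, 10_000_000)]
```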
a transfer learning model output step of calculating classification performance accuracy […] and selecting and outputting one or more transfer learning models (claim 16)
[0090] In an embodiment, it is preferable to use some of the transfer learning data as validation data. The transfer learning unit 150 evaluates the classification performance of the P transfer learning models by using the validation data. Classification performance evaluation may be calculated by Top-1 accuracy or area under the receiver operating characteristic curve (ROC-AUC) score.
[0091] The transfer learning model output unit 160 may calculate classification performance accuracy for the plurality of transfer learning models generated by the transfer learning unit 150, and select and output one or more transfer learning models in order, starting from a transfer learning model having the highest classification performance accuracy, among the plurality of transfer learning models generated by the transfer learning unit 150. In an embodiment, the transfer learning model output unit 160 may select T (T ≤ P) transfer learning models from among the P transfer learning models generated by the transfer learning unit 150. In this case, the transfer learning models may be selected based on classification performance calculated by the transfer learning unit 150 as a selection criterion.
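Selecting the top T of P models by classification performance accuracy, as in [0091], can be sketched as follows (the accuracies are made up):

```python
# Hypothetical validation accuracies for P = 4 transfer learning models.
accuracies = {"m1": 0.81, "m2": 0.90, "m3": 0.77, "m4": 0.88}

def select_top(acc, t):
    # Rank models by accuracy, highest first, and keep the top T (T <= P).
    return sorted(acc, key=acc.get, reverse=True)[:t]

top2 = select_top(accuracies, 2)
```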
[0092] When two or more transfer learning models are selected, the transfer learning model output unit 160may configure a final transfer learning model in an ensemble form. In an embodiment, the transfer learning model output unit 160 may generate and output a final transfer learning model y(x) by configuring an ensemble by Equation 13 below with respect to one or more selected transfer learning models
a step of configuring an ensemble by selecting at least some of the plurality of transfer learning models (claim 17) the step of configuring the ensemble includes: first selecting a first transfer learning model having a highest classification performance accuracy, and selecting a second transfer learning model having the greatest accuracy improvement (claim 18) the step of configuring the ensemble includes configuring the ensemble by adding a transfer learning mode (claim 19) wherein the step of configuring the ensemble includes configuring an ensemble by adding a transfer learning model having the highest accuracy […] and a transfer learning model that contributes the most to improvement of accuracy (claim 12)
[0092] When two or more transfer learning models are selected, the transfer learning model output unit 160 may configure a final transfer learning model in an ensemble form. In an embodiment, the transfer learning model output unit 160 may generate and output a final transfer learning model y(x) by configuring an ensemble by Equation 13 below with respect to one or more selected transfer learning models.
[0096] In another embodiment, the transfer learning model output unit 160 may sequentially configure an ensemble using the following method. In an embodiment, the transfer learning model output unit 160 may first select a first transfer learning model having the highest classification performance accuracy and select a second transfer learning model having the greatest accuracy improvement when configuring an ensemble with the first transfer learning model, to configure an ensemble.
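The sequential ensemble construction of [0096] is a greedy procedure; a toy sketch using majority voting over hypothetical per-sample predictions:

```python
# Greedy ensemble construction in the spirit of [0096]: start with the
# most accurate model, then repeatedly add the candidate whose inclusion
# improves ensemble accuracy the most. Everything below is illustrative.
def ensemble_accuracy(members, preds, truth):
    # Majority-vote accuracy of the given member set on validation data.
    correct = 0
    for i, y in enumerate(truth):
        votes = [preds[m][i] for m in members]
        if max(set(votes), key=votes.count) == y:
            correct += 1
    return correct / len(truth)

def greedy_ensemble(preds, truth, size):
    # First member: the individually most accurate model.
    members = [max(preds, key=lambda m: ensemble_accuracy([m], preds, truth))]
    while len(members) < size:
        candidates = [m for m in preds if m not in members]
        best = max(candidates,
                   key=lambda m: ensemble_accuracy(members + [m], preds, truth))
        members.append(best)
    return members

truth = [0, 1, 1, 0]                 # toy validation labels
preds = {"a": [0, 1, 0, 0],          # each model is 3/4 correct alone,
         "b": [0, 1, 1, 1],          # but together the ensemble can
         "c": [1, 1, 1, 0]}          # reach 4/4 by majority voting
members = greedy_ensemble(preds, truth, size=3)
```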
Claim Objections
Regarding Claim 1
Claim 1 is objected to because of the following informalities: “clustering for the transfer leaning data using the feature of the transfer learning data” should read “clustering for the transfer learning data using the feature of the transfer learning data”. Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-12, and 14-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1:
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
a pre-trained model selecting unit configured to select a pre-trained model corresponding to the transfer learning data from among the plurality of stored pre-trained models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
and a transfer learning unit configured to generate one or more transfer learning models by performing transfer learning using the selected pre-trained model and the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to providing an opinion of how a machine learning model transfers to new data and a judgement of appropriate model parameters.
wherein the pre-trained model selecting unit derives a feature of the transfer learning data from an output of a first part or a second part of the plurality of stored pre-trained models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating given data or outputs to determine a result. The limitation could be performed in the human mind by, for example, performing a series of matrix multiplications in the human mind or using pen and paper.
and selects a pre-trained model based on clustering for the transfer leaning data using the feature of the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
A transfer learning system for a deep neural network, the transfer learning system comprising: a pre-trained model storage unit configured to store a plurality of pre-trained models that are deep neural network models learned using one or more pre-training datasets — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
a transfer learning data input unit configured to receive transfer learning data — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself?
No, the claim does not recite additional elements which amount to significantly more than the abstract idea itself. The additional elements as identified in step 2A prong 2:
A transfer learning system for a deep neural network, the transfer learning system comprising: a pre-trained model storage unit configured to store a plurality of pre-trained models that are deep neural network models learned using one or more pre-training datasets — This limitation is recited at a high level of generality and amounts to mere data gathering of storing and retrieving information in memory, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
a transfer learning data input unit configured to receive transfer learning data — This limitation is recited at a high level of generality and amounts to mere data gathering of storing and retrieving information in memory, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
Regarding Claim 3
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim merely recites the additional abstract ideas:
Step 2A Prong 1:
wherein the pre-trained model selecting unit derives a feature of the transfer learning data from an output of a first part or a second part of the plurality of stored pre-trained models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating given data or outputs to determine a result. The limitation could be performed in the human mind by, for example, performing a series of matrix multiplications in the human mind or using pen and paper.
a pre-trained model selecting unit configured to select a pre-trained model corresponding to the transfer learning data from among the plurality of stored pre-trained models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 4
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 3 which included an abstract idea (see rejection for claim 3). The claim merely recites the additional abstract ideas:
Step 2A Prong 1:
wherein the pre-trained model selecting unit derives feature of the transfer learning data based on an output of a first part or a second part of the pre-trained model corresponding to the user requirement — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating given data or outputs to determine a result. The limitation could be performed in the human mind by, for example, performing a series of matrix multiplications in the human mind or using pen and paper.
and selects a pre- trained model based on clustering for the transfer learning data using the feature of the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 5
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 3 which included an abstract idea (see rejection for claim 3). The claim recites the additional limitations:
Step 2A Prong 1:
wherein the pre-trained model selecting unit performs performance evaluation on a result of performing clustering — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating machine learning models by separating data into groups according to rules and determining performance on known data.
and selects the pre-trained model in the order of the highest performance evaluation score — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on a given criteria.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 6
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 5 which included an abstract idea (see rejection for claim 5). The claim merely recites the additional abstract ideas:
Step 2A Prong 1:
wherein the pre-trained model selecting unit calculates the performance evaluation score by purity derived through the following equation for the result of performing the clustering: purity = (1/N) Σ_k max_j N_kj, wherein N is a total number of data, N_kj is the number of j-th classes in a k-th cluster — This limitation is directed to the abstract idea of a mathematical concept, and mathematical formulas or equations in particular (MPEP 2106.04(a)(2) I. B.). The claim explicitly recites a mathematical formula.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
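For reference, the purity measure recited in claim 6 can be computed as follows (toy labels, illustrative only):

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    # purity = (1/N) * sum over clusters k of max_j N_kj, where N_kj is
    # the number of items of class j assigned to cluster k.
    n = len(class_labels)
    total = 0
    for k in set(cluster_labels):
        counts = Counter(c for c, g in zip(class_labels, cluster_labels)
                         if g == k)
        total += max(counts.values())
    return total / n

# Toy example: cluster 0 holds classes [0, 0, 1]; cluster 1 holds [1, 1].
score = purity([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])  # (2 + 2) / 5 = 0.8
```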
Regarding Claim 7
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 5 which included an abstract idea (see rejection for claim 5). The claim merely recites the additional judicial exceptions:
Step 2A Prong 1:
wherein the pre-trained model selecting unit calculates the performance evaluation score based on normalized mutual information (NMI) derived through the following equation for the result of performing the clustering: NMI(Y, K) = 2·I(Y;K) / (H(Y) + H(K)), wherein Y is a random variable for a class label of the transfer learning data, K is a random variable for a cluster label, H(·) is entropy, and I(Y;K) is mutual information between Y and K — This limitation is directed to the abstract idea of a mathematical concept, and mathematical formulas or equations in particular (MPEP 2106.04(a)(2) I. B.). The claim explicitly recites a mathematical formula.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
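For reference, the NMI score recited in claim 7 can be computed as follows (natural-log entropies; toy labels):

```python
import math
from collections import Counter

def entropy(labels):
    # H(X) = -sum p(x) log p(x) over the empirical distribution.
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(y, k):
    # I(Y;K) = sum p(y,k) log( p(y,k) / (p(y) p(k)) ).
    n = len(y)
    joint = Counter(zip(y, k))
    py, pk = Counter(y), Counter(k)
    return sum((c / n) * math.log((c / n) / ((py[a] / n) * (pk[b] / n)))
               for (a, b), c in joint.items())

def nmi(y, k):
    # NMI = 2 * I(Y;K) / (H(Y) + H(K)).
    return 2 * mutual_information(y, k) / (entropy(y) + entropy(k))

# A relabeled but otherwise identical clustering carries full information
# about the classes, so NMI is 1.
score = nmi([0, 0, 1, 1], [1, 1, 0, 0])
```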
Regarding Claim 8
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 1:
further comprising: a transfer learning model output unit configured to calculate classification performance accuracy for the one or more generated transfer learning models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a model performance.
and select […] one or more transfer learning models in the order of the highest classification performance accuracy — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options to pick the best one.
Step 2A Prong 2:
and output one or more transfer learning models in the order of the highest classification performance accuracy — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
and output one or more transfer learning models in the order of the highest classification performance accuracy — This limitation is recited at a high level of generality and amounts to mere data gathering of transmitting and receiving data over a network, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 9
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The claim recites the additional limitations:
Step 2A Prong 1:
wherein the transfer learning model output unit generates and outputs a final transfer learning model y(x) by configuring an ensemble by the following equation for the one or more selected transfer learning models: y(x) = Σ_{t=1}^{T} w_t·y_t(x), wherein y(x) is a final transfer learning model, y_t(x) is any transfer learning model among the T selected models, and w_t is an ensemble weight — This limitation is directed to the abstract idea of a mathematical concept, and mathematical formulas or equations in particular (MPEP 2106.04(a)(2) I. B.). The claim explicitly recites a mathematical formula.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
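For reference, the weighted ensemble recited in claim 9 reduces to a weighted sum of model outputs (hypothetical class-probability vectors):

```python
def ensemble_predict(model_outputs, weights):
    # y(x) = sum over t of w_t * y_t(x): each y_t(x) is a per-class score
    # vector from one selected transfer learning model, w_t its weight.
    num_classes = len(model_outputs[0])
    return [sum(w * out[c] for w, out in zip(weights, model_outputs))
            for c in range(num_classes)]

y1 = [0.7, 0.3]  # hypothetical output of model 1 for input x
y2 = [0.4, 0.6]  # hypothetical output of model 2 for input x
combined = ensemble_predict([y1, y2], weights=[0.5, 0.5])
```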
Regarding Claim 10
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 9 which included an abstract idea (see rejection for claim 9). The claim merely recites the additional limitations:
Step 2A Prong 1:
wherein the transfer learning model output unit first selects a first transfer learning model having the highest classification performance accuracy — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options to pick the best one based on a particular criteria.
and selects a second transfer learning model having the greatest accuracy improvement when configuring an ensemble with the first transfer learning model, to configure the ensemble — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options to pick the best one based on a particular criteria.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 11
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 10 which included an abstract idea (see rejection for claim 10). The claim recites the additional limitations:
Step 2A Prong 1:
wherein the transfer learning model output unit configures the ensemble by adding a transfer learning mode having the highest accuracy improvement, when added to a previously configured ensemble, as an ensemble member — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing a judgement of a model architecture based on a particular criteria.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 12:
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
a pre-trained model selecting step of selecting a pre-trained model corresponding to the input transfer learning data from among the plurality of stored pre-trained models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
and a transfer learning step of generating a plurality of transfer learning models by performing transfer learning using the selected pre-trained model and the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to providing an opinion of how a machine learning model transfers to new data and a judgement of appropriate model parameters.
wherein the pre-trained model selecting step includes: generating an output of a first part or a second part of the plurality of stored pre- trained models as a feature of the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to observing an output from a model displayed on a screen and evaluating the observed output to describe the features visually seen (e.g. “there is a dog and a bird in this picture”).
and selecting a pre-trained model by performing clustering on the transfer learning data by using the feature of the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating clustered data and the features to pick the best model from a list of options.
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
A transfer learning method for a deep neural network, the transfer learning method comprising: a pre-trained model storing step of storing a plurality of pre-trained models that are deep neural network models learned using a plurality of pre-training datasets — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
a transfer learning data input step of inputting transfer learning data — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself?
No, the claim does not recite additional elements which amount to significantly more than the abstract idea itself. The additional elements as identified in step 2A prong 2:
A transfer learning method for a deep neural network, the transfer learning method comprising: a pre-trained model storing step of storing a plurality of pre-trained models that are deep neural network models learned using a plurality of pre-training datasets — This limitation is recited at a high level of generality and amounts to mere data gathering of storing and retrieving information in memory, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
a transfer learning data input step of inputting transfer learning data — This limitation is recited at a high level of generality and amounts to mere data gathering of storing and retrieving information in memory, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
Regarding Claim 14
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 12 which included an abstract idea (see rejection for claim 12). The claim recites the additional limitations:
Step 2A Prong 1:
wherein the pre-trained model selecting step includes selecting a pre-trained model corresponding to the input user requirement or the input transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
Step 2A Prong 2:
further comprising: a user requirement input step of inputting a user requirement — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
further comprising: a user requirement input step of inputting a user requirement — This limitation is recited at a high level of generality and amounts to mere data gathering of transmitting and receiving data over a network, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 15
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 14 which included an abstract idea (see rejection for claim 14). The claim recites the additional abstract ideas:
Step 2A Prong 1:
wherein the pre-trained model selecting step includes: generating an output of a first part or a second part of the pre-trained model corresponding to the user requirement as a feature of the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating given data or outputs to determine a result. The limitation could be performed in the human mind by, for example, performing a series of matrix multiplications in the human mind or using pen and paper.
and selecting a pre-trained model by performing clustering on the transfer learning data by using the feature of the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 16
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 12 which included an abstract idea (see rejection for claim 12). The claim recites the additional limitations:
Step 2A Prong 1:
further comprising: a transfer learning model output step of calculating classification performance accuracy for the plurality of generated transfer learning models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a model performance.
and selecting […] one or more transfer learning models in the order of the highest classification performance accuracy among the plurality of generated transfer learning models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options to pick the best one.
Step 2A Prong 2:
and outputting one or more transfer learning models in the order of the highest classification performance accuracy among the plurality of generated transfer learning models — This limitation is directed to mere data gathering and outputting which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2.
Step 2B:
The additional elements as identified in step 2A prong 2:
and outputting one or more transfer learning models in the order of the highest classification performance accuracy among the plurality of generated transfer learning models — This limitation is recited at a high level of generality and amounts to mere data gathering of transmitting and receiving data over a network, which is well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.), which cannot amount to significantly more than the judicial exception.
Thus, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 17
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
A transfer learning method for a deep neural network, the transfer learning method comprising: a pre-trained model selecting step of selecting a pre-trained model corresponding to transfer learning data from among a plurality of stored pre-trained models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options based on given data.
a transfer learning step of generating a plurality of transfer learning models by performing transfer learning using the selected pre-trained model and the transfer learning data — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to providing an opinion of how a machine learning model transfers to new data and a judgement of appropriate model parameters.
and a step of configuring an ensemble by selecting at least some of the plurality of transfer learning models — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options to make choices.
wherein the step of configuring the ensemble includes configuring an ensemble by adding a transfer learning model having the highest accuracy among the plurality of transfer learning models and a transfer learning model that contributes the most to improvement of accuracy of the transfer learning model — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing a judgement of a model architecture based on a particular criteria.
and sequentially adding a transfer learning model that contributes the most to the improvement of the accuracy of the pre-configured ensemble to the ensemble until a preset accuracy is satisfied — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing a judgement of a model architecture based on a particular criteria and repeated evaluation of the particular criteria.
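For illustration only, the sequential ensemble-building procedure recited in the limitations above can be sketched as follows (a minimal sketch, not the claimed implementation; the model names, accuracy values, the independence-based scoring in `ensemble_accuracy`, and the target threshold are all hypothetical):

```python
# Sketch of greedy ensemble construction: seed with the single most
# accurate transfer learning model, then repeatedly add the candidate
# whose inclusion most improves ensemble accuracy, stopping once a
# preset accuracy is satisfied or no candidate helps.

def ensemble_accuracy(members):
    # Hypothetical scoring: treat member errors as independent, so the
    # ensemble errs only when every member errs. A real system would
    # instead evaluate combined (e.g. majority-vote) predictions.
    error = 1.0
    for _, accuracy in members:
        error *= (1.0 - accuracy)
    return 1.0 - error

def build_ensemble(models, target_accuracy):
    # models: list of (name, individual_accuracy) pairs
    pool = sorted(models, key=lambda m: m[1], reverse=True)
    ensemble = [pool.pop(0)]  # seed with the highest-accuracy model
    while pool and ensemble_accuracy(ensemble) < target_accuracy:
        # candidate that contributes the most to accuracy improvement
        best = max(pool, key=lambda m: ensemble_accuracy(ensemble + [m]))
        if ensemble_accuracy(ensemble + [best]) <= ensemble_accuracy(ensemble):
            break  # no remaining candidate improves the ensemble
        ensemble.append(best)
        pool.remove(best)
    return [name for name, _ in ensemble]

models = [("m1", 0.90), ("m2", 0.85), ("m3", 0.88), ("m4", 0.80)]
print(build_ensemble(models, target_accuracy=0.95))  # ['m1', 'm3']
```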
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The claim does not recite any elements beyond the judicial exception alone.
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself?
No, the claim does not recite additional elements which amount to significantly more than the abstract idea itself. The claim does not recite any elements beyond the judicial exception alone.
Regarding Claim 18
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 17 which included an abstract idea (see rejection for claim 17). The claim merely recites the additional abstract ideas:
Step 2A Prong 1:
wherein the step of configuring the ensemble includes: first selecting a first transfer learning model having a highest classification performance accuracy — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options to pick the best one based on a particular criteria.
and selecting a second transfer learning model having the greatest accuracy improvement when configuring an ensemble with the first transfer learning model — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to evaluating a set of options to pick the best one based on a particular criteria.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Regarding Claim 19
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 18 which included an abstract idea (see rejection for claim 18). The claim merely recites the additional abstract ideas:
Step 2A Prong 1:
wherein the step of configuring the ensemble includes configuring the ensemble by adding a transfer learning model having the highest accuracy improvement, when added to a previously configured ensemble, as an ensemble member — This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). The limitation is directed to a mental process because it amounts to performing a judgement of a model architecture based on a particular criteria.
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 5, and 12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Vu et al. “Exploring and Predicting Transferability across NLP Tasks” herein referred to as Vu.
Regarding Claim 1
Vu teaches:
A transfer learning system for a deep neural network, the transfer learning system comprising: a pre-trained model storage unit configured to store a plurality of pre-trained models that are deep neural network models learned using one or more pre-training datasets;
(page 2 column 1 paragraph 3) “We publicly release our task library, which consists of pretrained models[*Examiner notes: pretrained model storage unit] and task embeddings for the 33 NLP tasks we study[*Examiner notes: one or more pre-training datasets], along with a codebase that computes task embeddings for new tasks and identifies source tasks that will likely yield positive transferability”; (page 3 column 1 paragraph 2) “Table 1 lists the 33 datasets[*Examiner notes: pre-training datasets] in our study.”
a transfer learning data input unit configured to receive transfer learning data;
(page 3 column 1 paragraph 2) “Table 1 lists the 33 datasets in our study.7 We select these datasets by mostly following prior work: nine of the eleven CR tasks come from the GLUE benchmark (Wang et al., 2019b); all eleven QA tasks are from the MultiQA repository (Talmor and Berant, 2019b); and all eleven SL tasks were used by Liu et al. (2019a). We consider all possible pairs of source and target datasets[*Examiner notes: target dataset is the received transfer learning data];”
a pre-trained model selecting unit configured to select a pre-trained model corresponding to the transfer learning data from among the plurality of stored pre-trained models
(page 1 column 2 figure 1 caption) “Given a target task, we first compute its task embedding and then identify the most similar source task embedding (in this example, WikiHop) from a precomputed library via cosine similarity[*Examiner notes: select a pre-trained model].”; (page 7 column 2 last paragraph) “TASKEMB can substantially boost the quality of the rankings, frequently outperforming the other methods across different classes of problems, data regimes, and transfer scenarios. These results demonstrate that the task similarity between the computed embeddings is a robust predictor of effective transfer.”
and a transfer learning unit configured to generate one or more transfer learning models by performing transfer learning using the selected pre-trained model and the transfer learning data
(page 2 column 1 last paragraph) “In each experiment, we follow the STILTs pipeline of Phang et al. (2018) by taking a pretrained BERT model,5 fine-tuning it on an intermediate source task, and then fine-tuning the resulting model on a target task[*Examiner notes: performing transfer learning using selected model and the transfer learning data].”; Figure 1
wherein the pre-trained model selecting unit derives a feature of the transfer learning data from an output of a first part or a second part of the plurality of stored pre-trained models
(page 6 column 1 last paragraph) “To begin, we fine-tune BERT on the training dataset of a given task[*Examiner notes: plurality of stored pre-trained models]; the model without the final task-specific layer forms our feature extractor[*Examiner notes: second part]. Next, we feed the entire training dataset into the model[*Examiner notes: transfer learning data] and compute the task embedding[*Examiner notes: feature of the transfer learning data] based on the Fisher of the feature extractor’s parameters (weights) θ, i.e., the expected covariance of the gradients of the log-likelihood with respect to θ:”
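As an illustration of the feature-derivation mechanism described in the quoted passage (not Vu's actual implementation; the toy network, layer sizes, and weight values below are hypothetical):

```python
# Sketch of deriving a feature from the output of part of a pre-trained
# model: the network minus its final task-specific layer acts as the
# feature extractor, and its activation vector is the derived feature.

def relu(vector):
    return [max(0.0, v) for v in vector]

def matvec(matrix, vector):
    # plain matrix-vector product
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

W_hidden = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # "second part": feature extractor
W_final = [[1.0, -1.0, 0.5]]                       # final task layer, discarded here

def derive_feature(x):
    # Feed the input through the model without the final layer; the
    # hidden activation is the derived feature of the input data.
    return [round(v, 6) for v in relu(matvec(W_hidden, x))]

print(derive_feature([1.0, 2.0]))  # [0.1, 1.7, 0.5]
```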
and selects a pre-trained model based on clustering for the transfer learning data using the feature of the transfer learning data.
[*Examiner notes: Figure 2 below shows the clustering of transfer learning feature data]; (page 6 column 2 paragraph 3) “Our evaluation centers around the meta-task of selecting the best source task for a given target task[*Examiner notes: selects a pre-trained model]. Specifically, given a target task, we rank all the other source tasks in our library in descending order by the cosine similarity13 between their task embeddings and the target task’s embedding[*Examiner notes: based on clustering]”; Figure 2
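The selection mechanism Vu describes, ranking a precomputed library of source-task embeddings by cosine similarity to the target task's embedding, can be sketched as follows (a minimal sketch; the task names and embedding values are hypothetical):

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical precomputed library of source-task embeddings.
library = {
    "task_a": [0.9, 0.1, 0.0],
    "task_b": [0.1, 0.9, 0.2],
    "task_c": [0.7, 0.3, 0.1],
}

def rank_sources(target_embedding):
    # Rank stored pre-trained models by cosine similarity of their task
    # embeddings to the target task's embedding, best match first.
    return sorted(library,
                  key=lambda t: cosine(library[t], target_embedding),
                  reverse=True)

print(rank_sources([0.8, 0.2, 0.0]))  # ['task_a', 'task_c', 'task_b']
```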
Regarding Claim 5
Vu teaches:
The transfer learning system of claim 2
(see rejection of claim 2)
Vu further teaches:
wherein the pre-trained model selecting unit performs performance evaluation on a result of performing clustering and selects the pre-trained model in the order of the highest performance evaluation score.
[*Examiner notes: Figure 2 below shows the clustering of transfer learning feature data]; (page 6 column 2 paragraph 3) “Our evaluation centers around the meta-task of selecting the best source task for a given target task[*Examiner notes: selects a pre-trained model]. Specifically, given a target task, we rank all the other source tasks in our library in descending order by the cosine similarity13[*Examiner notes: order of highest performance] between their task embeddings and the target task’s embedding[*Examiner notes: based on clustering]”; Figure 2
Regarding Claim 12
Vu teaches:
A transfer learning method for a deep neural network, the transfer learning method comprising: a pre-trained model storing step of storing a plurality of pre-trained models that are deep neural network models learned using a plurality of pre-training datasets
(page 2 column 1 paragraph 3) “We publicly release our task library, which consists of pretrained models[*Examiner notes: pretrained model storage unit] and task embeddings for the 33 NLP tasks we study[*Examiner notes: one or more pre-training datasets], along with a codebase that computes task embeddings for new tasks and identifies source tasks that will likely yield positive transferability”; (page 3 column 1 paragraph 2) “Table 1 lists the 33 datasets[*Examiner notes: pre-training datasets] in our study.”
a transfer learning data input step of inputting transfer learning data;
(page 3 column 1 paragraph 2) “Table 1 lists the 33 datasets in our study.7 We select these datasets by mostly following prior work: nine of the eleven CR tasks come from the GLUE benchmark (Wang et al., 2019b); all eleven QA tasks are from the MultiQA repository (Talmor and Berant, 2019b); and all eleven SL tasks were used by Liu et al. (2019a). We consider all possible pairs of source and target datasets[*Examiner notes: target dataset is the received transfer learning data];”
a pre-trained model selecting step of selecting a pre-trained model corresponding to the input transfer learning data from among the plurality of stored pre-trained models;
(page 1 column 2 figure 1 caption) “Given a target task, we first compute its task embedding and then identify the most similar source task embedding (in this example, WikiHop) from a precomputed library via cosine similarity[*Examiner notes: select a pre-trained model].”; (page 7 column 2 last paragraph) “TASKEMB can substantially boost the quality of the rankings, frequently outperforming the other methods across different classes of problems, data regimes, and transfer scenarios. These results demonstrate that the task similarity between the computed embeddings is a robust predictor of effective transfer.”
and a transfer learning step of generating a plurality of transfer learning models by performing transfer learning using the selected pre-trained model and the transfer learning data
(page 2 column 1 last paragraph) “In each experiment, we follow the STILTs pipeline of Phang et al. (2018) by taking a pretrained BERT model,5 fine-tuning it on an intermediate source task, and then fine-tuning the resulting model on a target task[*Examiner notes: performing transfer learning using selected model and the transfer learning data].”; Figure 1
wherein the pre-trained model selecting step includes: generating an output of a first part or a second part of the plurality of stored pre-trained models as a feature of the transfer learning data
(page 6 column 1 last paragraph) “To begin, we fine-tune BERT on the training dataset of a given task[*Examiner notes: plurality of stored pre-trained models]; the model without the final task-specific layer forms our feature extractor[*Examiner notes: second part]. Next, we feed the entire training dataset into the model[*Examiner notes: transfer learning data] and compute the task embedding[*Examiner notes: output as a feature of the transfer learning data] based on the Fisher of the feature extractor’s parameters (weights) θ, i.e., the expected covariance of the gradients of the log-likelihood with respect to θ:”
and selecting a pre-trained model by performing clustering on the transfer learning data by using the feature of the transfer learning data.
[*Examiner notes: Figure 2 below shows the clustering of transfer learning feature data]; (page 6 column 2 paragraph 3) “Our evaluation centers around the meta-task of selecting the best source task for a given target task[*Examiner notes: selects a pre-trained model]. Specifically, given a target task, we rank all the other source tasks in our library in descending order by the cosine similarity13 between their task embeddings and the target task’s embedding [*Examiner notes: based on clustering]”; Figure 2
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3-4 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Vu in view of NPL reference Lu et al. “Neural Architecture Transfer” herein referred to as Lu.
Regarding Claim 3
Vu teaches:
The transfer learning system of claim 1
(see rejection of claim 1)
Vu does not explicitly teach:
further comprising: a user requirement input unit for inputting a user requirement, wherein the pre-trained model selecting unit selects a pre-trained model corresponding to the input user requirement or the input transfer learning data.
However, Lu teaches:
further comprising: a user requirement input unit for inputting a user requirement, wherein the pre-trained model selecting unit selects a pre-trained model corresponding to the input user requirement or the input transfer learning data.
(page 3 column 2 paragraph 2) “NAT starts with an archive A of architectures (subnets) created by uniform sampling from our search space. We evaluate the performance fi of each subnet (ai) using weights inherited from the supernet. The accuracy predictor is then constructed from (ai; fi) pairs which (jointly with any additional objectives provided by the user[*Examiner notes: user requirement]) drives the subsequent many-objective evolutionary search towards optimal architectures[*Examiner notes: selecting a model].”
Vu, Lu, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu with the user requirement of Lu because (Lu page 2 column 1 last paragraph) “Finally we demonstrate the scalability and utility of NAT across many objectives. Optimizing accuracy, model size and one of MAdds, CPU or GPU latency, NATNets dominates MobileNetv3 [20] across all objectives. We also consider a 12 objective problem of finding a common architecture across eleven datasets”
Regarding Claim 4
Vu in view of Lu teaches:
The transfer learning system of claim 3
(see rejection of claim 3)
Vu further teaches:
wherein the pre-trained model selecting unit derives a feature of the transfer learning data based on an output of a first part or a second part of the pre-trained model corresponding to the user requirement
(page 6 column 1 last paragraph) “To begin, we fine-tune BERT on the training dataset of a given task; the model without the final task-specific layer forms our feature extractor[*Examiner notes: second part]. Next, we feed the entire training dataset into the model[*Examiner notes: transfer learning data] and compute the task embedding[*Examiner notes: feature of the transfer learning data] based on the Fisher of the feature extractor’s parameters (weights) θ, i.e., the expected covariance of the gradients of the log-likelihood with respect to θ:”
and selects a pre-trained model based on clustering for the transfer learning data using the feature of the transfer learning data
[*Examiner notes: Figure 2 below shows the clustering of transfer learning feature data]; (page 6 column 2 paragraph 3) “Our evaluation centers around the meta-task of selecting the best source task for a given target task[*Examiner notes: selects a pre-trained model]. Specifically, given a target task, we rank all the other source tasks in our library in descending order by the cosine similarity13 between their task embeddings and the target task’s embedding[*Examiner notes: based on clustering]”; Figure 2
Regarding Claim 14
Vu teaches:
The transfer learning method of claim 12
(see rejection of claim 12)
Vu does not explicitly teach:
further comprising: a user requirement input step of inputting a user requirement, wherein the pre-trained model selecting step includes selecting a pre-trained model corresponding to the input user requirement or the input transfer learning data.
However, Lu teaches:
further comprising: a user requirement input step of inputting a user requirement, wherein the pre-trained model selecting step includes selecting a pre-trained model corresponding to the input user requirement or the input transfer learning data.
(page 3 column 2 paragraph 2) “NAT starts with an archive A of architectures (subnets) created by uniform sampling from our search space. We evaluate the performance fi of each subnet (ai) using weights inherited from the supernet. The accuracy predictor is then constructed from (ai; fi) pairs which (jointly with any additional objectives provided by the user[*Examiner notes: user requirement]) drives the subsequent many-objective evolutionary search towards optimal architectures[*Examiner notes: selecting a model].”
Vu, Lu, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu with the user requirement of Lu because (Lu page 2 column 1 last paragraph) “Finally we demonstrate the scalability and utility of NAT across many objectives. Optimizing accuracy, model size and one of MAdds, CPU or GPU latency, NATNets dominates MobileNetv3 [20] across all objectives. We also consider a 12 objective problem of finding a common architecture across eleven datasets”
Regarding Claim 15
Vu in view of Lu teaches:
The transfer learning method of claim 14
(see rejection of claim 14)
Vu further teaches:
wherein the pre-trained model selecting step includes: generating an output of a first part or a second part of the pre-trained model […] as a feature of the transfer learning data;
(page 6 column 1 last paragraph) “To begin, we fine-tune BERT on the training dataset of a given task[*Examiner notes: plurality of stored pre-trained models]; the model without the final task-specific layer forms our feature extractor[*Examiner notes: second part]. Next, we feed the entire training dataset into the model[*Examiner notes: transfer learning data] and compute the task embedding[*Examiner notes: feature of the transfer learning data] based on the Fisher of the feature extractor’s parameters (weights) θ, i.e., the expected covariance of the gradients of the log-likelihood with respect to θ:”
and selecting a pre-trained model by performing clustering on the transfer learning data by using the feature of the transfer learning data.
[*Examiner notes: Figure 2 below shows the clustering of transfer learning feature data]; (page 6 column 2 paragraph 3) “Our evaluation centers around the meta-task of selecting the best source task for a given target task[*Examiner notes: selects a pre-trained model]. Specifically, given a target task, we rank all the other source tasks in our library in descending order by the cosine similarity13 between their task embeddings and the target task’s embedding[*Examiner notes: based on clustering]”; Figure 2
[Vu, Figure 2 (media_image6.png): clustering of task embeddings]
And Lu further teaches:
corresponding to the user requirement
(page 3 column 2 paragraph 2) “NAT starts with an archive A of architectures (subnets) created by uniform sampling from our search space. We evaluate the performance fi of each subnet (ai) using weights inherited from the supernet. The accuracy predictor is then constructed from (ai; fi) pairs which (jointly with any additional objectives provided by the user[*Examiner notes: user requirement]) drives the subsequent many-objective evolutionary search towards optimal architectures[*Examiner notes: selecting a model].”
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to combine Vu with Lu for the same reasons given in claim 14 above.
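For clarity of the record only, the cosine-similarity ranking of task embeddings described in the Vu citation above can be sketched as follows. This is an examiner-style illustration; the function and variable names are hypothetical and do not appear in Vu:

```python
import math

def cosine(u, v):
    """Cosine similarity between two task-embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_source_tasks(target_emb, source_embs):
    """Rank stored source tasks (pre-trained models) in descending order of
    cosine similarity to the target task's embedding, per the Vu quotation."""
    return sorted(source_embs,
                  key=lambda name: cosine(target_emb, source_embs[name]),
                  reverse=True)

# A target embedding nearly parallel to source "a" ranks "a" first.
print(rank_source_tasks([1.0, 0.0], {"a": [1.0, 0.1], "b": [0.0, 1.0]}))
```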
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Vu in view of NPL reference Rama Rao et al. “EXPLORING THE IMPACT OF OPTIMAL CLUSTERS ON CLUSTER PURITY” herein referred to as Rama Rao.
Regarding Claim 6
Vu teaches:
The transfer learning system of claim 5
(see rejection of claim 5)
Vu does not teach:
wherein the pre-trained model selecting unit calculates the performance evaluation score by purity derived through the following equation for the result of performing the clustering: [Equation]
[Equation (media_image7.png)]: purity = (1/N) · Σ_k max_j N_kj
wherein N is a total number of data, Nkj is the number of j-th classes in a k-th cluster
However Rama Rao teaches
wherein the pre-trained model selecting unit calculates the performance evaluation score by purity derived through the following equation for the result of performing the clustering: [Equation]
[Equation (media_image7.png)]: purity = (1/N) · Σ_k max_j N_kj
wherein N is a total number of data, Nkj is the number of j-th classes in a k-th cluster(page 756 column 1 section D paragraph 1) “Purity and entropy are the similar functions which used as an exterior evaluation criteria to know the cluster quality, Let x be an object which predicts and gives the cluster memberships for each sample and let y be a factor that gives the true class labels of each sample and suppose if we have given l categories as an input and if the clustering method generates k clusters, then the purity of that given clusters with respect to the known categories is given as […equation…] Where n is the total no. of samples and njq is the number of samples in the cluster q, which belongs to the class j(1≤j≤l). Purity value lies between [0], [1]. The large the purity the better the clustering performance.”
[Rama Rao equation (media_image8.png)]: purity = (1/n) · Σ_q max_j n_jq
Vu, Rama Rao, and the instant application are analogous because they are all directed to machine learning.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu using the clustering purity of Rama Rao because (Rama Rao page 756 column 1 above section E) “Purity value lies between [0], [1]. The large the purity the better the clustering performance” and (Rama Rao page 757 column 1 last paragraph) “Purity and Silhouette index have served as better indicators for evaluating cluster formation of this dataset.”
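For clarity of the record only, the purity measure recited in claim 6 and taught by Rama Rao (the fraction of samples belonging to the majority class of their assigned cluster) can be sketched as follows. This is an examiner-style illustration with hypothetical names, not code from either reference:

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """Purity = (1/N) * sum over clusters k of max_j N_kj, where N_kj is the
    number of samples of class j assigned to cluster k."""
    assert len(cluster_labels) == len(class_labels)
    n = len(class_labels)
    # Group the true class labels by assigned cluster.
    clusters = {}
    for k, y in zip(cluster_labels, class_labels):
        clusters.setdefault(k, []).append(y)
    # max_j N_kj: size of the dominant class within each cluster k.
    majority = sum(max(Counter(members).values())
                   for members in clusters.values())
    return majority / n

# Cluster 0 holds {a, a, b}, cluster 1 holds {b, b, b}: purity = 5/6.
print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))
```

Consistent with the Rama Rao quotation, the value lies in [0, 1], and larger values indicate better clustering.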
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Vu in view of NPL reference Newman et al. “Improved mutual information measure for clustering, classification, and community detection” herein referred to as Newman.
Regarding Claim 7
Vu teaches:
The transfer learning system of claim 5
(see rejection of claim 5)
Vu does not teach:
wherein the pre-trained model selecting unit calculates the performance evaluation score based on normalized mutual information (NMI) derived through the following equation for the result of performing the clustering: [Equation]
[Equation (media_image9.png)]: NMI(Y;K) = 2 · I(Y;K) / (H(Y) + H(K))
wherein Y is a random variable for a class label of the transfer learning data, K is a random variable for a cluster label, H(·) is entropy, and I(Y;K) is mutual information between Y and K
However Newman teaches:
wherein the pre-trained model selecting unit calculates the performance evaluation score based on normalized mutual information (NMI) derived through the following equation for the result of performing the clustering: [Equation]
[Equation (media_image9.png)]: NMI(Y;K) = 2 · I(Y;K) / (H(Y) + H(K))
wherein Y is a random variable for a class label of the transfer learning data, K is a random variable for a cluster label, H(·) is entropy, and I(Y;K) is mutual information between Y and K
(page 3 column 1 above equation 8) “In addition to increasing with the similarity of the labelings, the mutual information has the nice property of being symmetric in r and s. In some circumstances, it is convenient to normalize it so as to create a measure whose value runs between zero and one [3]. There are several ways to perform the normalization [13], but the most widely used normalizes by the mean of the entropies H(r) and H(s), which preserves the symmetry with respect to r and s:”
[Newman equation (media_image10.png)]: NMI(r;s) = 2 · I(r;s) / (H(r) + H(s))
Vu, Newman, and the instant application are analogous because they are all directed to machine learning.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu with the normalized mutual information of Newman because (Newman page 3 column 1 above equation 8) “In addition to increasing with the similarity of the labelings, the mutual information has the nice property of being symmetric in r and s. In some circumstances, it is convenient to normalize it so as to create a measure whose value runs between zero and one [3].”
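For clarity of the record only, NMI normalized by the mean of the entropies, as quoted from Newman above, can be sketched as follows. This is an examiner-style illustration with hypothetical names, not code from either reference:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(·) of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(y, k):
    """I(Y;K) computed from the joint and marginal label distributions."""
    n = len(y)
    joint = Counter(zip(y, k))
    py, pk = Counter(y), Counter(k)
    return sum((c / n) * math.log((c / n) / ((py[a] / n) * (pk[b] / n)))
               for (a, b), c in joint.items())

def nmi(y, k):
    """NMI(Y;K) = 2 * I(Y;K) / (H(Y) + H(K)): normalization by mean entropy."""
    return 2 * mutual_information(y, k) / (entropy(y) + entropy(k))

# Identical partitions (up to relabeling) yield the maximum value, 1.0.
print(nmi(["a", "a", "b", "b"], [0, 0, 1, 1]))
```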
Claims 8-11 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Vu in view of Mustafa et al. “Deep Ensembles for Low-Data Transfer Learning” herein referred to as Mustafa.
Regarding Claim 8
Vu teaches:
The transfer learning system of claim 1
(see rejection of claim 1)
Vu does not explicitly teach
further comprising: a transfer learning model output unit configured to calculate classification performance accuracy for the one or more generated transfer learning models and select and output one or more transfer learning models in the order of the highest classification performance accuracy.
However, Mustafa teaches:
further comprising: a transfer learning model output unit configured to calculate classification performance accuracy for the one or more generated transfer learning models and select and output one or more transfer learning models in the order of the highest classification performance accuracy.
(page 3 second to last paragraph) “Pre-trained model selection. Fine-tuning all experts on the new task would be prohibitively expensive. Following Puigcerver et al. (2020), we rank all the models by their kNN leave-one-out accuracy as a proxy for final fine-tuned accuracy, instead keeping the K best models (rather than 1).”
Vu, Mustafa, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu with the transfer learning ensemble taught by Mustafa because (Mustafa page 1 abstract) “When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.”
Regarding Claim 9
Vu teaches:
The transfer learning system of claim 1
(see rejection of claim 1)
Vu does not teach:
wherein the transfer learning model output unit generates and outputs a final transfer learning model y(x) by configuring an ensemble by the following equation for the one or more selected transfer learning models:[Equation]
[Equation (media_image11.png)]: y(x) = Σ_{t=1..T} w_t · y_t(x)
wherein y(x) is a final transfer learning model, yt(x) is any transfer learning model among T, and wt is an ensemble weight.
However, Mustafa teaches:
wherein the transfer learning model output unit generates and outputs a final transfer learning model y(x) by configuring an ensemble by the following equation for the one or more selected transfer learning models:[Equation]
[Equation (media_image11.png)]: y(x) = Σ_{t=1..T} w_t · y_t(x)
wherein y(x) is a final transfer learning model, yt(x) is any transfer learning model among T, and wt is an ensemble weight.
(page 20 paragraph 1) “The extra benefit (in our view, which is not discussed in that work) is that by weighting ensembles according to the number of times they were picked, we can use a weighted average, downweighting less performant models and potentially improving performance.”; [*Examiner notes: Mustafa exactly describes the equation claimed in words]
Vu, Mustafa, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu with the transfer learning ensemble taught by Mustafa because (Mustafa page 1 abstract) “When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift” and (page 20 paragraph 1) “The extra benefit (in our view, which is not discussed in that work) is that by weighting ensembles according to the number of times they were picked, we can use a weighted average, downweighting less performant models and potentially improving performance.”
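For clarity of the record only, the claimed weighted ensemble y(x) = Σ_t w_t · y_t(x), which Mustafa describes in words as a weighted average, can be sketched as follows. This is an examiner-style illustration with hypothetical names, not code from either reference:

```python
def ensemble_output(member_outputs, weights):
    """y(x) = sum over t of w_t * y_t(x): element-wise weighted sum of the
    member models' output vectors, with the ensemble weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    dim = len(member_outputs[0])
    return [sum(w * out[i] for out, w in zip(member_outputs, weights))
            for i in range(dim)]

# Two members' class-probability vectors, with the first weighted more heavily.
print(ensemble_output([[0.9, 0.1], [0.6, 0.4]], [0.75, 0.25]))  # ≈ [0.825, 0.175]
```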
Regarding Claim 10
Vu in view of Mustafa teaches:
The transfer learning system of claim 9
(see rejection of claim 9)
Mustafa further teaches:
wherein the transfer learning model output unit first selects a first transfer learning model having the highest classification performance accuracy,
(page 3 second to last paragraph) “Fine-tuning all experts on the new task would be prohibitively expensive. Following Puigcerver et al. (2020), we rank all the models by their kNN leave-one-out accuracy as a proxy for final fine-tuned accuracy, instead keeping the K best models (rather than 1).”
and selects a second transfer learning model having the greatest accuracy improvement when configuring an ensemble with the first transfer learning model, to configure the ensemble
(page 4 paragraph 1) “We use the greedy algorithm introduced by Caruana et al. (2004). At each step, we greedily pick the next model which minimises cross-entropy on the validation set when it is ensembled with already chosen models.”
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to combine Vu and Mustafa for the same reasons given in claim 9 above.
Regarding Claim 11
Vu in view of Mustafa teaches:
The transfer learning system of claim 10
(see rejection of claim 10)
And Mustafa further teaches:
wherein the transfer learning model output unit configures the ensemble by adding a transfer learning model having the highest accuracy improvement, when added to a previously configured ensemble, as an ensemble member.
(page 4 paragraph 1) “We use the greedy algorithm introduced by Caruana et al. (2004). At each step, we greedily pick the next model which minimises cross-entropy on the validation set when it is ensembled with already chosen models.”
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to combine Vu and Mustafa for the same reasons given in claim 9 above.
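For clarity of the record only, the greedy Caruana-style selection quoted from Mustafa above (repeatedly adding the candidate model that most improves the ensemble's validation score) can be sketched as follows. This is an examiner-style illustration with hypothetical names, not code from either reference:

```python
def greedy_ensemble(models, score, max_size):
    """Greedily grow an ensemble: at each step add the candidate that most
    improves score(ensemble); stop when no candidate improves it or the
    ensemble reaches max_size. `score` evaluates a list of members, higher
    being better (e.g., negative validation cross-entropy)."""
    ensemble, remaining = [], list(models)
    best_score = float("-inf")
    while remaining and len(ensemble) < max_size:
        candidate = max(remaining, key=lambda m: score(ensemble + [m]))
        new_score = score(ensemble + [candidate])
        if new_score <= best_score:  # no candidate improves the ensemble
            break
        ensemble.append(candidate)
        remaining.remove(candidate)
        best_score = new_score
    return ensemble

# Toy score: members are numbers, and the ensemble should sum to 10.
score = lambda ens: -abs(sum(ens) - 10)
print(greedy_ensemble([7, 2, 1, 5], score, 4))  # [7, 2, 1]
```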
Regarding Claim 16
Vu teaches:
The transfer learning method of claim 12
(see rejection of claim 12)
Vu does not explicitly teach:
further comprising: a transfer learning model output step of calculating classification performance accuracy for the plurality of generated transfer learning models, and selecting and outputting one or more transfer learning models in the order of the highest classification performance accuracy among the plurality of generated transfer learning models
However, Mustafa teaches:
further comprising: a transfer learning model output step of calculating classification performance accuracy for the plurality of generated transfer learning models, and selecting and outputting one or more transfer learning models in the order of the highest classification performance accuracy among the plurality of generated transfer learning models
(page 3 second to last paragraph) “Pre-trained model selection. Fine-tuning all experts on the new task would be prohibitively expensive. Following Puigcerver et al. (2020), we rank all the models by their kNN leave-one-out accuracy as a proxy for final fine-tuned accuracy, instead keeping the K best models (rather than 1).”
Vu, Mustafa, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu with the transfer learning ensemble taught by Mustafa because (Mustafa page 1 abstract) “When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.”
Regarding Claim 17
Vu teaches:
A transfer learning method for a deep neural network, the transfer learning method comprising: a pre-trained model selecting step of selecting a pre-trained model corresponding to transfer learning data from among a plurality of stored pre-trained models
(page 1 column 2 figure 1 caption) “Given a target task, we first compute its task embedding and then identify the most similar source task embedding (in this example, WikiHop) from a precomputed library via cosine similarity[*Examiner notes: select a pre-trained model].”; (page 7 column 2 last paragraph) “TASKEMB can substantially boost the quality of the rankings, frequently outperforming the other methods across different classes of problems, data regimes, and transfer scenarios. These results demonstrate that the task similarity between the computed embeddings is a robust predictor of effective transfer.”
a transfer learning step of generating a plurality of transfer learning models by performing transfer learning using the selected pre-trained model and the transfer learning data;
(page 2 column 1 last paragraph) “In each experiment, we follow the STILTs pipeline of Phang et al. (2018) by taking a pretrained BERT model,5 fine-tuning it on an intermediate source task, and then fine-tuning the resulting model on a target task[*Examiner notes: performing transfer learning using selected model and the transfer learning data].”; Figure 1
[Vu, Figure 1 (media_image5.png)]
Vu does not teach:
and a step of configuring an ensemble by selecting at least some of the plurality of transfer learning models.
However, Mustafa teaches:
and a step of configuring an ensemble by selecting at least some of the plurality of transfer learning models
(page 1 abstract) “The approach is simple: Use nearest-neighbour accuracy to rank pre-trained models, fine-tune the best ones with a small hyperparameter sweep, and greedily construct an ensemble to minimise validation cross-entropy.”
wherein the step of configuring the ensemble includes configuring an ensemble by adding a transfer learning model having the highest accuracy among the plurality of transfer learning models
(page 3 second to last paragraph) “Fine-tuning all experts on the new task would be prohibitively expensive. Following Puigcerver et al. (2020), we rank all the models by their kNN leave-one-out accuracy as a proxy for final fine-tuned accuracy, instead keeping the K best models (rather than 1).”
and a transfer learning model that contributes the most to improvement of accuracy of the transfer learning model:
(page 4 paragraph 1) “We use the greedy algorithm introduced by Caruana et al. (2004). At each step, we greedily pick the next model which minimises cross-entropy on the validation set when it is ensembled with already chosen models.”
and sequentially adding a transfer learning model that contributes the most to the improvement of the accuracy of the pre-configured ensemble to the ensemble
(page 4 paragraph 1) “We use the greedy algorithm introduced by Caruana et al. (2004). At each step, we greedily pick the next model which minimises cross-entropy on the validation set when it is ensembled with already chosen models.”
until a preset accuracy is satisfied.
(page 4 section 2.3.1) “We aim to dynamically set this balance per-dataset using a heuristic based on the kNN accuracies; namely, we keep all pre-trained models within some threshold percentage τ % of the top kNN[*Examiner notes: preset accuracy is satisfied] accuracy, up to a maximum of K = 15.”
Vu, Mustafa, and the instant application are analogous because they are all directed to neural networks.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to modify the transfer learning of Vu with the transfer learning ensemble taught by Mustafa because (Mustafa page 1 abstract) “When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.”
Regarding Claim 18
Vu in view of Mustafa teaches:
The transfer learning method of claim 17
(see rejection of claim 17)
Mustafa further teaches:
wherein the step of configuring the ensemble includes: first selecting a first transfer learning model having the highest classification performance accuracy,
(page 3 second to last paragraph) “Fine-tuning all experts on the new task would be prohibitively expensive. Following Puigcerver et al. (2020), we rank all the models by their kNN leave-one-out accuracy as a proxy for final fine-tuned accuracy, instead keeping the K best models (rather than 1).”
and selecting a second transfer learning model having the greatest accuracy improvement when configuring an ensemble with the first transfer learning model
(page 4 paragraph 1) “We use the greedy algorithm introduced by Caruana et al. (2004). At each step, we greedily pick the next model which minimises cross-entropy on the validation set when it is ensembled with already chosen models.”
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to combine Vu and Mustafa for the same reasons given in claim 17 above.
Regarding Claim 19
Vu in view of Mustafa teaches:
The transfer learning method of claim 18
(see rejection of claim 18)
Mustafa further teaches:
wherein the step of configuring the ensemble includes configuring the ensemble by adding a transfer learning model having the highest accuracy improvement, when added to a previously configured ensemble, as an ensemble member.
(page 4 paragraph 1) “We use the greedy algorithm introduced by Caruana et al. (2004). At each step, we greedily pick the next model which minimises cross-entropy on the validation set when it is ensembled with already chosen models.”
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the present invention to combine Vu and Mustafa for the same reasons given in claim 17 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Hackett (US20170344903A1) teaches using accuracy as a termination condition when adding new models to an ensemble (see fig. 6 box 614).
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ezra J Baker whose telephone number is (703)756-1087. The examiner can normally be reached Monday - Friday 10:00 am - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.J.B./Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126