Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-10 are pending. This Office Action is responsive to the amendment filed on 07/25/2025, which has been entered into the above identified application.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 5-6 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Achille et al. (“TASK2VEC: Task Embedding for Meta-Learning”, presented 10/31/2019), hereinafter Achille; in view of Yuan et al. (US 20220284321 A1, filed 03/03/2021), hereinafter Yuan; in further view of Cai et al. (“Once-for-All: Train One Network and Specialize it for Efficient Deployment”, published 04/29/2020), hereinafter Cai.
Regarding Claim 1, Achille teaches a database including a learning model pool comprising a plurality of datasets and neural networks pre-trained on the plurality of datasets (Achille: “For model selection experiments, we assemble a library of “expert” feature extractors. These are ResNet-34 models pre-trained on ImageNet and then fine-tuned on a specific task or collection of related tasks (see Supplementary Materials for details). We also consider a “generic” expert pre-trained on ImageNet without any fine-tuning. Finally, for each combination of expert feature extractor and task,
we trained a linear classifier on top of the expert in order to solve the selected task using the expert.” [Section 4. Experiments]) and a program for task-adaptive neural network searching based on meta-learning (Achille: “To select an appropriate pre-trained model, we design a joint embedding of models and tasks in the same vector space, which we call MODEL2VEC. We formulate this as a meta-learning problem where the objective is to find an embedding such that that models whose embeddings are close to a task exhibit good performance on that task.” [Section 1. Introduction]);
learn a cross-modal latent space for the plurality of datasets and neural networks trained on the plurality of datasets by calculating a similarity between each dataset and a neural network trained on the dataset while considering constraints included in any one task previously selected from the database having a distribution of tasks, thereby searching an optimal neural network (Achille: “Our task embedding can be used to reason about the space of tasks and solve meta-tasks. As a motivating example, we study the problem of selecting the best pre-trained feature extractor to solve a new task (Sect. 4). This is particularly valuable when there is insufficient data to train or fine-tune a generic model, and transfer of knowledge is essential. To select an appropriate pre-trained model, we design a joint embedding of models and tasks in the same vector space, which we call MODEL2VEC.” [Section 1. Introduction]; “The norm of the embedding correlates with the complexity of the task, while the distance between embeddings captures semantic similarities between tasks (Fig. 1).” [Section 1. Introduction]; “Given k models, their MODEL2VEC embedding are the vectors mi = Fi + bi, where Fi is the task embedding of the task used to train model mi (if available, else we set it to zero), and bi is a learned “model bias” that perturbs the task embedding to account for particularities of the model. We learn bi by optimizing a k-way cross entropy loss to predict the best model given the task distance (See Supplementary Material): [Equation].” [Section 3.4. Model2Vec: Co-embedding models and tasks]);
utilize functional embedding data representing parameter data of the neural network learned for the any one task (Achille: “Computation of the embedding leverages a duality between parameters (weights) and outputs (activations) in a deep neural network (DNN). Just as the activations of a DNN trained on a complex visual recognition task are a rich representation of the input images, we show that the gradients of the weights relative to a task-specific loss are a rich representation of the task itself. Given a task defined by a dataset D = {x_i, y_i}_{i=1}^N of labeled samples, we feed the data through a pre-trained reference convolutional neural network which we call a “probe network”, and compute the diagonal Fisher Information Matrix (FIM) of the network filter parameters to capture the structure of the task (Sect. 3).” [Section 1. Introduction]);
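The diagonal-FIM computation quoted above can be sketched, purely for illustration and not as part of the cited disclosure, as the per-parameter expected squared gradient of the log-likelihood under a simple probe model (a single logistic unit here; Achille computes this over the filter parameters of a pre-trained probe network, and the closed form used below is an illustrative assumption for the logistic case):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def diagonal_fisher(w, xs):
    """Diagonal FIM for a logistic probe model p_w(y=1|x) = sigmoid(w.x).

    F_jj = E_x E_{y~p_w(y|x)} [ (d/dw_j log p_w(y|x))^2 ].
    For a logistic unit the inner expectation over y has the closed form
    p(1-p) * x_j^2, which is averaged over the dataset.
    """
    d = len(w)
    fim = [0.0] * d
    for x in xs:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j in range(d):
            fim[j] += p * (1.0 - p) * x[j] * x[j]
    return [f / len(xs) for f in fim]
```

The vector of diagonal entries then serves as the task embedding: parameters on which the task depends weakly receive small FIM entries, as the quoted passage describes.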
input the functional embedding data into a model encoder, which converts the functional embedding data into vector data to generate neural network representation vector data (Achille: “Given k models, their MODEL2VEC embedding are the vectors mi = Fi + bi, where Fi is the task embedding of the task used to train model mi (if available, else we set it to zero), and bi is a learned “model bias” that perturbs the task embedding to account for particularities of the model.” [Section 3.4. Model2Vec: Co-embedding models and tasks]);
input a query dataset into a query encoder to output query embedding data (Achille: “Given a task defined by a dataset D = {x_i, y_i}_{i=1}^N of labeled samples, we feed the data through a pre-trained reference convolutional neural network which we call a “probe network”, and compute the diagonal Fisher Information Matrix (FIM) of the network filter parameters to capture the structure of the task (Sect. 3).” [Section 1. Introduction]);
input both the neural network representation vector data and the query embedding data into a performance predictor, which is connected to the model encoder and the query encoder to predict expected performance of the neural network for the query dataset (Achille: “Given a task, our aim is to select an expert feature extractor that maximizes the classification performance on that task. We propose two strategies: (1) embed the task and select the feature extractor trained on the most similar task, and (2) jointly embed the models and tasks, and select a model using the learned metric (see Section 3.4).” [Section 4.2 Model Selection]),
when an unseen dataset, which is a new dataset not used for training, is given as the query dataset, output a neural network having task-relevant initial parameter data for the unseen dataset by performing meta-learning on the cross-modal latent space for the neural network representation data and the query embedding data (Achille: “As a motivating example, we study the problem of selecting the best pre-trained feature extractor to solve a new task (Sect. 4).” [Section 1. Introduction]; “Given k models, their MODEL2VEC embedding are the vectors mi = Fi + bi, where Fi is the task embedding of the task used to train model mi (if available, else we set it to zero), and bi is a learned “model bias” that perturbs the task embedding to account for particularities of the model.” [Section 3.4. Model2Vec: Co-embedding models and tasks]; “The FIM is a Riemannian metric on the space of probability distributions [7], and provides a measure of the information a particular parameter (weight or feature) contains about the joint distribution p_w(x, y) = p_w(y|x) p̂(x). If the classification performance for a given task does not depend strongly on a parameter, the corresponding entries in the FIM will be small.” [Section 3. Task Embeddings via Fisher Information]),
searching for a neural network trained on a dataset similar to the unseen dataset given as the query dataset through the distribution of tasks in the database including the learning model pool comprising neural networks trained on the plurality of datasets (Achille: “After training, given a novel query task t, we can then predict the best model for it as the argmin_i d_asym(t, m_i), that is, the model m_i embedded closest to the query task.” [Section 3.4. Model2Vec: Co-embedding models and tasks]), and
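The MODEL2VEC selection step quoted above amounts to embedding each model as m_i = F_i + b_i and taking the argmin of a distance to the query-task embedding. A minimal sketch, offered for illustration only, with plain Euclidean distance standing in for Achille’s learned asymmetric metric d_asym (an illustrative simplification, not the reference’s actual metric):

```python
def model2vec(task_embedding, bias):
    # m_i = F_i + b_i: embedding of the model's training task plus a learned bias
    return [f + b for f, b in zip(task_embedding, bias)]

def euclidean(a, b):
    # Stand-in distance; Achille learns an asymmetric metric d_asym instead.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_model(query_task, model_embeddings, dist):
    # argmin_i dist(t, m_i): the model embedded closest to the query task
    return min(range(len(model_embeddings)),
               key=lambda i: dist(query_task, model_embeddings[i]))
```

Selection thus requires no fine-tuning of any candidate: only embedding distances are compared.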
predicting expected performance of the neural network for the unseen dataset without training (Achille: “We present a simple meta-learning framework for learning a metric on embeddings that is capable of predicting which feature extractors will perform well on which task without actually fine-tuning the model.” [Abstract]).
However, Achille fails to expressly disclose memory configured to store a database and also store a program for searching based on meta-contrastive learning; and at least one processor configured to perform task-adaptive neural network searching based on meta-contrastive learning by executing the program; wherein the at least one processor is further configured to: combine network topology data representing an architecture of the neural network; input the network topology data into a model encoder; and perform meta-contrast learning on the cross-modal latent space.
In the same field of endeavor, Yuan teaches memory configured to store a database and also store a program for searching based on meta-contrastive learning (Yuan: “Self-supervised methods utilize contrastive objectives, for instance, comparison to facilitate image representation learning. For example, use of a memory bank which stores pre-computed representations and the noise-contrastive estimation (NCE) for a large number of instance classes.” [0082]);
at least one processor configured to perform searching based on meta-contrastive learning by executing the program (Yuan: “the training tasks are based on contrastive learning techniques. The visual representation learning system may used in a variety of applications including image search applications.” [0020]; “these operations are performed by a system including a processor executing a set of codes to control functional elements of an apparatus.” [0091]); wherein the at least one processor is further configured to:
perform meta-contrast learning on the cross-modal latent space (Yuan: “At step 210, the text query is encoded into a common embedding space (i.e., the same embedding space as the images stored on the database). For example, the text query may be encoded with a text encoder that is part of a multi-modal representation network that also includes an image encoder. The text encoder and the image encoder may be jointly trained using multi-modal contrastive learning techniques.” [0039]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated memory configured to store a database and also store a program for searching based on meta-contrastive learning; and at least one processor configured to perform task-adaptive neural network searching based on meta-contrastive learning by executing the program; wherein the at least one processor is further configured to: perform meta-contrast learning on the cross-modal latent space, as taught by Yuan to the system of Achille because both of these systems are directed towards representation learning for information searching and retrieval within a cross-modal latent space. In making this combination and employing contrastive learning techniques, it would allow the system of Achille to use contrastive loss to distinguish positive pairs from negative pairs, allowing for “improved accuracy” in “multi-modal search tasks” (Yuan: [0023], [0104]).
Achille and Yuan still fail to expressly disclose combining network topology data representing an architecture of the neural network; and inputting the network topology data into a model encoder.
In the same field of endeavor, Cai teaches combining network topology data representing an architecture of the neural network and functional embedding data (Cai: “Given a model, we encode each layer in the neural network into a one-hot vector based on its kernel size and expand ratio, and we assign zero vectors to layers that are skipped. Besides, we have an additional one-hot vector that represents the input image size. We concatenate these vectors into a large vector that represents the whole neural network architecture and input image size, which is then fed to the three-layer feedforward neural network to get the predicted accuracy.” [Supplementary Section A. Details of the Accuracy Predictor]; “we propose to train a once-for-all (OFA) network that supports diverse architectural settings by decoupling training and search, to reduce the cost. We can quickly get a specialized sub-network by selecting from the OFA network without additional training.” [Abstract]; In light of page 18, lines 16-19 of the specification, which states “Meanwhile, the topology information may refer to once-for-all (OFA) topology information indicative of a kernel size, channel extension information, and the depth of layers”, BRI of “topology data” encompasses information on the architecture of a network, including kernel size, channel extension information, and layer depth); and
inputting the network topology data into a model encoder (Cai: “Given a model, we encode each layer in the neural network into a one-hot vector based on its kernel size and expand ratio, and we assign zero vectors to layers that are skipped. Besides, we have an additional one-hot vector that represents the input image size. We concatenate these vectors into a large vector that represents the whole neural network architecture and input image size, which is then fed to the three-layer feedforward neural network to get the predicted accuracy.” [Supplementary Section A. Details of the Accuracy Predictor]).
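Cai’s accuracy-predictor input encoding, as quoted, can be sketched as follows for illustration; the particular kernel-size and expand-ratio choices are OFA-style assumptions, not values taken from the record:

```python
KERNEL_SIZES = [3, 5, 7]    # assumed OFA-style choices, illustrative only
EXPAND_RATIOS = [3, 4, 6]

def one_hot(value, choices):
    return [1.0 if value == c else 0.0 for c in choices]

def encode_architecture(layers):
    """Concatenate per-layer one-hot vectors into one architecture vector.

    Each layer is (kernel_size, expand_ratio), or None for a skipped layer,
    which receives a zero vector per Cai's description of the accuracy
    predictor's input.
    """
    vec = []
    width = len(KERNEL_SIZES) + len(EXPAND_RATIOS)
    for layer in layers:
        if layer is None:
            vec.extend([0.0] * width)
        else:
            k, e = layer
            vec.extend(one_hot(k, KERNEL_SIZES) + one_hot(e, EXPAND_RATIOS))
    return vec
```

The concatenated vector (plus, in Cai, a one-hot input-resolution vector) is what gets fed to the feedforward accuracy predictor.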
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated combining network topology data representing an architecture of the neural network; and inputting the network topology data into a model encoder, as taught by Cai to the system of Achille and Yuan because both of these systems are directed towards retrieval of an optimal neural network to be applied to a new case without needing to train the model from scratch. In making this combination and utilizing information on the architecture/topology of the neural network when encoding the neural network into a vector representation to be mapped into a latent space, it would allow the system of Achille and Yuan to “enable a much more diverse architecture space (depth, width, kernel size, and resolution) and a significantly larger number of architectural settings”, allowing for the derivation of “new specialized neural networks for many different deployment scenarios rather than working on top of an existing neural network that limits the optimization headroom” (Cai: [Section 2. Related Work]).
Regarding Claim 2, Achille, Yuan and Cai teach the apparatus of Claim 1, wherein the at least one processor is further configured to construct the cross-modal latent space for the plurality of datasets and the neural networks trained on the plurality of datasets for the distribution of tasks by using Equation 1 below:

max_{θ,φ} Σ_{τ ∈ p(τ)} f(q, m); q = Q(D_τ; θ) and m = M(N_τ; φ)

where Q : Q → R^d is a query encoder, M : M → R^d is a model encoder, and f : R^d × R^d → R is a scoring function for a query-model pair (Achille: “To select an appropriate pre-trained model, we design a joint embedding of models and tasks in the same vector space, which we call MODEL2VEC. We formulate this as a meta-learning problem where the objective is to find an embedding such that that models whose embeddings are close to a task exhibit good performance on that task.” [Section 1. Introduction]; “Given a task, our aim is to select an expert feature extractor that maximizes the classification performance on that task. We propose two strategies: (1) embed the task and select the feature extractor trained on the most similar task, and (2) jointly embed the models and tasks, and select a model using the learned metric (see Section 3.4).” [Section 4.2. Model Selection]; “Given k models, their MODEL2VEC embedding are the vectors mi = Fi + bi, where Fi is the task embedding of the task used to train model mi (if available, else we set it to zero), and bi is a learned “model bias” that perturbs the task embedding to account for particularities of the model.” [Section 3.4. Model2Vec: Co-embedding models and tasks]).
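Equation 1’s scoring function f : R^d × R^d → R over a query-model pair can be illustrated with cosine similarity between the two embeddings; the choice of cosine similarity here is an expository assumption, as neither the claim nor the references fix a particular f:

```python
def cosine_score(q, m):
    # f : R^d x R^d -> R, a scoring function for a query-model pair
    dot = sum(a * b for a, b in zip(q, m))
    nq = sum(a * a for a in q) ** 0.5
    nm = sum(b * b for b in m) ** 0.5
    return dot / (nq * nm)

def best_model(q, model_embeddings):
    # Maximize f(q, m) over the model pool, mirroring the max over f(q, m)
    # in Equation 1 once the encoders Q and M have produced q and each m.
    return max(range(len(model_embeddings)),
               key=lambda i: cosine_score(q, model_embeddings[i]))
```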
Regarding Claims 5-6 and 9-10, they are method, non-transitory computer-readable storage medium, and apparatus claims that correspond to Claims 1 and 2. Therefore, they are rejected for the same reasons as Claims 1 and 2 above.
Claims 3-4 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Achille in view of Yuan and Cai, as applied to Claims 2 and 6 above, in further view of Iakovleva et al. (“Meta-Learning with Shared Amortized Variational Inference”, published 08/27/2020), hereinafter Iakovleva.
Regarding Claim 3, Achille, Yuan and Cai teach the apparatus of Claim 2, wherein the at least one processor is further configured to retrieve a neural network by learning the cross-modal latent space using meta-learning that maximizes a similarity between a positive embedding pair of a neural network for the any one task previously selected and minimizes a similarity between a negative embedding pair thereof (Yuan: “The multi-modal representation apparatus 115 receives input information (i.e., text or an image), encodes the information to represent features of the input, and uses the feature information to enable a cross-modal search operation. In some cases, visual and textual features are embedded into a common embedding space so that, for example, embedded images that.” [0029]; “Image-text contrastive loss 535 is based on a distance between a query image and a positive or negative sample of encoded text. For example, a low image-text contrastive loss 535 indicates that a query image may be similar to a phrase of text, and a high image-text contrastive loss 535 indicates that an image may be similar to a phrase of text, based on the associated encoded query image and the encoded text.” [0073]).
However, they fail to expressly disclose learning the latent space using amortized meta-learning.
In the same field of endeavor, Iakovleva teaches learning the latent space using amortized meta-learning (Iakovleva: “We propose a novel amortized variational inference scheme for an empirical Bayes meta-learning model, where model parameters are treated as latent variables. We learn the prior distribution over model parameters conditioned on limited training data using a variational autoencoder approach. Our framework proposes sharing the same amortized inference network between the conditional prior and variational posterior distributions over the model parameters. While the posterior leverages both the labeled support and query data, the conditional prior is based only on the labeled support data.” [Abstract]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated learning the latent space using amortized meta-learning, as taught by Iakovleva to the system of Achille, Yuan and Cai because both of these systems are directed towards task-adaptive meta-learning. In making this combination and applying amortized inference to the meta-learning, it would allow the system of Achille, Yuan and Cai to “make a single feed-forward pass through data to estimate a distribution on the parameters, instead of multiple passes to update the parameters” (Iakovleva: [Section 2. Related Work]).
Regarding Claim 4, Achille, Yuan, Cai and Iakovleva teach the apparatus of Claim 3, wherein the at least one processor is further configured to calculate a similarity between each dataset and a neural network trained on the dataset by calculating a meta-contrastive learning loss using Equation 2 below:

L_m(τ; θ, φ) = L(f(q, m^+), f(q, m^-); θ, φ); q = Q(D_τ; θ), m^+ = M(N_τ; φ), m^- = M(N_γ; φ)

(Achille: “Given a task, our aim is to select an expert feature extractor that maximizes the classification performance on that task. We propose two strategies: (1) embed the task and select the feature extractor trained on the most similar task, and (2) jointly embed the models and tasks, and select a model using the learned metric (see Section 3.4).” [Section 4.2. Model Selection]; Yuan: “At operation 305, the system encodes text using a text encoder to produce encoded text, where the image encoder and the text encoder are jointly trained based on a multi-modal loss function including at least one image loss term, at least one text loss term, and at least one cross-modal term. For example, the multi-modal loss function may be based on contrastive learning techniques.” [0045]; “Image-text contrastive loss 535 is based on a distance between a query image and a positive or negative sample of encoded text. For example, a low image-text contrastive loss 535 indicates that a query image may be similar to a phrase of text, and a high image-text contrastive loss 535 indicates that an image may be similar to a phrase of text, based on the associated encoded query image and the encoded text.” [0073]).
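Equation 2’s loss over a positive score f(q, m^+) and negative scores f(q, m^-) can be illustrated with an InfoNCE-style contrastive form; this particular form is an illustrative stand-in, since the claim does not fix the specific loss L:

```python
import math

def contrastive_loss(score_pos, scores_neg):
    """InfoNCE-style loss over one positive and several negative scores.

    The loss decreases as the positive query-model score f(q, m+) grows
    relative to the negative scores f(q, m-), so minimizing it maximizes
    positive-pair similarity and minimizes negative-pair similarity, as
    recited in Claim 3.
    """
    denom = math.exp(score_pos) + sum(math.exp(s) for s in scores_neg)
    return -math.log(math.exp(score_pos) / denom)
```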
Regarding Claims 7 and 8, they are method claims that correspond to Claims 3 and 4. Therefore, they are rejected for the same reasons as Claims 3 and 4 above.
Response to Arguments
Examiner acknowledges the Applicant’s amendments to Claims 1-8 and 10.
Applicant’s arguments, filed 07/25/2025, regarding the objections to the specification have been fully considered and are persuasive. The objections have been withdrawn.
Applicant’s arguments, filed 07/25/2025, regarding the objections to Claim 4 have been fully considered and are persuasive. The objection has been withdrawn.
Applicant’s arguments, filed 07/25/2025, traversing the rejection of Claims 1-10 under 35 U.S.C. § 101 have been fully considered and are persuasive. The rejection has been withdrawn.
Applicant’s arguments, filed 07/25/2025, regarding the rejection of Claims 1-10 under 35 U.S.C. § 103 have been fully considered and are found moot in light of the new grounds of rejection (see rejection above).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Song et al. (“Sequential Learning for Cross-modal Retrieval”) discusses designing a meta-learner trained on multi-tasks for cross-modal retrieval to optimize a model for performance on new tasks.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEGAN E HWANG whose telephone number is (703)756-1377. The examiner can normally be reached Monday-Friday 10:00-7:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.E.H./Examiner, Art Unit 2143
/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143