Prosecution Insights
Last updated: April 19, 2026
Application No. 17/466,195

ONLINE CONTINUAL LEARNING SYSTEM AND METHOD

Status: Non-Final OA (§103)
Filed: Sep 03, 2021
Examiner: SPRAUL III, VINCENT ANTON
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: Naver Corporation
OA Round: 3 (Non-Final)

Predictions:
Grant Probability: 59% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 4y 6m
Grant Probability With Interview: 94%

Examiner Intelligence

Career Allow Rate: 59% (20 granted / 34 resolved; +3.8% vs TC avg)
Interview Lift: +34.7% for resolved cases with interview (strong)
Avg Prosecution: 4y 6m (typical timeline)
Currently Pending: 30
Total Applications: 64 (across all art units)

Statute-Specific Performance

§101: 22.6% (-17.4% vs TC avg)
§103: 48.4% (+8.4% vs TC avg)
§102: 9.1% (-30.9% vs TC avg)
§112: 14.4% (-25.6% vs TC avg)

Tech Center averages are estimates. Based on career data from 34 resolved cases.

Office Action (§103)
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/15/2025 has been entered.

Response to Arguments

Regarding rejections of claims under 35 U.S.C. 112(b), Applicant’s amendments have overcome the rejections in the previous Office action, which are withdrawn.

Regarding rejections of claims under 35 U.S.C. 103, Applicant argues that the combination of Yoon and Pelosin does not teach the limitations of amended claims 1 and 24. Although the rejection below relies upon a different combination of references, some of the arguments are still relevant and will be discussed here. In particular, Applicant argues that Yoon as modified by Pelosin fails to teach or suggest “selecting a set of samples from embedded samples, that is, samples including the new sample embedded by an encoder during training of the machine learning model, for storage based on the claimed selection criteria,” arguing that “In Yoon's online coreset selection (OCS) method, a coreset is initialized for a current task, a batch is randomly sampled from a replay buffer, a coreset selection is made, the update is modeled with the selected instance, and the selected coreset can then be memorized in the replay buffer.
Thus, Yoon, before training, considers minibatch similarity between the gradient vector of a data point and its minibatch, and additionally may compare the diversity of each data point as a negative averaged similarity with other peer instances in the same minibatch, to decide whether a particular sample is to be used for training.” Examiner respectfully disagrees.

The relevant limitations in claim 1 are “defining a set of combined samples that includes the sample received from the stream of samples and the set of previous samples accessed from the memory” and “training the machine learning model using the set of combined samples on the classification task.” Yoon describes an iterative training process using a memory of previous samples selected by particular criteria (Yoon, section 4.1, paragraph 3, “After the completion of each task training, we choose a coreset Ct among the collected candidates, or we may also iteratively update Ct for continual learning as described in the following subsection,” emphasis added). The purpose of the replay buffer in Yoon is the same as in the present invention: to provide samples for further training at later iterative steps. Therefore, Yoon teaches “defining a set of combined samples that includes the sample received from the stream of samples and the set of previous samples accessed from the memory” and “training the machine learning model using the set of combined samples on the classification task.”

Applicant further submits that the combination of references does not teach the limitations of the claim because “Yoon teaches away from using a new sample for training before coreset selection.” Examiner respectfully disagrees.
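For orientation, the replay-style training loop disputed here — combining a newly arrived stream sample with previously stored samples before a training step, then deciding whether the new sample enters memory — can be sketched as follows. This is a minimal illustration of the general rehearsal pattern, not Yoon's actual OCS algorithm; `DummyModel`, `should_store`, and the random-eviction policy are hypothetical stand-ins.

```python
import random

def should_store(sample, memory, model):
    # Hypothetical stand-in for a selection criterion (e.g., a
    # similarity/diversity score); here it simply accepts every sample.
    return True

class DummyModel:
    """Placeholder model that only counts training steps."""
    def __init__(self):
        self.steps = 0
    def train_step(self, batch):
        self.steps += 1

def train_on_stream(stream, model, memory_size=100):
    memory = []                                      # previously stored samples
    for sample in stream:
        combined = memory + [sample]                 # "set of combined samples"
        model.train_step(combined)                   # train on the combined set
        if should_store(sample, memory, model):      # decide whether to store
            if len(memory) >= memory_size:
                # replace an identified sample when memory is full
                memory.pop(random.randrange(len(memory)))
            memory.append(sample)
    return model, memory
```

In this pattern the replay buffer serves exactly the role the Examiner ascribes to Yoon's coreset: a persistent memory whose contents are combined with incoming stream samples at each training step.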
Yoon does describe that it performs coreset selection prior to each task training (Yoon, section 2, paragraph 1, “However, the existing rehearsal-based methods [32, 2, 1, 9, 10] do not select the coreset before the current task adaptation and update the model on all the arriving data streams, which makes them susceptible to real-world applications that include noisy and imbalanced data distributions. In contrast, OCS selects the instances before updating the model using our proposed selection criteria, which makes it robust to past and current task training across various CL scenarios”). However, the language of the limitation is: “training the machine learning model using the set of combined samples on the classification task.” No particular method of training is described, nor any particular use for the set of combined samples. Yoon uses a set of combined samples to create a coreset, and the coreset is then used to update model parameters according to a particular objective function. Hence, Yoon is “using the set of combined samples” for training a machine learning model. These arguments are therefore unpersuasive.

Applicant’s further clarifications and arguments regarding the teaching of “maximizes a sum of distances across pairs of samples” by the combination of Yoon and Pelosin are persuasive, and new grounds of rejection using a different combination of references are given below.

Drawings

Amendments to the drawings filed 09/15/2025 have not resolved all of Examiner’s previous objections. As an illustration of the objection, Examiner presents this detail from Fig. 1 of the amended drawings. Here, the shading in the boxes has been removed and the box text and box outlines appear to be black, which solves a readability problem from the original. However, the text alongside the lines between the boxes is still the original non-black color, which causes it to be rendered as a dither that prevents full legibility, especially for subscripted and superscripted characters:

[Image: grayscale detail from Fig. 1 of the amended drawings]

Examiner respectfully recommends rendering all text in all drawings in 100% black.

In the drawings, Figs. 1, 2A, 2B, and 5-7 are objected to because they appear to be grayscale reproductions of color illustrations, which renders the text difficult to discern. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-7, 10-13, 16-17, 19-25, 27-29, and 31-32 are rejected under 35 U.S.C. 103 over Yoon et al., “Online Coreset Selection for Rehearsal-based Continual Learning,” 2021, arXiv:2106.01085v1 (hereafter Yoon) in view of Gonzalez et al., “Faster Training by Selecting Samples Using Embeddings,” 2019, 10.1109/IJCNN.2019.8851717 (hereafter Gonzalez).

Regarding claim 1: Yoon teaches:

(bold only) “A method for processing a stream of samples in an online learning system for updating a machine learning model configured for performing a classification task, the method being implemented by a processor in communication with a memory, the method comprising”: Yoon, Abstract, “To tackle this problem, we propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration and trains them in an online manner [A method for processing a stream of samples in an online learning system for updating a machine learning model]”; Yoon, section 5.1, “Datasets,” “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST [performing a classification task] using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29]”; Yoon, Algorithm 1, “input Dataset {D_t}_{t=1}^T, neural network f_Θ [implemented by a processor], learning rate, hyperparameters λ, τ, replay buffer C <- {} [in communication with a memory].”

“accessing from samples previously stored in the memory a set of previous samples for training the machine learning model for performing the classification task”: Yoon, section 4, paragraph 1, “In this section, we introduce our selection strategies and propose Online Coreset Selection (OCS) to strengthen current task adaptation and mitigate catastrophic forgetting. Thus far, the rehearsal-based continual learning methods [32, 2, 1, 9, 10] populate the replay buffer to preserve the knowledge on the previous tasks [accessing from samples previously stored in the memory a set of previous samples for training the machine learning model]. However, we argue that some instances may be non-informative and inappropriate to construct the replay buffer under realistic setups (such as video streaming or imbalanced continual learning scenarios), leading to the degradation of the model’s performance. Moreover, it is critical to select the valuable samples for current task training since the model can easily overfit to the biased and noisy data stream, which negatively affects the model generalization”; Yoon, section 5.1, “Datasets,” “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST [performing the classification task] using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29].”

“receiving a sample from the stream of samples”: Yoon, section 4.1, paragraph 1, “The model receives a data continuum during training [receiving a sample from the stream of samples], including noisy or redundant data instances in real-world scenarios.
Consequently, the arriving data instances can interrupt and hurt the performance of the model.”

“defining a set of combined samples that includes the sample received from the stream of samples and the set of previous samples accessed from the memory”: Yoon, section 4, paragraph 1, “In this section, we introduce our selection strategies and propose Online Coreset Selection (OCS) to strengthen current task adaptation and mitigate catastrophic forgetting. Thus far, the rehearsal-based continual learning methods [32, 2, 1, 9, 10] populate the replay buffer to preserve the knowledge on the previous tasks [defining a set of combined samples that includes the sample received from the stream of samples and the set of previous samples accessed from the memory]. However, we argue that some instances may be non-informative and inappropriate to construct the replay buffer under realistic setups (such as video streaming or imbalanced continual learning scenarios), leading to the degradation of the model’s performance. Moreover, it is critical to select the valuable samples for current task training since the model can easily overfit to the biased and noisy data stream, which negatively affects the model generalization.”

“training the machine learning model using the set of combined samples on the classification task”: Yoon, section 4, paragraph 1, “In this section, we introduce our selection strategies and propose Online Coreset Selection (OCS) to strengthen current task adaptation and mitigate catastrophic forgetting. Thus far, the rehearsal-based continual learning methods [32, 2, 1, 9, 10] populate the replay buffer to preserve the knowledge on the previous tasks. However, we argue that some instances may be non-informative and inappropriate to construct the replay buffer under realistic setups (such as video streaming or imbalanced continual learning scenarios), leading to the degradation of the model’s performance. Moreover, it is critical to select the valuable samples for current task training [training the machine learning model using the set of combined samples] since the model can easily overfit to the biased and noisy data stream, which negatively affects the model generalization”; Yoon, section 5.1, “Datasets,” “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST [the classification task] using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29].”

(bold only) “wherein the machine learning model comprises an encoder that during said training embeds samples into an embedding space of embedded samples, wherein the samples that are embedded by the encoder to provide the embedded samples comprise (i) samples previously stored in the memory, and (ii) the sample received from the stream of samples”: Yoon, section 4, paragraph 1, “In this section, we introduce our selection strategies and propose Online Coreset Selection (OCS) to strengthen current task adaptation and mitigate catastrophic forgetting. Thus far, the rehearsal-based continual learning methods [32, 2, 1, 9, 10] populate the replay buffer to preserve the knowledge on the previous tasks [comprise (i) samples previously stored in the memory, and (ii) the sample received from the stream of samples]. However, we argue that some instances may be non-informative and inappropriate to construct the replay buffer under realistic setups (such as video streaming or imbalanced continual learning scenarios), leading to the degradation of the model’s performance. Moreover, it is critical to select the valuable samples for current task training since the model can easily overfit to the biased and noisy data stream, which negatively affects the model generalization.”

(bold only) “determining whether to store or not store the sample received from the stream of samples in the memory based on pairwise distances between the embedded samples in the embedding space”: Yoon, section 4, paragraph 1, “In this section, we introduce our selection strategies and propose Online Coreset Selection (OCS) [determining whether to store or not store the sample received from the stream of samples in the memory] to strengthen current task adaptation and mitigate catastrophic forgetting. Thus far, the rehearsal-based continual learning methods [32, 2, 1, 9, 10] populate the replay buffer to preserve the knowledge on the previous tasks”; Yoon, section 4, paragraph 2, “In particular, minibatch similarity considers a minibatch as an approximation of the target dataset and compares the minibatch-level similarity between the gradient vector of a data point b and its minibatch B. It aligns with Assumption 1 and measures how well a given data instance describes the target task at each training step. Note that selecting examples with the largest minibatch similarity is reasonable when the variance of task instances is low; otherwise, it increases the redundancy among coreset items. In contrast, cross-batch diversity compares the diversity of each data point as the negative averaged similarity with other peer instances in the same minibatch [based on pairwise distances between the … samples].”

“and storing in the memory, the sample received from the stream of samples when said determining determines to store the sample received from the stream of samples”: Yoon, section 3, paragraph 1, “The naive CL design cannot retain the knowledge of previous tasks and thus results in catastrophic forgetting.
To tackle this problem, rehearsal-based methods [31, 9, 41] update the model on a randomly sampled replay buffer Ck constructed from the previously observed tasks [storing in the memory, the sample received from the stream of samples when said determining determines to store the sample received from the stream of samples].”

“wherein at least one of the set of previous samples accessed from the memory was determined by said determining to be stored in memory”: Yoon, section 1, paragraph 2, “Recent rehearsal-based continual learning methods adapt the continual model to the previous tasks by maintaining and revisiting a small replay buffer [9, 41, 28] [wherein at least one of the set of previous samples accessed from the memory was determined by said determining to be stored in memory].”

“wherein the classification task comprises a computer vision task, autonomous movement task, search engine optimization task, and/or natural language processing task that includes a classification”: Yoon, section 5.1, “Datasets,” “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST [MNIST is the task of recognizing hand-drawn digits from images, hence a computer vision task … that includes a classification] using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29].”

Yoon does not explicitly teach:

“the method being implemented by a processor in communication with a memory”

“wherein said determining comprises one of: selecting a subset of samples from the embedded samples that maximizes a sum of distances across pairs of samples; or selecting a subset of the samples from the embedded samples that minimizes a sum of distances across pairs of samples from different classes”

(bold only) “wherein the machine learning model comprises an encoder that during said training embeds samples into an embedding space of embedded samples, wherein the samples that are embedded by the encoder to provide the embedded samples comprise (i) samples previously stored in the memory, and (ii) the sample received from the stream of samples”

(bold only) “determining whether to store or not store the sample received from the stream of samples in the memory based on pairwise distances between the embedded samples in the embedding space”

Gonzalez teaches:

“the method being implemented by a processor in communication with a memory”: Gonzalez, section V, paragraph 2, “All training and evaluation was performed on a single Nvidia Tesla P100 GPU with 16GB of high bandwidth memory (HBM) [a processor in communication with a memory] and NVLink (when using more than one P100). The host for the P100s contained a Xeon E5-2650 v4 and 128GB of RAM.”

“wherein said determining comprises one of: selecting a subset of samples from the embedded samples that maximizes a sum of distances across pairs of samples; or selecting a subset of the samples from the embedded samples that minimizes a sum of distances across pairs of samples from different classes”: Gonzalez, section IV. C, paragraph 1, “Given a set of embeddings X, we would like to find a subset Y ⊂ X of size |Y| = k, where each element in Y is maximally distant from every other element in X. That is, Y is the set that maximizes:

[Equation image from Gonzalez, section IV. C]

where yi is the ith element in Y and dist is a distance metric (e.g., Euclidean distance or cosine distance).”

(bold only) “wherein the machine learning model comprises an encoder that during said training embeds samples into an embedding space of embedded samples, wherein the samples that are embedded by the encoder to provide the embedded samples comprise (i) samples previously stored in the memory, and (ii) the sample received from the stream of samples” and (bold only) “determining whether to store or not store the sample received from the stream of samples in the memory based on pairwise distances between the embedded samples in the embedding space”: Gonzalez, section IV. B, “Our technique is flexible with regards to the specific autoencoder design that is chosen. In this paper, we have built a standard hourglass-shaped autoencoder inspired by common designs in the literature. Figure 3 shows our selected autoencoder design in detail. The encoder portion is composed of 3x3 max-pooling layers, batch normalization layers, and basic 5x5 convolutions with ReLU activation functions. The encoder serves to calculate a 216-element embedding for a given input [the machine learning model comprises an encoder that during said training embeds samples into an embedding space of embedded samples].”

Gonzalez and Yoon are analogous arts as they are both related to sample-based training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the sample encoding and selection of Gonzalez with the teachings of Yoon to arrive at the present invention, in order to improve performance, as stated in Gonzalez, section II, paragraph 4, “Instead, we believe that reducing the training set size during the initial part of training will solve these problems with fewer compromises than existing solutions.
By reducing the training set size by an order of magnitude, almost all existing datasets can fit entirely in accelerator memory. This requires the data to only be transferred to the accelerator once, eliminating practically all bottlenecks so that the accelerator can proceed at full speed with minimal cost to the rest of the system. Since only fine-tuning must be performed with the full dataset, much of the training time can be accelerated, resulting in significant time savings.”

Regarding claim 3: Yoon as modified by Gonzalez teaches the method of claim 2. Yoon further teaches “further comprising when said determining determines to store the sample received from the stream of samples: identifying a sample from the samples stored in the memory; and replacing the identified sample stored in the memory with the sample received from the stream of samples”: Yoon, section 4.1, paragraphs 1-3, “The model receives a data continuum during training, including noisy or redundant data instances in real-world scenarios. Consequently, the arriving data instances can interrupt and hurt the performance of the model. To tackle this problem, we consider an amalgamation of minibatch similarity and cross-batch diversity to select the most helpful instances for current task training. … We consider a selected coreset at each iteration as a candidate for the replay buffer [hence, the replay buffer at each step is a selection from previous samples and the most recently arriving data, thus identifying a sample from the set of previous samples stored in the memory; and replacing the identified sample in the memory with the sample received from the stream of samples]. After the completion of each task training, we choose a coreset Ct among the collected candidates, or we may also iteratively update Ct for continual learning as described in the following subsection.”

Regarding claim 4: Yoon as modified by Gonzalez teaches the method of claim 3. Yoon further teaches “wherein the sample received from the stream of samples replaces the identified sample based on said determining”: Yoon, section 4.1, paragraphs 1-3, “The model receives a data continuum during training, including noisy or redundant data instances in real-world scenarios. Consequently, the arriving data instances can interrupt and hurt the performance of the model. To tackle this problem, we consider an amalgamation of minibatch similarity and cross-batch diversity to select the most helpful instances for current task training. … We consider a selected coreset at each iteration as a candidate for the replay buffer [hence, the replay buffer at each step is a selection from previous samples and the most recently arriving data, thus wherein the sample received from the stream of samples replaces the identified sample based on said determining]. After the completion of each task training, we choose a coreset Ct among the collected candidates, or we may also iteratively update Ct for continual learning as described in the following subsection.”

Regarding claim 5: Yoon as modified by Gonzalez teaches the method of claim 4. Yoon further teaches “wherein said determining comprises: determining to store the sample received from the stream of samples in the memory if the sample received from the stream of samples is in the selected subset of samples”: Yoon, Algorithm 1,

[Image: Yoon, Algorithm 1]

[showing that the selected coreset is incorporated into the replay buffer; hence, two memory regions, BC and CT (a first memory region and a second memory region), wherein storing in BC is based on a preliminary determination and a separate decision is made whether to store in CT; hence, determining to store the sample received from the stream of samples in the memory if the sample received from the stream of samples is in the selected subset of samples].

Regarding claim 6: Yoon as modified by Gonzalez teaches the method of claim 5.
Yoon further teaches “wherein the identified sample that is replaced is not in the selected subset of samples”: Yoon, section 4.1, paragraphs 1-3, “The model receives a data continuum during training, including noisy or redundant data instances in real-world scenarios. Consequently, the arriving data instances can interrupt and hurt the performance of the model. To tackle this problem, we consider an amalgamation of minibatch similarity and cross-batch diversity to select the most helpful instances for current task training. … We consider a selected coreset at each iteration as a candidate for the replay buffer [hence, the replay buffer at each step is a selection from previous samples and the most recently arriving data, thus wherein the identified sample that is replaced is not in the selected subset of samples]. After the completion of each task training, we choose a coreset Ct among the collected candidates, or we may also iteratively update Ct for continual learning as described in the following subsection.”

Regarding claim 7: Yoon as modified by Gonzalez teaches the method of claim 6. Yoon further teaches (bold only) “wherein the selected subset of samples maximizes heterogeneity between the combined samples in the embedding space by selecting samples to be stored in the memory such that the points are maximally spread in the embedding space”: Yoon, section 4, paragraph 2, “In particular, minibatch similarity considers a minibatch as an approximation of the target dataset and compares the minibatch-level similarity between the gradient vector of a data point b and its minibatch B. It aligns with Assumption 1 and measures how well a given data instance describes the target task at each training step. Note that selecting examples with the largest minibatch similarity is reasonable when the variance of task instances is low; otherwise, it increases the redundancy among coreset items. In contrast, cross-batch diversity compares the diversity of each data point as the negative averaged similarity with other peer instances in the same minibatch”; Yoon, section 4.1, “To tackle this problem, we consider an amalgamation of minibatch similarity and cross-batch diversity to select the most helpful instances for current task training. More formally, our online coreset selection for the current task adaptation can be defined as follows:

[Equation image from Yoon, section 4.1]

[selecting samples to be stored in the memory such that the points are maximally spread].”

Gonzalez further teaches (bold only) “wherein the selected subset of samples maximizes heterogeneity between the combined samples in the embedding space by selecting samples to be stored in the memory such that the points are maximally spread in the embedding space”: Gonzalez, section IV. B, “Our technique is flexible with regards to the specific autoencoder design that is chosen. In this paper, we have built a standard hourglass-shaped autoencoder inspired by common designs in the literature. Figure 3 shows our selected autoencoder design in detail. The encoder portion is composed of 3x3 max-pooling layers, batch normalization layers, and basic 5x5 convolutions with ReLU activation functions. The encoder serves to calculate a 216-element embedding for a given input [in the embedding space].” Gonzalez and Yoon are combinable for the rationale given under claim 1.

Regarding claim 10: Yoon as modified by Gonzalez teaches the method of claim 1. Yoon further teaches “wherein said selecting a subset of samples selects samples that are closer to class boundaries of the classification task than other samples from the embedded samples”: Yoon, section 5.4, paragraph 1, “We observe that uniform sampling selects highly biased samples representing the dominant classes for imbalanced CL and noisy instances for noisy CL.
In contrast, iCaRL selects the representative samples per class for imbalanced CL; however, it selects noisy instances during noisy CL. In comparison, OCS selects the beneficial examples for each class during imbalanced CL and discards uninformative noisy instances in the noisy CL training regime [wherein said selecting a subset of samples selects samples that are closer to class boundaries of the classification task than other samples from the embedded samples, interpreted as using a method that helps distinguish classes during classification].”

Regarding claim 11: Yoon as modified by Gonzalez teaches the method of claim 1. Yoon further teaches “wherein the stream of samples comprises one or more of images or features”: Yoon, section 5.1, “Datasets,” “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST [MNIST is the task of recognizing hand-drawn digits from images, hence the stream of samples comprises one or more of images or features] using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29].”

Regarding claim 12: Yoon as modified by Gonzalez teaches the method of claim 1. Gonzalez further teaches “wherein the machine learning model comprises a deep neural network model”: Gonzalez, Fig. 2,

[Image: Gonzalez, Fig. 2, showing a deep neural network model]

Gonzalez, section IV. A, paragraph 1, “Figure 2 illustrates the CIFAR-10 image classifier used in this paper.” Gonzalez and Yoon are combinable for the rationale given under claim 1.

Regarding claim 13: Yoon as modified by Gonzalez teaches the method of claim 1. Gonzalez further teaches “wherein said training comprises learning according to a self-supervised learning objective”: Gonzalez, section IV. B, “Our technique is flexible with regards to the specific autoencoder design that is chosen. In this paper, we have built a standard hourglass-shaped autoencoder [learning according to a self-supervised learning objective] inspired by common designs in the literature. Figure 3 shows our selected autoencoder design in detail. The encoder portion is composed of 3x3 max-pooling layers, batch normalization layers, and basic 5x5 convolutions with ReLU activation functions. The encoder serves to calculate a 216-element embedding for a given input.” Gonzalez and Yoon are combinable for the rationale given under claim 1.

Regarding claim 16: Yoon as modified by Gonzalez teaches the method of claim 1. Yoon further teaches “wherein the stream of samples is a stream of continuous samples”: Yoon, section 4.1, paragraph 1, “The model receives a data continuum during training, including noisy or redundant data instances in real-world scenarios [wherein the stream of samples is a stream of continuous samples].”

Regarding claim 17: Yoon as modified by Gonzalez teaches the method of claim 16. Yoon further teaches “wherein each sample in the stream of samples comprises a class”: Yoon, section 5.1, paragraph 1, “We validate OCS on class-incremental CL [wherein each sample in the stream of samples comprises a class] for Balanced and Imbalanced Rotated MNIST using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29].”

Regarding claim 19: Yoon as modified by Gonzalez teaches the method of claim 1. Yoon further teaches “A method for classifying a data input, the method comprising: receiving the data input by a machine learning model trained according to claim 1; processing, using the machine learning model, the received data input to determine a classification; and outputting the classification”: Yoon, section 5.1, paragraph 1, “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29]. We perform five independent runs for all the experiments and provide further details on the experimental settings and datasets in Appendix A [processing, using the machine learning model, the received data input to determine a classification; and outputting the classification, outputting interpreted as output by the model].”

Regarding claim 20: Yoon as modified by Gonzalez teaches the method of claim 1.
Yoon further teaches (bold only) “partitioning the memory into a first memory region and a second memory region; storing or not storing the new sample in the first memory region based on a preliminary determination; wherein said determining determines whether to store or not to store the sample received from the stream of samples in the second memory region based on distances between embedded samples in the embedding space when the sample received from the stream of samples is not stored in the first memory region”: Yoon, Algorithm 1, [image omitted] [showing two memory regions, BC and CT, (a first memory region and a second memory region) wherein storing in BC is based on a preliminary determination, separately a decision is made whether to store in CT, hence, when the sample received from the stream of samples is not stored in the first memory region]; Yoon, section 4, paragraph 2, “In particular, minibatch similarity considers a minibatch as an approximation of the target dataset and compares the minibatch-level similarity between the gradient vector of a data point b and its minibatch B. It aligns with Assumption 1 and measures how well a given data instance describes the target task at each training step. Note that selecting examples with the largest minibatch similarity is reasonable when the variance of task instances is low; otherwise, it increases the redundancy among coreset items.
In contrast, cross-batch diversity compares the diversity of each data point as the negative averaged similarity with other peer instances in the same minibatch [hence, storing in CT is based on distances between … samples].” Gonzalez further teaches (bold only) “partitioning the memory into a first memory region and a second memory region; storing or not storing the new sample in the first memory region based on a preliminary determination; wherein said determining determines whether to store or not to store the sample received from the stream of samples in the second memory region based on distances between embedded samples in the embedding space when the sample received from the stream of samples is not stored in the first memory region”: Gonzalez, section IV. B, “Our technique is flexible with regards to the specific autoencoder design that is chosen. In this paper, we have built a standard hourglass-shaped autoencoder inspired by common designs in the literature. Figure 3 shows our selected autoencoder design in detail. The encoder portion is composed of 3x3 max-pooling layers, batch normalization layers, and basic 5x5 convolutions with ReLU activation functions. The encoder serves to calculate a 216-element embedding for a given input [in the embedding space].” Gonzalez and Yoon are combinable for the rationale given under claim 1. Regarding claim 21: Yoon as modified by Gonzalez teaches the method of claim 20.
Yoon further teaches (bold only) “wherein the preliminary determination is a determination other than a determination based on distances between the embedded samples in the embedding space”: Yoon, Algorithm 1, [image omitted] [showing the decision to store in BC is made randomly, hence, other than a determination based on distances between the … samples]. Gonzalez further teaches (bold only) “wherein the preliminary determination is a determination other than a determination based on distances between the embedded samples in the embedding space”: Gonzalez, section IV. B, “Our technique is flexible with regards to the specific autoencoder design that is chosen. In this paper, we have built a standard hourglass-shaped autoencoder inspired by common designs in the literature. Figure 3 shows our selected autoencoder design in detail. The encoder portion is composed of 3x3 max-pooling layers, batch normalization layers, and basic 5x5 convolutions with ReLU activation functions. The encoder serves to calculate a 216-element embedding for a given input [in the embedding space].” Gonzalez and Yoon are combinable for the rationale given under claim 1. Regarding claim 22: Yoon as modified by Gonzalez teaches the method of claim 20. Yoon further teaches “wherein the preliminary determination is based on one of a random sampling method and a reservoir sampling method”: Yoon, Algorithm 1, [image omitted] [showing the decision to store in BC is made using random sampling]. Regarding claim 23: Yoon as modified by Gonzalez teaches the method of claim 1.
Yoon further teaches (bold only) “partitioning the memory with the set of previous samples into a first memory region and a second memory region; wherein said determining determines whether to store or not store the sample received from the stream of samples in the first memory region based on distances between embedded samples in the embedding space; and storing or not storing the sample received from the stream of samples in the second memory region based on a subsequent determination when the sample received from the stream of samples is not stored in the first memory region”: Yoon, Algorithm 1, [image omitted] [showing two memory regions, BC and CT, (a first memory region and a second memory region) wherein storing in BC is separate from the decision to store in CT, hence, based on a subsequent determination when the sample received from the stream of samples is not stored in the first memory region]; Yoon, section 4, paragraph 2, “In particular, minibatch similarity considers a minibatch as an approximation of the target dataset and compares the minibatch-level similarity between the gradient vector of a data point b and its minibatch B. It aligns with Assumption 1 and measures how well a given data instance describes the target task at each training step. Note that selecting examples with the largest minibatch similarity is reasonable when the variance of task instances is low; otherwise, it increases the redundancy among coreset items.
In contrast, cross-batch diversity compares the diversity of each data point as the negative averaged similarity with other peer instances in the same minibatch [hence, storing in CT is based on distances between … samples].” Gonzalez further teaches (bold only) “wherein said determining determines whether to store or not store the sample received from the stream of samples in the first memory region based on distances between embedded samples in the embedding space; and storing or not storing the sample received from the stream of samples in the second memory region based on a subsequent determination when the sample received from the stream of samples is not stored in the first memory region”: Gonzalez, section IV. B, “Our technique is flexible with regards to the specific autoencoder design that is chosen. In this paper, we have built a standard hourglass-shaped autoencoder inspired by common designs in the literature. Figure 3 shows our selected autoencoder design in detail. The encoder portion is composed of 3x3 max-pooling layers, batch normalization layers, and basic 5x5 convolutions with ReLU activation functions. The encoder serves to calculate a 216-element embedding for a given input [in the embedding space].” Gonzalez and Yoon are combinable for the rationale given under claim 1. 
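The minibatch-similarity and cross-batch-diversity criteria quoted from Yoon above can be illustrated with a minimal sketch. This is an assumption-laden toy version, not Yoon's actual implementation: it presumes per-example gradient vectors are already computed, and the names `ocs_score`, `ocs_select`, and the `tau` weighting are illustrative stand-ins for the paper's notation.

```python
import numpy as np

def cosine(u, v):
    # cosine similarity, with a small epsilon to avoid division by zero
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def ocs_score(grads, tau=1.0):
    """Score each instance as minibatch similarity (similarity of its
    gradient to the minibatch's mean gradient) plus tau times cross-batch
    diversity (negative averaged similarity to its peers)."""
    mean_grad = grads.mean(axis=0)
    n = len(grads)
    scores = np.empty(n)
    for i in range(n):
        sim = cosine(grads[i], mean_grad)  # minibatch similarity
        div = -np.mean([cosine(grads[i], grads[j])
                        for j in range(n) if j != i])  # cross-batch diversity
        scores[i] = sim + tau * div
    return scores

def ocs_select(grads, k, tau=1.0):
    """Return indices of the k highest-scoring instances for the coreset."""
    return np.argsort(ocs_score(grads, tau))[-k:]
```

The amalgamated score rewards instances whose gradients agree with the minibatch as a whole while penalizing redundancy among the selected items, which matches the trade-off the quoted passage describes.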
Regarding claim 24: Yoon teaches: “An online learning system comprising”: Yoon, Abstract, “To tackle this problem, we propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration and trains them in an online manner [online learning system].” “a memory configured to store a plurality of samples”: Yoon, Algorithm 1, “input: Dataset {D_t}_{t=1}^T, neural network f_Θ, learning rate η, hyperparameters λ, τ, replay buffer C ← {} [a memory configured to store a plurality of samples].” “and a machine learning model for performing a classification task comprising a computer vision task, autonomous movement classification task, search engine optimization task, and/or natural language processing task that includes a classification”: Yoon, section 5.1, “Datasets,” “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST [MNIST is the task of recognizing hand-drawn digits from images, hence a classification task that is a computer vision task] using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29].” “wherein the processor is configured to: receive a new sample”: Yoon, section 4.1, paragraph 1, “The model receives a data continuum during training [receive a new sample], including noisy or redundant data instances in real-world scenarios.
Consequently, the arriving data instances can interrupt and hurt the performance of the model.” “train the machine learning model on the classification task using combined samples including the new sample and a set of previous samples from the stored plurality of samples”: Yoon, section 4, paragraph 1, “In this section, we introduce our selection strategies and propose Online Coreset Selection (OCS) to strengthen current task adaptation and mitigate catastrophic forgetting. Thus far, the rehearsal-based continual learning methods [32, 2, 1, 9, 10] populate the replay buffer to preserve the knowledge on the previous tasks. However, we argue that some instances may be non-informative and inappropriate to construct the replay buffer under realistic setups (such as video streaming or imbalanced continual learning scenarios), leading to the degradation of the model’s performance. Moreover, it is critical to select the valuable samples for current task training [train the machine learning model … using combined samples] since the model can easily overfit to the biased and noisy data stream, which negatively affects the model generalization”; Yoon, section 5.1, “Datasets,” “We validate OCS on class-incremental CL for Balanced and Imbalanced Rotated MNIST [the classification task] using a single-head two-layer MLP with 256 ReLU units in each layer, task-incremental CL for Split CIFAR-100 and Multiple Datasets (a sequence of five datasets) with a multi-head structured ResNet-18 following prior works [9, 28, 29].” “determining whether to store the new sample in the memory when the new sample is within the selected set of samples”: Yoon, Algorithm 1, [image omitted] [showing that (step 10) the selected samples are stored in the replay buffer].
“wherein at least one of the set of previous samples accessed from the memory was determined by said determining to be stored in memory”: Yoon, Algorithm 1, [image omitted] [showing that the process is iterative, and thus, the previous samples in the memory were determined by previous iterations of the same process]. Yoon does not explicitly teach: “the machine learning model being implemented by a processor in communication with the memory” “the machine learning model comprising an encoder trained to embed the plurality of samples in the memory in an embedding space of embedded samples” “wherein the encoder during training further embeds the new sample in the embedding space” “and store or not store the new sample in the memory based on pairwise distances between embedded samples in the embedding space learned by the machine learning model” “wherein said storing comprises: selecting a set of samples from the embedded samples that either maximizes a sum of distances across pairs of samples or that minimizes a sum of distances across pairs of samples from different classes” Gonzalez teaches: “the machine learning model being implemented by a processor in communication with the memory”: Gonzalez, section V, paragraph 2, “All training and evaluation was performed on a single Nvidia Tesla P100 GPU with 16GB of high bandwidth memory (HBM) [a processor in communication with the memory] and NVLink (when using more than one P100).
The host for the P100s contained a Xeon E5-2650 v4 and 128GB of RAM.” “the machine learning model comprising an encoder trained to embed the plurality of samples in the memory in an embedding space of embedded samples,” “wherein the encoder during training further embeds the new sample in the embedding space,” (bold only) “and store or not store the new sample in the memory based on pairwise distances between embedded samples in the embedding space learned by the machine learning model,” and (bold only) “wherein said storing comprises: selecting a set of samples from the embedded samples that either maximizes a sum of distances across pairs of samples or that minimizes a sum of distances across pairs of samples from different classes”: Gonzalez, section IV. B, “Our technique is flexible with regards to the specific autoencoder design that is chosen. In this paper, we have built a standard hourglass-shaped autoencoder inspired by common designs in the literature. Figure 3 shows our selected autoencoder design in detail. The encoder portion is composed of 3x3 max-pooling layers, batch normalization layers, and basic 5x5 convolutions with ReLU activation functions. The encoder serves to calculate a 216-element embedding for a given input [the machine learning model comprising an encoder trained to embed the plurality of samples in the memory in an embedding space of embedded samples][wherein the encoder during training further embeds the new sample in the embedding space].” “and store or not store the new sample in the memory based on pairwise distances between embedded samples in the embedding space learned by the machine learning model,” and “wherein said storing comprises: selecting a set of samples from the embedded samples that either maximizes a sum of distances across pairs of samples or that minimizes a sum of distances across pairs of samples from different classes”: Gonzalez, section IV. 
C, paragraph 1, “Given a set of embeddings X, we would like to find a subset Y ⊂ X of size |Y| = k, where each element in Y is maximally distant from every other element in X [based on pairwise distances between embedded samples in the embedding space learned by the machine learning model][selecting a set of samples from the embedded samples that either maximizes a sum of distances across pairs of samples or that minimizes a sum of distances across pairs of samples from different classes]. That is, Y is the set that maximizes: [equation image omitted] where yi is the ith element in Y and dist is a distance metric (e.g., Euclidean distance or cosine distance).” Gonzalez and Yoon are analogous arts as they are both related to sample-based training. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the sample encoding and selection of Gonzalez with the teachings of Yoon to arrive at the present invention, in order to improve performance, as stated in Gonzalez, section II, paragraph 4, “Instead, we believe that reducing the traini
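The subset-selection criterion quoted from Gonzalez, section IV.C (finding k embeddings that are maximally distant from one another) is commonly implemented with greedy farthest-point sampling. The sketch below is a hedged illustration under that assumption: the function name and the greedy strategy are illustrative, not necessarily Gonzalez's exact procedure, and Euclidean distance stands in for whatever metric `dist` is used.

```python
import numpy as np

def farthest_point_subset(X, k):
    """Greedily pick k rows of X that are maximally spread out: seed with
    the point farthest from the centroid, then repeatedly add the point
    whose distance to its nearest already-chosen point is largest."""
    chosen = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    while len(chosen) < k:
        # distance from every point to its nearest already-chosen point
        d = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=-1).min(axis=1)
        d[chosen] = -np.inf  # never re-pick a chosen point
        chosen.append(int(np.argmax(d)))
    return chosen
```

Exactly maximizing a sum of pairwise distances over all size-k subsets is combinatorial, so a greedy pass like this is the usual practical surrogate; it is the sense in which the stored points end up "maximally spread" in the embedding space.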

Prosecution Timeline

Sep 03, 2021
Application Filed
Oct 29, 2024
Non-Final Rejection — §103
Apr 14, 2025
Response Filed
May 02, 2025
Final Rejection — §103
Sep 15, 2025
Request for Continued Examination
Oct 01, 2025
Response after Non-Final Action
Nov 10, 2025
Non-Final Rejection — §103
Mar 16, 2026
Interview Requested
Mar 25, 2026
Examiner Interview Summary
Mar 25, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591634
COMPOSITE EMBEDDING SYSTEMS AND METHODS FOR MULTI-LEVEL GRANULARITY SIMILARITY RELEVANCE SCORING
2y 5m to grant Granted Mar 31, 2026
Patent 12591796
INTELLIGENT DISTANCE PROMPTING
2y 5m to grant Granted Mar 31, 2026
Patent 12572620
RELIABLE INFERENCE OF A MACHINE LEARNING MODEL
2y 5m to grant Granted Mar 10, 2026
Patent 12566974
Method, System, and Computer Program Product for Knowledge Graph Based Embedding, Explainability, and/or Multi-Task Learning
2y 5m to grant Granted Mar 03, 2026
Patent 12547616
SEMANTIC REASONING FOR TABULAR QUESTION ANSWERING
2y 5m to grant Granted Feb 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
59%
Grant Probability
94%
With Interview (+34.7%)
4y 6m
Median Time to Grant
High
PTA Risk
Based on 34 resolved cases by this examiner. Grant probability derived from career allow rate.
