Last updated: May 04, 2026

Application No. 18/490,252

REDUCING UTILIZATION OF COMPUTATIONAL RESOURCES ASSOCIATED WITH SEGMENTING DATASETS VIA A CLUSTER-ENSEMBLE MODEL SYSTEMS AND METHODS

Final Rejection §103

Filed: Oct 19, 2023
Examiner: PARK, GRACE A
Art Unit: 2144
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Capital One Services LLC
OA Round: 2 (Final)

Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 76% grant rate with a +18.2% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim-evolution patterns, is recommended.

Based on 557 resolved cases, 2023–2026

Examiner Intelligence

PARK, GRACE A

Grants 76% — above average

Career Allowance Rate

421 granted / 557 resolved

+20.6% vs TC avg

Strong +18% interview lift

+18.2%

Interview Lift

resolved cases with interview

Typical timeline

3y 4m

Avg Prosecution

26 currently pending

Career history

583

Total Applications

across all art units

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 53.9% (+13.9% vs TC avg)
§102: 17.0% (-23.0% vs TC avg)
§112: 10.4% (-29.6% vs TC avg)

Based on career data from 557 resolved cases

Office Action

§103

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment and Arguments
Applicant’s amendment filed on December 19, 2025 has been entered and made of record. Claims 1-20 are pending and are being examined in this application.
Applicant’s arguments with respect to the 103 rejections have been considered, but are moot in view of the new ground(s) of rejection provided below.

Allowable Subject Matter
Claims 3-5, 10-12, 16-18, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 6, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bhatia et al. (US Pub. 20200104697) in view of Schlerf et al. (US Pub. 20240089336) and further in view of Shyr et al. (US Pub. 20150286704).

Referring to claim 1, Bhatia discloses A system for reducing utilization of computational resources associated with segmenting datasets via a cluster-ensemble model, the system comprising: one or more processors executing computer program instructions that, when executed, cause operations [fig. 8, computing device 800 comprises processor 802, memory 804, and storage 806] comprising: 
receiving a raw dataset having a first dimension comprising (i) user identifiers of users, (ii) entity identifiers associated with entities that the users interacted with, and (iii) timestamps at which the users interacted with the entities [pars. 20-22; (unorganized) user interaction data includes user identifiers, content item identifiers, and interaction timestamps]; 
embedding the raw dataset into an embedded dataset having a second dimension that is less than that of the first dimension, wherein the embedded dataset comprises vector embeddings of (i) the user identifiers, (ii) the entity identifiers associated with the entities, and (iii) the timestamps at which the users interacted with the entities [pars. 20-22; the user interaction data is transformed into vectorized user embeddings providing standardized user representations of each user’s interactive behavior with respect to content items (i.e., the user identifiers, content item identifiers, and the interaction timestamps)]; 
providing the embedded dataset to each of (i) a first clustering model to generate a first set of clusters...between the vector embeddings of a first subset of the embedded dataset and (ii) a second clustering model to generate a second set of clusters...between the vector embeddings of a second subset of the embedded dataset [pars. 29, 63, 126, 141, and 149-152; the user embeddings are provided to one or more machine learning models (e.g., cluster models)], wherein the first clustering model and the second clustering model are...trained [pars. 29, 63, 126, and 141; the user embeddings are provided to the one or more machine learning models (e.g., cluster models) as the basis for making predictions (i.e., as training data)] on the first subset of the embedded dataset and the second subset of the embedded dataset... [pars. 21 and 149-152; the user interaction data is partitioned into groups (i.e., subsets) such that the user embeddings are implemented as a first matrix and a second matrix]; and
generating...a set of labeled data segments...indicating at least one characteristic [pars. 126 and 141; the user embeddings are provided to the machine learning models to perform various use cases like clustering segmentation].

Bhatia does not appear to explicitly disclose generate a first set of clusters based on determined Euclidean distances between the vector embeddings of a first subset of the embedded dataset; generate a second set of clusters based on determined Euclidean distances between the vector embeddings of a second subset of the embedded dataset; that the first clustering model and the second clustering model are respectively trained on the first subset and the second subset; providing the first set of clusters and the second set of clusters to the cluster-ensemble model comprising an ensemble function to generate a set of ensemble-clusters; and generating, based on the set of ensemble-clusters, a set of labeled data segments corresponding to the set of ensemble-clusters indicating at least one characteristic of a respective ensemble-cluster of the set of ensemble-clusters.
However, Schlerf discloses generate a first set of clusters based on the vector embeddings of a first subset of the embedded dataset; generate a second set of clusters based on the vector embeddings of a second subset of the embedded dataset; and that the first clustering model and the second clustering model are respectively trained on the first subset and the second subset [pars. 32, 41, and 42; two or more cluster models are respectively trained based on particular subsets of feature vectors corresponding to different characteristics; this means that clustering performed by the cluster models would be based on the particular subsets of feature vectors].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the machine learning models (e.g., cluster models) taught by Bhatia so that they are respectively trained based on subsets of feature vectors as taught by Schlerf, with a reasonable expectation of success. The motivation for doing so would have been to train various cluster models corresponding to different characteristics [Schlerf, par. 42].
Bhatia and Schlerf do not appear to explicitly disclose that the first set of clusters and the second set of clusters are generated based on determined Euclidean distances; providing the first set of clusters and the second set of clusters to the cluster-ensemble model comprising an ensemble function to generate a set of ensemble-clusters; and generating, based on the set of ensemble-clusters, a set of labeled data segments corresponding to the set of ensemble-clusters indicating at least one characteristic of a respective ensemble-cluster of the set of ensemble-clusters.
However, Shyr discloses that the first set of clusters and the second set of clusters are generated based on determined Euclidean distances [pars. 106 and 113; distances between member cases, centroids, and/or clusters are determined based on Euclidean distances]; providing the first set of clusters and the second set of clusters [fig. 1; pars. 12, 13, and 27-29; each mapper program 115 accesses data 192, 194, 196, and 198, respectively, and generates cluster feature trees (CF-trees); each reducer program 155 (i.e., cluster model) generates a candidate set of variables corresponding to a clustering solution (i.e., a set of clusters)] to the cluster-ensemble model comprising an ensemble function to generate a set of ensemble-clusters [fig. 1; pars. 12, 13, 27-29, 40, and 41; the candidate sets of variables are provided to controller program 175 (i.e., cluster-ensemble model), which combines the candidate sets of variables using an ensemble technique to build an overall quality clustering solution (i.e., a final set of clusters)]; and generating, based on the set of ensemble-clusters, a set of labeled data segments corresponding to the set of ensemble-clusters indicating at least one characteristic of a respective ensemble-cluster of the set of ensemble-clusters [fig. 1; pars. 12, 13, 27-29, 40, and 41; a final set of variables (i.e., features / data segments / characteristics) corresponding to the overall quality clustering solution is selected].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the clustering taught by the combination of Bhatia and Schlerf so that an ensemble technique is used for clustering as taught by Shyr, with a reasonable expectation of success. The motivation for doing so would have been to maximize the quality of the clustering solution [Shyr, par. 12].
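The arrangement recited in claim 1 (two clustering models working on different subsets of the embedded dataset using Euclidean distances, followed by an ensemble function and labeled segments) can be pictured with a minimal sketch. Everything in it is an illustrative assumption and is not taken from the application, Bhatia, Schlerf, or Shyr: the toy embeddings, the use of k-means with farthest-point seeding, treating the two subsets as column subsets of the embeddings, the intersection-style ensemble function (two users share an ensemble-cluster only when both models group them together), and cluster size as the labeled characteristic.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=10):
    """Tiny k-means; farthest-point seeding keeps the toy run deterministic."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def ensemble(labels_a, labels_b):
    """Hypothetical ensemble function: one ensemble-cluster per distinct label pair."""
    ids, out = {}, []
    for pair in zip(labels_a, labels_b):
        ids.setdefault(pair, len(ids))
        out.append(ids[pair])
    return out

def label_segments(user_ids, ensemble_labels):
    """Attach one characteristic (here, the cluster size) to each ensemble-cluster."""
    segments = {}
    for uid, lab in zip(user_ids, ensemble_labels):
        segments.setdefault(lab, {"users": []})["users"].append(uid)
    for seg in segments.values():
        seg["size"] = len(seg["users"])
    return segments

# Hypothetical 4-dimensional embeddings for six users (not real data)
user_ids = ["u1", "u2", "u3", "u4", "u5", "u6"]
embedded = [
    [0.0, 0.1, 5.0, 5.1],
    [0.1, 0.0, 5.1, 5.0],
    [0.0, 0.2, 0.1, 0.0],
    [4.0, 4.1, 0.0, 0.1],
    [4.1, 4.0, 0.2, 0.0],
    [4.0, 4.2, 5.0, 5.2],
]
first_subset = [row[:2] for row in embedded]   # assumed: first two columns
second_subset = [row[2:] for row in embedded]  # assumed: last two columns

first_labels = kmeans(first_subset, k=2)
second_labels = kmeans(second_subset, k=2)
ensemble_labels = ensemble(first_labels, second_labels)
segments = label_segments(user_ids, ensemble_labels)
```

In this toy run the two models disagree about users u3 and u6, so the ensemble step splits them into their own ensemble-clusters while u1/u2 and u4/u5 stay paired.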

Referring to claim 2, see the rejection for claim 1, which incorporates the claimed method.

Referring to claim 6, Bhatia discloses The method of claim 2, wherein the second dimension is less than that of the first dimension [pars. 20-22; note the vectorized user embeddings].

Referring to claim 15, see at least the rejection for claim 1. Bhatia further discloses One or more non-transitory, computer-readable media comprising instructions that, when executed by one or more processors, cause the claimed operations [fig. 8, computing device 800 comprises processor 802, memory 804, and storage 806].

Referring to claim 19, see the rejection for claim 6.

Claims 7, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Bhatia, Schlerf, and Shyr in view of Yu et al. (US Pub. 20240386325).

Referring to claim 7, Bhatia, Schlerf, and Shyr do not appear to explicitly disclose The method of claim 2, wherein embedding the raw dataset into the embedded dataset comprises providing the raw dataset to an embedding model to generate the embedded dataset having a set of rows and a set of columns, wherein each row corresponds to a user identifier associated with a user and each column corresponds to one or more timestamps at which the user interacted with an entity.
However, Yu discloses The method of claim 2, wherein embedding the raw dataset into the embedded dataset comprises providing the raw dataset to an embedding model to generate the embedded dataset having a set of rows and a set of columns, wherein each row corresponds to a user identifier associated with a user and each column corresponds to one or more timestamps at which the user interacted with an entity [pars. 34, 79, 154; users and user interaction information are ingested from source data as feature vectors, with users associated with rows and temporal data (e.g., timestamps) of user interaction with various elements associated with columns].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the user embeddings taught by the combination of Bhatia, Schlerf, and Shyr so that the vector embeddings represent users and temporal data of user interactions as taught by Yu, with a reasonable expectation of success. The motivation for doing so would have been to facilitate forward-in-time predictions associated with particular use-cases [Yu, par. 59].
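The row and column layout at issue in claim 7 (one row per user identifier, one column per timestamp) can be sketched as follows. This only illustrates the layout, not an embedding model; the record format, the day-level timestamp buckets, and the function name `user_by_time_matrix` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical interaction records: (user identifier, entity identifier, day)
records = [
    ("u1", "e7", "2025-01-03"),
    ("u2", "e7", "2025-01-03"),
    ("u1", "e9", "2025-01-05"),
    ("u1", "e2", "2025-01-05"),
]

def user_by_time_matrix(records):
    """Build a count matrix with one row per user and one column per day."""
    users = sorted({user for user, _entity, _day in records})
    days = sorted({day for _user, _entity, day in records})
    counts = defaultdict(int)
    for user, _entity, day in records:
        counts[(user, day)] += 1
    matrix = [[counts[(user, day)] for day in days] for user in users]
    return users, days, matrix

users, days, matrix = user_by_time_matrix(records)
```

Keying the rows on the entity identifier instead of the user identifier would give the entity-by-timestamp layout recited in claim 8.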

Referring to claim 13, Bhatia, Schlerf, and Shyr do not appear to explicitly disclose The method of claim 2, further comprising: prior to embedding the raw dataset into the embedded dataset, filtering the raw dataset based on a threshold value, wherein the threshold value is a predetermined time range; and updating the raw dataset based on the filtering of the raw dataset.
However, Yu discloses The method of claim 2, further comprising: prior to embedding the raw dataset into the embedded dataset, filtering the raw dataset based on a threshold value, wherein the threshold value is a predetermined time range; and updating the raw dataset based on the filtering of the raw dataset [par. 248; when source data is ingested, one or more temporal filters are applied to exclude data associated with timestamps outside the scope of a particular use-case].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the user embeddings taught by the combination of Bhatia, Schlerf, and Shyr so that temporal filters are applied as taught by Yu, with a reasonable expectation of success. The motivation for doing so would have been to facilitate forward-in-time predictions associated with particular use-cases [Yu, par. 59].
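The filtering step discussed for claim 13 (filter the raw dataset by a predetermined time range before embedding, then update the raw dataset) might look like the sketch below. The record layout, the dates, and the helper name `filter_by_time_range` are hypothetical and do not come from the application or from Yu.

```python
from datetime import datetime

def filter_by_time_range(records, start, end):
    """Keep only records whose timestamp falls inside [start, end]."""
    return [r for r in records if start <= r[2] <= end]

# Hypothetical raw dataset: (user identifier, entity identifier, timestamp)
raw = [
    ("u1", "e7", datetime(2024, 11, 30)),
    ("u1", "e9", datetime(2025, 1, 5)),
    ("u2", "e7", datetime(2025, 2, 14)),
    ("u3", "e2", datetime(2025, 6, 1)),
]

# Assumed predetermined time range: the first quarter of 2025
range_start = datetime(2025, 1, 1)
range_end = datetime(2025, 3, 31)

# "Updating the raw dataset" is modeled here as rebinding it to the filtered rows
raw = filter_by_time_range(raw, range_start, range_end)
```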

Referring to claim 14, Bhatia, Schlerf, and Shyr do not appear to explicitly disclose The method of claim 2, further comprising: prior to embedding the raw dataset into the embedded dataset, filtering the raw dataset into a set of subsets of the raw dataset based on a threshold value, wherein the threshold value is a predetermined time range; and for each subset of the raw dataset, providing the entity identifiers of the respective subset of the raw dataset to an embedding model to generate the vector embeddings.
However, Yu discloses The method of claim 2, further comprising: prior to embedding the raw dataset into the embedded dataset, filtering the raw dataset into a set of subsets of the raw dataset based on a threshold value, wherein the threshold value is a predetermined time range; and for each subset of the raw dataset, providing the entity identifiers of the respective subset of the raw dataset to an embedding model to generate the vector embeddings [pars. 34, 79, 154, and 248; when source data is ingested, one or more temporal filters are applied to exclude data associated with timestamps outside the scope of a particular use-case; ingested data represents users and user interaction with various elements using feature vectors].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the user embeddings taught by the combination of Bhatia, Schlerf, and Shyr so that temporal filters are applied as taught by Yu, with a reasonable expectation of success. The motivation for doing so would have been to facilitate forward-in-time predictions associated with particular use-cases [Yu, par. 59].

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Bhatia, Schlerf, and Shyr in view of Katz et al. (US Pub. 20240029135).

Referring to claim 8, Bhatia, Schlerf, and Shyr do not appear to explicitly disclose The method of claim 2, wherein embedding the raw dataset into the embedded dataset comprises providing the raw dataset to an embedding model to generate the embedded dataset having a set of rows and a set of columns, wherein each row corresponds to an entity identifier and each column corresponds to one or more timestamps at which users interacted with the entity associated with the entity identifier.
However, Katz discloses The method of claim 2, wherein embedding the raw dataset into the embedded dataset comprises providing the raw dataset to an embedding model to generate the embedded dataset having a set of rows and a set of columns, wherein each row corresponds to an entity identifier and each column corresponds to one or more timestamps at which users interacted with the entity associated with the entity identifier [pars. 16, 37, and 41; each row is associated with a specific item and each column is associated with a timestamp in a vector representation used for training a model].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the user embeddings taught by the combination of Bhatia, Schlerf, and Shyr so that the user embeddings represent items and timestamps as taught by Katz, with a reasonable expectation of success. The motivation for doing so would have been to perform efficient, accurate predictions that are individualized to specific users [Katz, par. 16].

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Bhatia, Schlerf, and Shyr in view of Lackritz et al. (US Pub. 20240419985).

Referring to claim 9, Bhatia, Schlerf, and Shyr do not appear to explicitly disclose The method of claim 2, further comprising: extracting, from the raw dataset, the entity identifiers associated with the entities that the users have interacted with; comparing each entity identifier of the set of entity identifiers to a predetermined set of entity identifiers to determine a match; and in response to determining the match between a respective entity identifier of the entity identifiers and a respective entity identifier of the predetermined set of entity identifiers, removing information from the raw dataset that corresponds to the respective entity identifier, wherein the match is determined
However, Lackritz discloses The method of claim 2, further comprising: extracting, from the raw dataset, the entity identifiers associated with the entities that the users have interacted with; comparing each entity identifier of the entities to a predetermined set of entity identifiers to determine a match; and in response to determining the match between a respective entity identifier of the set of entity identifiers and a respective entity identifier of the predetermined set of entity identifiers, removing information from the raw dataset that corresponds to the respective entity identifier, wherein the match is determined [par. 62; if text similarity between entities is high (i.e., matching), the same entity is likely represented and deduplication is executed to remove redundant entities].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the user embeddings taught by the combination of Bhatia, Schlerf, and Shyr so as to include entity deduplication as taught by Lackritz, with a reasonable expectation of success. The motivation for doing so would have been to prevent the same entity from being represented more than once in the embeddings [Lackritz, par. 64] (i.e., so results aren’t skewed).
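The extract, compare, and remove sequence quoted for claim 9 can be illustrated with a short sketch. It assumes that a match means exact equality of identifiers, which is only one possible reading (the quoted claim text does not say how the match is determined, and Lackritz is cited for text similarity between entities). The record layout and the name `remove_matching_entities` are hypothetical.

```python
def remove_matching_entities(records, predetermined_entity_ids):
    """Drop every record whose entity identifier matches the predetermined set."""
    predetermined = set(predetermined_entity_ids)
    # extract the entity identifiers present in the raw dataset
    extracted = {entity for _user, entity, _ts in records}
    # assumed match rule: exact identifier equality
    matches = extracted & predetermined
    kept = [r for r in records if r[1] not in matches]
    return kept, matches

# Hypothetical raw dataset: (user identifier, entity identifier, timestamp)
raw = [
    ("u1", "e7", "2025-01-03"),
    ("u1", "e9", "2025-01-05"),
    ("u2", "e7", "2025-02-14"),
]
kept, matches = remove_matching_entities(raw, ["e7", "e100"])
```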

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
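The reply windows described above reduce to calendar-month arithmetic from the mailing date. The sketch below assumes the mailing date is Mar 05, 2026 (the Final Rejection date shown in the prosecution timeline on this page, which may differ from the official mailing date) and ignores weekend and holiday adjustments; `add_months` is a hypothetical helper.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Add calendar months, clamping the day to the end of the target month."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

mailing_date = date(2026, 3, 5)  # assumption taken from the timeline entry

two_month_mark = add_months(mailing_date, 2)        # first-reply window for the advisory-action rule
shortened_period_end = add_months(mailing_date, 3)  # three-month shortened statutory period
statutory_maximum = add_months(mailing_date, 6)     # six-month outer limit

print(two_month_mark, shortened_period_end, statutory_maximum)
```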


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRACE PARK whose telephone number is (571)270-7727. The examiner can normally be reached M-F 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Grace Park/Primary Examiner, Art Unit 2144

Prosecution Timeline

Oct 19, 2023: Application Filed
Sep 18, 2025: Non-Final Rejection (§103)
Dec 18, 2025: Applicant Interview (Telephonic)
Dec 18, 2025: Examiner Interview Summary
Dec 19, 2025: Response Filed
Mar 05, 2026: Final Rejection (§103), current

Precedent Cases

Applications granted by this same examiner with similar technology

17/966,892

Patent 12608650

STORAGE MEDIUM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING APPARATUS

3y 6m to grant Granted Apr 21, 2026

18/149,682

Patent 12591807

SKETCHED AND CLUSTERED FEDERATED LEARNING WITH AUTOMATIC TUNING

3y 2m to grant Granted Mar 31, 2026

17/452,519

Patent 12585924

CAUSAL MULTI-TOUCH ATTRIBUTION

4y 4m to grant Granted Mar 24, 2026

17/726,675

Patent 12585728

METHOD AND APPARATUS FOR MACHINE LEARNING BASED INLET DEBRIS MONITORING

3y 11m to grant Granted Mar 24, 2026

17/721,873

Patent 12579150

Hybrid and Hierarchical Multi-Trial and OneShot Neural Architecture Search on Datacenter Machine Learning Accelerators

3y 11m to grant Granted Mar 17, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

76%

Grant Probability

94%

With Interview (+18.2%)

3y 4m (~9m remaining)

Median Time to Grant

Moderate

PTA Risk

Based on 557 resolved cases by this examiner. Grant probability derived from career allowance rate.
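The headline figures on this page can be cross-checked with simple arithmetic. The sketch assumes that the interview lift is applied additively, in percentage points, to the rounded career allowance rate; the page does not state its exact formula, so this is only one reading that is consistent with the displayed numbers.

```python
granted, resolved, total_applications = 421, 557, 583

# career allowance rate: granted / resolved
allowance_rate = granted / resolved * 100

# applications not yet resolved
pending = total_applications - resolved

# assumed additive lift, in percentage points
interview_lift = 18.2
with_interview = round(allowance_rate) + interview_lift

print(f"allowance rate: {allowance_rate:.1f}%")
print(f"pending: {pending}")
print(f"with interview: {with_interview:.1f}%")
```

Under that reading the rounded allowance rate, the pending count, and the with-interview figure agree with the values displayed on the page.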