DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This Office Action is issued in response to Applicant’s Communication amending application S/N 18/060,749, filed on November 7, 2025. Claims 1 to 18, 20, and 21 are currently pending in the application.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 to 3, 5, 7 to 10, 12, 14 to 17, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over NATSUI (U.S. Publication No. 2023/0186092), in view of Zhang (U.S. Patent No. 11,182,691), and further in view of HAO et al. (U.S. Publication No. 2023/0141749), hereinafter Hao.
As to claim 1:
Natsui discloses:
A system, comprising: one or more processors; and a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:
receive user input comprising a number of machine learning base models to generate [Paragraph 0027 teaches receive input made by a user; Paragraph 0111 teaches receive user input of a desired number of parameters of machine learning models to generate];
generate a plurality of machine learning base models corresponding to the number based on the user input [Paragraph 0039 teaches generates plural untrained models; Paragraph 0111 teaches model generation unit may generate untrained models of the number of parameters for which the user input has been received];
determine a chunk for a machine learning base model of the plurality of machine learning base models [Paragraph 0104 teaches storing pieces of training data formed of input data and correct classification results as a training dataset, and training the untrained model using the training dataset, in other words, determining a chunk, which could comprise all the training dataset], and
train the machine learning base model with the chunk [Paragraph 0103 teaches training the learning model; Paragraph 0104 teaches training the untrained model using the training dataset]; and
validate the plurality of machine learning base models using the testing data [Paragraph 0105 teaches evaluating the accuracy of the model trained by the training unit; Paragraph 0139 teaches the evaluation data may be data with correct answers that have not been used for training, for example, validation data].
Natsui does not appear to expressly disclose receive a first dataset; store a minority portion of the first dataset as testing data and a remaining portion of the first dataset as training data; separate the training data into majority cases and minority cases; iteratively for each machine learning base model of the plurality of machine learning base models until all machine learning base models of the plurality of machine learning base models are trained: determine a chunk for a machine learning base model, wherein the chunk comprises all minority cases from the training data and a subset of majority cases from the training data that is unique from other chunks associated with other machine learning base models of the plurality of machine learning base models.
Zhang discloses:
receive a first dataset [Column 39, lines 46 to 49 teaches a chunked data set that will be processed with filtering and splitting operations, therefore, the dataset must be received];
store a minority portion of the first dataset as testing data and a remaining portion of the first dataset as training data [Column 39, lines 58 to 62 teach a split operation applied to the dataset, where 70% of the chunks are placed in a training set, and 30% of the chunks are placed in a test set];
separate the training data into majority cases and minority cases [Column 54, lines 39 to 42 teach records of a raw training data set are classified into a majority category and two minority categories, hence, separating the training data into majority and minority cases].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references and modify the invention as taught by Natsui to receive a first dataset; store a minority portion of the first dataset as testing data and a remaining portion of the first dataset as training data; and separate the training data into majority cases and minority cases, as taught by Zhang [Columns 39, 54], because both applications are directed to the generation and training of machine learning models; filtering and splitting the datasets into majority and minority cases provides improvements in prediction accuracy, training time, and run-time performance (see Zhang [Col. 56, line 59 - Col. 57, line 5]).
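For illustration of the combined Natsui/Zhang teaching discussed above, the dataset handling (holding out a minority portion, e.g. 30%, of the data for testing and separating the remaining training data into majority and minority cases) can be sketched as follows. This sketch is provided for clarity only and is not part of the cited disclosures; the function and field names (`split_and_separate`, `"label"`) are hypothetical.

```python
import random

def split_and_separate(dataset, minority_label, test_fraction=0.3, seed=0):
    """Hold out a minority portion of the dataset as testing data, then
    separate the remaining training data into majority and minority cases.

    Hypothetical sketch: each record is assumed to be a dict with a
    "label" field; records matching `minority_label` are minority cases.
    """
    rng = random.Random(seed)
    records = list(dataset)
    rng.shuffle(records)
    n_test = int(len(records) * test_fraction)
    testing_data = records[:n_test]      # e.g., 30% held out as testing data
    training_data = records[n_test:]     # remaining 70% used as training data
    minority_cases = [r for r in training_data if r["label"] == minority_label]
    majority_cases = [r for r in training_data if r["label"] != minority_label]
    return testing_data, majority_cases, minority_cases
```

With a 10-record dataset and `test_fraction=0.3`, three records are held out for testing and the remaining seven are partitioned by class, mirroring the 70/30 split cited from Zhang at Column 39.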
Neither Natsui nor Zhang appear to expressly disclose iteratively for each machine learning base model of the plurality of machine learning base models until all machine learning base models of the plurality of machine learning base models are trained: determine a chunk for a machine learning base model, wherein the chunk comprises all minority cases from the training data and a subset of majority cases from the training data that is unique from other chunks associated with other machine learning base models of the plurality of machine learning base models.
Hao discloses:
iteratively for each machine learning base model of the plurality of machine learning base models until all machine learning base models of the plurality of machine learning base models are trained [Paragraph 0112 teaches selecting subsets of samples in the training set for each base classifier when training the initial base classification models; Paragraph 0115 teaches each base classification model uses different training data, therefore, iteratively until all base models are trained; Fig. 10, training all the models by selecting subsets of healthy data (majority cases) and all erroneous data (minority cases), until model n is trained, hence, until all models are trained]:
determine a chunk for a machine learning base model, wherein the chunk comprises all minority cases from the training data and a subset of majority cases from the training data that is unique from other chunks associated with other machine learning base models of the plurality of machine learning base models [Paragraph 0103 teaches each base classification model is trained by using different subsets of healthy data (majority cases); Paragraph 0117 teaches constructing a training set for each LSTM model with a method of majority-class under-sampling, and performing initial training, by selecting a part of the majority-class (i.e. healthy data) samples and all the minority-class (i.e. erroneous data) samples as a training set; Paragraph 0134 teaches each base classification model is an initial base classification model that is obtained by training using all of erroneous data in the historical SMART data of the plurality of storage devices (minority class) and a first subset of healthy data in the historical SMART data (majority class), wherein the healthy data in the historical SMART data is divided into a plurality of first subsets, wherein the plurality of first subsets do not cross or overlap each other, therefore, where the chunks comprise all minority cases and a subset of majority cases that is unique from other chunks].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references and modify the invention as taught by Natsui to, iteratively for each machine learning base model of the plurality of machine learning base models until all machine learning base models of the plurality of machine learning base models are trained, determine a chunk for a machine learning base model, wherein the chunk comprises all minority cases from the training data and a subset of majority cases from the training data that is unique from other chunks associated with other machine learning base models of the plurality of machine learning base models, as taught by Hao [Paragraphs 0103, 0112, 0115, 0117, 0134], because both applications are directed to the generation and training of machine learning models; selecting a subset of healthy data (majority cases) and all of the erroneous data (minority cases) in a sampling method ensures a difference of training data while alleviating the problem of unbalanced sample proportions between the majority and minority categories (see Hao, Paragraph [0117]).
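The chunk-construction scheme attributed to Hao above (dividing the majority cases into non-overlapping subsets and pairing each subset with all minority cases to form one unique training chunk per base model) can be illustrated with the following hypothetical sketch. It is provided for clarity only and is not part of the cited disclosures; the names (`make_chunks`, `train_ensemble`, `fit`) are illustrative.

```python
def make_chunks(majority_cases, minority_cases, num_models):
    """Divide the majority cases into non-overlapping subsets and pair each
    subset with all minority cases, yielding one unique chunk per model."""
    chunks = []
    for i in range(num_models):
        # Striding by num_models assigns every i-th majority case to model i,
        # so the majority subsets do not cross or overlap each other.
        majority_subset = majority_cases[i::num_models]
        chunks.append(list(minority_cases) + majority_subset)
    return chunks

def train_ensemble(models, majority_cases, minority_cases):
    """Iteratively train each base model with its own chunk until all are trained."""
    chunks = make_chunks(majority_cases, minority_cases, len(models))
    for model, chunk in zip(models, chunks):
        model.fit(chunk)  # hypothetical training interface for a base model
```

Because the striding partition covers every majority case exactly once, each majority case is incorporated into exactly one chunk, consistent with the property recited in claim 21 that no majority case is excluded from training at least one base model.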
As to claim 2:
Natsui as modified by Zhang discloses:
wherein each chunk comprises no more than 50% minority cases [Column 56, lines 26 to 33 teach an example of sample ratios includes a 33% sample of minority category 4014A, and a 50% sample of minority category 4014B, therefore, no more than 50% minority cases].
As to claim 3:
Natsui as modified by Zhang discloses:
wherein the minority portion comprises 10 to 30% of the first dataset [Column 39, lines 58 to 62 teach a split operation applied to the dataset, where 70% of the chunks are placed in a training set, and 30% of the chunks are placed in a test set].
As to claim 5:
Natsui as modified by Zhang discloses:
wherein each machine learning base model comprises a logistic regression model, a gradient boosted tree method model, a k-nearest neighbor model, or combinations thereof [Column 59, lines 24 to 26 teach model types include regression models, etc.].
As to claim 7:
Natsui as modified by Zhang discloses:
wherein determining the chunk for a machine learning base model of the plurality of machine learning base models is conducted dynamically at runtime [Column 27, lines 31 to 49 teach run-time recipe manager may retrieve the executable version of R1, perform a set of run-time validations, and schedule the execution of the transformation operations of R1 at respective resource sets 1175A and 1175B, where respective outputs 1185A and 1185B may be produced by the application of the recipe R1 on input datasets 1 and 2, and where the outputs may represent data that is to be used as input for a model, in other words, the determining of the chunk is conducted dynamically at runtime].
As to claim 21:
Natsui as modified by Hao discloses:
wherein each majority case of the majority cases of training data is incorporated into at least one subset of majority cases associated with a chunk such that none of the majority cases of training data are excluded from training at least one machine learning base model of the plurality of machine learning base models [Hao - Paragraph 0112 teaches subset of the majority class samples is selected for each base classifier through an integrated strategy in order to use all sample information in the training set].
Same rationale applies to claims 8 to 10, 12, 14 to 17, and 20, since they recite similar limitations.
Claims 4, 6, 11, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over NATSUI (U.S. Publication No. 2023/0186092), in view of Zhang (U.S. Patent No. 11,182,691), in view of HAO et al. (U.S. Publication No. 2023/0141749), hereinafter Hao, and further in view of Merrill et al. (U.S. Publication No. 2019/0043070), hereinafter Merrill.
As to claim 4:
Natsui discloses all the limitations as set forth in claim 1 above, but does not appear to expressly disclose wherein each machine learning base model comprises a gradient boosted tree method model.
Merrill discloses:
wherein each machine learning base model comprises a gradient boosted tree method model [Paragraph 0158 teaches using tree-based methods such as gradient boosted trees].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Natsui, by incorporating gradient boosted tree method model, as taught by Merrill [Paragraph 0158], because the applications are directed to generation and training of machine learning models; using a gradient boosted tree method model improves accuracy of the models.
As to claim 6:
Natsui as modified by Merrill discloses:
the user input further comprises a selection of a logistic regression model, a gradient boosted tree method model, or a k-nearest neighbor model [Paragraph 0252 teaches receives user-selection of a model type and generates the at least one instruction that defines a model type of the protected class model based on the received user-selection; Paragraph 0298 teaches the tree model is a gradient boosted tree].
Same rationale applies to claims 11, 16, and 18, since they recite similar limitations.
Response to Arguments
This is in response to arguments filed on November 7, 2025. Applicant’s arguments have been fully and respectfully considered, but are moot in view of the new grounds of rejection necessitated by the amendments.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAQUEL PEREZ-ARROYO whose telephone number is (571)272-8969. The examiner can normally be reached Monday - Friday, 8:00am - 5:30pm, Alt Friday, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi can be reached at 571-272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RAQUEL PEREZ-ARROYO/Primary Examiner, Art Unit 2169