DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1, 7, 11, 15, and 19-20 are currently amended. Claims 1-20 are pending and examined herein.
Claims 1-20 are rejected under 35 U.S.C. 103.
Response to Amendment
The amendment filed January 26, 2026 has been entered. The specification has been amended. Claims 1, 7, 11, 15, and 19-20 are currently amended. Claims 1-20 are pending and are examined herein. Applicant’s amendments to the specification have overcome the objection previously set forth in the Final Rejection Office Action mailed October 27, 2025.
Response to Arguments
Applicant’s arguments, see pages 11-13, filed January 26, 2026, regarding the 35 U.S.C. 103 rejection of claims 1-20 have been fully considered but are not persuasive. With respect to Wong, applicant focuses on one example describing metadata associated with the entity and contends this cannot correspond to the amended limitation “metadata comprising column names and summary information for each column.” However, Wong’s disclosure is not limited to that example. Wong expressly generates separate feature sets from different input sources (e.g., transaction data for an observable feature vector and ancillary/historical information for a context feature vector) and then combines these vectors to provide the overall input to the trained model. Wong further teaches that the context feature generation may include computing statistical metrics from historical transaction data and including those aggregate metrics in the context feature vector. Those statistical metrics constitute “summary information” derived from the values of corresponding data fields/columns in the underlying records. Accordingly, Wong’s disclosure continues to teach using different input types and derived summaries as model inputs consistent with the amended claim language.
With respect to Bremer, Bremer is directed to processing structured records (including tabular datasets) having attributes, and discloses that records may be provided in formats such as XML or JSON that “associate attributes and corresponding values.” Therefore, Bremer does not merely process attribute values. It teaches the record structure in which attribute identifiers (i.e., field/column identifiers) are associated with their values, and Bremer further illustrates named attributes (e.g., “first name”, “last name”, “state”), corresponding to “column names” in a tabular dataset. Bremer also states that the generated feature vectors may be provided as “metadata” of the set of records, confirming that vectors describing record content qualify as metadata rather than being excluded as mere features. Accordingly, the applied references continue to teach the amended limitations and the rejections are maintained.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 10-14, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. (US 2023/0118240) in view of Bremer et al. (US 2021/0374525), further in view of Reinders et al. (NPL: “Neural Random Forest Imitation”).
Regarding Claim 1, Wong teaches
A computer-implemented method comprising: processing a plurality of data records using a first machine learning model; ([0044] of Wong states “The machine learning server 150 implements a machine learning system 160 for the processing of transaction data.” [0054] of Wong teaches that ML models can be generated based on a request; there can be a first request to generate a regular ML model to process transaction data.)
, wherein the first machine learning model comprises a non-neural network machine learning model; ([0035] of Wong states “The term “machine learning model” is used herein to refer to at least a hardware-executed implementation of a machine learning model or function. Known models within the field of machine learning include logistic regression models, Naïve Bayes models, Random Forests, Support Vector Machines and artificial neural networks. Implementations of classifiers may be provided within one or more machine learning programming libraries including, but not limited to, scikit-learn, TensorFlow, and PyTorch.” The machine learning model Wong implements is not restricted to a neural network machine learning model.)
converting the first machine learning model to a neural network machine learning
model; ([0054] of Wong states “In one case, the model definition language comprises computer program code that is executable to implement one or more of training and inference of a defined machine learning model. The machine learning models may, for example, comprise, amongst others, artificial neural network architectures, ensemble models, regression models, decision trees such as random forests, graph models, and Bayesian networks. One example machine learning model based on an artificial neural network is described later with reference to FIGS. 6A to 6C.” Under BRI, this limitation can be interpreted as the conversion of an NN model, defined in a model definition language, into an executable NN model.)
extracting, using an embedding machine learning model, metadata comprising … and summary information for each column of the plurality of data records; ([0032] of Wong states “Transaction data may comprise structured, unstructured and semi-structured data. Transaction data may also include data associated with a transaction, such as data used to process a transaction.” [0044] of Wong states “The payment processor server 140 is communicatively coupled to a first data storage device 142 storing transaction data 146 and a second data storage device 144 storing ancillary data 148. The transaction data 146 may comprise batches of transaction data relating to different transactions that are undertaken over a period of time. The ancillary data 148 may comprise data associated with the transactions, such as records storing merchant and/or end user data” [0063] of Wong teaches that the machine learning system may save data relating to the transaction along with the output of the machine learning system and later form part of future ancillary data. [0112] of Wong states “For example, FIG. 9 shows the context feature generator 920 receiving ancillary data 922 and historical transaction data 924. The uniquely identifiable entity may be a particular end user (e.g., a card holder) or merchant and the ancillary data 922 may be data retrieved from records that are associated with the uniquely identifiable entity (e.g., so-called static data that is separate from the transaction data representing past transactions). The ancillary data 922 may comprise the ancillary data 146 or 242 as described previously. The historical transaction data 924 may comprise data associated with transactions that are outside of the temporal window, e.g. data derived from transactions that are outside the aforementioned predefined time range set with reference to a timestamp of the proposed transaction or the relative time range. 
The context feature generator 920 may be configured to compute aggregate metrics across the historical transaction data 924 (or retrieve pre-computed aggregate metrics) and to then include the aggregate metrics in the context feature vector. Aggregate metrics may comprise simple statistical metrics or more advanced neural network extracted features.” Wong computes aggregate metrics and includes them in the context feature vector, with examples such as ‘total amount’, ‘me_amount’, and ‘aggregate_amount’, which correspond to summary information for a column because they are aggregated over the values of the column.)
generating, using the metadata as input and via an embedding feature generator, a second set of feature vectors; ([0054] of Wong teaches that the machine learning system can be converted from machine learning model definitions to executable machine learning models; Fig. 9 and [0110] of Wong teach that a converted executable machine learning model can contain components, such as generators; this means that the components are also converted; [0111] of Wong states “The observable feature generator 910 receives transaction data 912, 914 and uses this to generate an observable feature vector 916” and “The context feature generator 930 receives ancillary data 922, 924 and uses this to generate a context feature vector 926.”)
concatenating the first set of feature vectors with the second set of feature vectors to generate concatenated feature vectors; ([0111] of Wong states “The observable feature vector 916 and the context feature vector 926 are then combined to generate the overall feature vector 930. In one case, the observable feature vector 916 and the context feature vector 926 may be combined by concatenating the two feature vectors 916 and 926 to generate a longer vector.”)
generating, based on the converted first feature generator and the embedding feature generator, a combined machine learning model ([0054] of Wong teaches that machine learning system can be converted from machine learning model definitions to executable machine learning models. [0065] of Wong states “the machine learning systems may be implemented as a modular platform that allow for different machine learning models and configurations to be used to provide for the transaction processing.” [0110] of Wong states “The machine learning system 900 comprises an observable feature generator 910 and a context feature generator 930.”);
and training, using the concatenated feature vectors as input, the combined machine learning model, wherein the trained combined machine learning model enhances performance of the first machine learning model ([0110] of Wong states “FIG. 9 shows an example machine learning system 900 with an adaptation to allow for training based on unlabeled data.” It teaches combining two feature vectors to generate the overall feature vector 930 and using it as input to train the binary classifier. [0105] of Wong teaches that this improved pipeline allows the training of ML systems that are adapted to process transaction data.)
generating, using multiple-dimensional input data comprising the data values, … and the summary information, and based on the trained combined machine learning model, labels corresponding to the plurality of data records; ([0112] of Wong states “For example, FIG. 9 shows the context feature generator 920 receiving ancillary data 922 and historical transaction data 924. The uniquely identifiable entity may be a particular end user (e.g., a card holder) or merchant and the ancillary data 922 may be data retrieved from records that are associated with the uniquely identifiable entity (e.g., so-called static data that is separate from the transaction data representing past transactions). The ancillary data 922 may comprise the ancillary data 146 or 242 as described previously. The historical transaction data 924 may comprise data associated with transactions that are outside of the temporal window, e.g. data derived from transactions that are outside the aforementioned predefined time range set with reference to a timestamp of the proposed transaction or the relative time range. The context feature generator 920 may be configured to compute aggregate metrics across the historical transaction data 924 (or retrieve pre-computed aggregate metrics) and to then include the aggregate metrics in the context feature vector. Aggregate metrics may comprise simple statistical metrics or more advanced neural network extracted features.” As Fig. 12 shows a method of training a machine learning system to detect anomalies within transaction data, it continues to use the system described in Fig. 9, which contains the combined machine learning model. [0122] of Wong states “The method 1200 may be used to implement the pipeline 1000 shown in FIG. 10. At block 1202, the method 1200 comprises obtaining a training set of data samples. The data samples may comprise data samples such as 1012 in FIG. 10. 
Each data sample is derived, at least in part, from transaction data and is associated with one of a set of uniquely identifiable entities. For example, a data sample may have, or be retrieved based upon, one or more unique identifiers relating to a user or a merchant. In this method, at least a portion of the training set is unlabelled.“ [0123] of Wong states “At block 1204, the method 1200 comprises assigning a label indicating an absence of an anomaly to unlabelled data samples in the training set.“ [0126] of Wong states ”At block 1210, the method 1200 comprises assigning a label indicating a presence of an anomaly to the synthetic data samples” [0038] of Wong states “As discussed above, the term “tensor” is used, as per machine learning libraries, to refer to an array that may have multiple dimensions, e.g. a tensor may comprise a vector, a matrix or a higher dimensionality data structure. In preferred example, described tensors may comprise vectors with a predefined number of elements.” Wong teaches multi-dimensional model inputs (tensor/vector) that include raw record values such as “amount” and derived summary information such as “aggregate_amount” or “total amount”. )
permitting, based on the labels and in real time, one or more transactions related to the plurality of data records to proceed. (Building on the system described in the limitation above ([0122]-[0127]), the invention comprises approving or declining the transaction. [0044] and [0104] also mention this, but [0139] is more direct following the explanation of the supervised ML system above. [0139] states “In certain cases, block 1316 may comprise approving or declining the transaction based on the output of the supervised machine learning system. This may comprise generating control data to control whether at least one transaction within the transaction data is accepted or denied based on the value output by the supervised machine learning system. For example, in a simple case, a threshold may be applied to the output of the supervised machine learning system and values higher than the threshold (representing an “anomaly”) may be declined while those below the threshold may be approved (representing “normal” actions), with a suitable decision being made for values equal to the threshold. In certain cases, such as those illustrated in FIGS. 1A to 1C and FIGS. 5A to 5B, the transaction data may be received from a point-of-sale device in relation to a transaction to be approved. In these cases, block 1316 may comprise approving the transaction response to the value output by the supervised machine learning system being below a predefined threshold.”)
Wong does not explicitly teach that
metadata comprising column names…, input data comprising the column names
generating, using data values in each column of the plurality of data records as input and via a converted first feature generator, a first set of feature vectors;
the converted first feature generator and the embedding feature generator can be machine learning models.
, wherein an autoencoder generates text embeddings corresponding to the structural data of the plurality of data records;
, wherein the neural network machine learning model mimics output from the first machine learning model;
However, Bremer teaches that
metadata comprising column names…, input data comprising the column names ([0027] of Bremer states “A dataset may comprise the set of records being processed by the present subject matter. For example, the dataset may be provided in the form of a collection of related records contained in a file (e.g., the dataset may be a file containing records of all students in class). The dataset may, for example, be a table of a database or a file of a Hadoop file system, etc. In another example, the dataset may comprise one or more documents such as a HyperText Markup Language (HTML) page or other document types.” [0060] of Bremer states “A dataset of records 107 stored in the central repository 103 may have values of a set of attributes a1 . . . aN (N≥1) such as a first name attribute. Although the present example is described in terms of few attributes, more or fewer attributes may be used. The dataset 107 being used in accordance with the present subject matter may comprise at least part of the records of the central repository 103.” [0061] of Bremer states “For example, the received records from the client systems 105 may have a structure different from the structure of the stored records of the central repository 103. For example, a client system 105 may be configured to provide records in XML format, JSON format or other formats that enable to associate attributes and corresponding attribute values.” [0064] of Bremer states “Each of the attribute level trained data representation learning models 121.1-121.N may be configured to receive a value of a respective attribute a1 . . . aN and to generate a corresponding individual feature vector. The individual feature vectors may be combined to obtain a single feature vector that represents a record. The combination may, for example, be performed using N trained weights α1 . . . αN (not shown in FIG. 1) associated with the set of N attributes a1 . . . 
aN.” [0076] of Bremer states “In a second storage example, the set of feature vectors may be stored in association with respective set of records. This may particularly be advantageous as the set of feature vectors may not require a large amount of storage resources. This may enable to provide the set of features vectors as metadata of the set of records.” Bremer’s records are structured as a set of named attributes and may be presented in JSON/XML as attribute identifiers and corresponding values, which reads on the “column names” of the metadata. Handling attributes requires identifying columns in a dataset comprising a table.)
generating, using data values in each column of the plurality of data records as input and via a converted first feature generator, a first set of feature vectors; ([0041] of Bremer states “According to one embodiment, the trained data representation learning model is configured to output a feature vector of the set of feature vectors by generating for each attribute of the set of attributes an individual feature vector and combining the individual features vectors to obtain said feature vector. By processing the attributes at an individual level, this embodiment may enable to access features of the records in more details and may thus provide a reliable representation of the records.” [0052] of Bremer states “According to one embodiment, the trained data representation learning model comprises one trained neural network per attribute of the set of attributes, wherein the output of each feature vector of the set of feature vectors comprises: inputting the value of each attribute of the set of attributes into the associated trained neural network, receiving, in response to the inputting, an individual feature vector from each of the trained neural networks, and combining the individual feature vectors to obtain said feature vector.” Bremer teaches inputting the values of a record’s attributes into attribute-level models, which may be neural networks, to generate individual feature vectors.)
the converted first feature generator and the embedding feature generator can be machine learning models (Fig. 6B, [0056] of Bremer states “A second subset of attribute level data representation learning models may be provided for the second subset of attributes, wherein each attribute level data representation learning model of the second subset is configured to generate a feature vector for a respective attribute of the second subset of attributes. A data representation learning model may be created such that it comprises the first trained data representation learning model and the second subset of attribute level data representation learning models. The created data representation learning model may be trained to generate the trained data representation learning model.” [0064] of Bremer states “In one example, the trained data representation learning model 120 may comprise multiple attribute level trained data representation learning models 121.1-121.N. Each of the attribute level trained data representation learning models 121.1-121.N may be associated with a respective attribute of the set of attributes a1 . . . aN. Each of the attribute level trained data representation learning models 121.1-121.N may be configured to receive a value of a respective attribute a1 . . . aN and to generate a corresponding individual feature vector.”, [0072] of Bremer states “The trained data representation learning model 120 may be configured to generate an individual feature vector for each received value. The individual feature vectors may be combined by the trained data representation learning model 120 in order to generate a feature vector that represents the record R1. This example may particularly be advantageous in case the set of attributes are of the same type. That is, a single trained data representation learning model (e.g., a single neural network) may validly generate feature vectors for different attributes of the same type.”).
, wherein an autoencoder generates text embeddings corresponding to the structural data of the plurality of data records; ([0030] of Bremer states “In one example, the trained data representation learning model may be one similarity encoder that is configured to receive the set of one or more attributes of a record and to generate a feature vector that represents the record. The feature vector may, for example, be obtained by processing collectively the values of the set of attributes, or may be obtained as a combination of individual feature vectors associated with the set of attributes, wherein the similarity encoder is configured to consecutively receive the values of the set of attributes and to generate associated individual feature vectors.” [0051] of Bremer states “In another example, the trained data representation learning model comprises an autoencoder.”)
Reinders teaches that
, wherein the neural network machine learning model mimics output from the first machine learning model; (Pg. 2, Section I (Introduction) of Reinders states “In this work, we present a transformation of random forests into neural networks which creates a very efficient neural network. We introduce a method for generating data from a random forest which creates any amount of input data and corresponding labels. With this data, a neural network is trained that learns to imitate the random forest.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wong, Reinders, and Bremer because all are directed toward using ML to process large amounts of data. Reinders further teaches that a neural network imitating a random forest “creates very efficient neural networks that learn the decision boundaries of a random forest without any additional training data” (Reinders, Pg. 1, Abstract). One of ordinary skill in the art would be motivated to incorporate the teachings of Bremer into those of Wong because Bremer further improves the efficiency of storing large amounts of data for ML processing (Bremer, [0007], [0036]), and Reinders improves neural network efficiency.
Regarding Claim 2, the rejection of Claim 1 is incorporated herein. Furthermore, Wong teaches
the first machine learning model comprises a decision tree model, a standard normal variate (SNV) model, a support vector machine (SVM) model or a random forest model ([0035] states “Known models within the field of machine learning include logistic regression models, Naïve Bayes models, Random Forests, Support Vector Machines and artificial neural networks”.)
Regarding Claim 3, the rejection of Claim 1 is incorporated herein. Furthermore, Wong teaches
the embedding machine learning model comprises the autoencoder, a variational autoencoder (VAE), a Bert Model or a transformer model ([0144] states “Approaches for performing unsupervised outlier detection include using tree-based isolation forests, generative adversarial networks or variational autoencoders.”)
Regarding Claim 4, the rejection of Claim 1 is incorporated herein. Furthermore, Wong teaches
the neural network machine learning model comprises a fully connected neural network (FCNN), a convolutional neural network (CNN), a recurrent neural network, or a feed forward neural network. ([0037] states “Neural network types include convolutional neural networks, recurrent neural networks, and feed-forward neural networks”.)
Regarding Claim 10, the rejection of Claim 1 is incorporated herein. Furthermore, Wong teaches
receiving, as output from the first machine learning model and based on the plurality of data records, a plurality of prediction labels associated with the plurality of data records; ([0138] states “At block 1316, the received transaction data is selectively labelled based on the value output by the supervised machine learning system. This may be performed by the supervised machine learning system (e.g., as described with reference to FIG. 4).”)
providing, as input to a classification layer, the concatenated feature vectors; (Fig. 9: the binary classifier 940 is trained with the input vector 930. [0111] states “In one case, the observable feature vector 916 and the context feature vector 926 may be combined by concatenating the two feature vectors 916 and 926”, and these concatenated feature vectors are provided as the input vector 930, as stated in [0111]: “combinatory logic and/or one or more neural network layers may be used that receive the observable feature vector 916 and the context feature vector 926 as input and map this input to the feature vector 930.”)
receiving, as output from the classification layer and based on the concatenated feature vectors, a plurality of new prediction labels associated with the plurality of data records; (Fig. 9: the scalar output 950, as stated in [0110], is indicative of a presence of an anomaly. If the binary classifier 940 is trained on data that has two assignable labels, the resulting output will be a binary label.)
comparing the plurality of prediction labels with the plurality of new prediction labels; ([0117] states “In the case that the binary classifier 940 comprises a neural network architecture, during training, a prediction from the binary classifier 940 in the form of scalar output 950 may be compared within the assigned label 1076, e.g. in a loss function, and an error based on the difference between the scalar output 950 and one of the numeric values 0 or 1 in the label may be propagated back through the neural network architecture. In this case, a differential of the loss function with respect to the weights of the neural network architecture may be determined and used to update those weights”)
and training the combined machine learning model based on the comparison. ([0136] states “Training may comprise applying an available model fitting function in a machine learning computer program code library. In cases where the supervised machine learning system comprises a neural network architecture, training may comprise applying backpropagation with gradient descent, using a loss function based on a difference between a prediction output by the supervised machine learning system and the assigned labels. Training may be performed at a configuration stage prior to application of the method 1300.”)
Claims 11 and 19 recite substantially similar subject matter as claim 1 and are rejected with the same rationale, mutatis mutandis.
Claims 12-14 recite substantially similar subject matter as claims 2-4, respectively, and are rejected with the same rationale, mutatis mutandis.
Claim 18 recites substantially similar subject matter as claim 10 and is rejected with the same rationale, mutatis mutandis.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. (US 2023/0118240 A1) in view of Bremer et al. (US 2021/0374525) and Reinders et al. (NPL: “Neural Random Forest Imitation”), further in view of Tristan et al. (US 2019/0095805 A1).
Regarding Claim 5, the rejection of Claim 1 is incorporated herein. Furthermore, Tristan teaches
receiving the plurality of data records in a first data format; converting the plurality of data records from the first data format to a second data format; and generating, using the first machine learning model based on the plurality of data records in the second data format, a plurality of prediction labels ([0086] states “In some embodiments, I/O interface 830 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 820) into a format suitable for use by another component (e.g., processor 810)” and [0045] states “In some embodiments, the models may be executed on different cores of a multicore processor, or different processes or threads in an execution environment that supports multiple processes or threads”. Therefore, the I/O interface converts data in system memory (the first format) into a format more suitable for the processor (the second format). The processor is used for execution of the machine learning model in the disclosure and is used to create a decision, where [0018] states “the decision may indicate a prediction of future conditions based on presently known data.”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Wong, Reinders, and Bremer in view of Tristan because all are directed toward processing large data with ML. Tristan operates ML on a large universe of features containing various forms of data. Said person would incorporate the teaching of Tristan to process different formats in order to learn from empirical data and generalize to solve problems in many different domains.
Claims 9 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. (US 2023/0118240 A1) in view of Bremer et al. (US 2021/0374525), Reinders et al. (NPL: “Neural Random Forest Imitation”), and further in view of Wang et al. (NPL: “Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network”).
Regarding Claim 9, the rejection of Claim 1 is incorporated herein. Furthermore, Wang teaches
training, based on the plurality of data records and prediction labels generated by the first machine learning model, the neural network machine learning model to be a proxy to the first machine learning model. (Pg. 3038, section 3.3, “Architecture of proposed DAE-CNN,” describes using a deep autoencoder (DAE) for pre-training, extracting the output of its hidden layer, and making that output available for use by a convolutional neural network (CNN). Fig. 4, the DAE-CNN-S model, shows the CNN acting as a proxy to the DAE using the output generated by the DAE.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Wong, Bremer, and Reinders in view of Wang because both combine models to effectively create a hybrid model for better performance when generating results from big data. Said person would incorporate the teaching of Wang and use a hybrid machine learning model to reduce training cost and time consumption while taking advantage of the power of a neural network.
Claim 17 recites substantially similar subject matter as claim 9 and is rejected with the same rationale, mutatis mutandis.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. (US 2023/0118240 A1) in view of Bremer et al. (US 2021/0374525), Tristan et al. (US 2019/0095805 A1), Reinders et al. (NPL: “Neural Random Forest Imitation”), and further in view of Goodsitt et al. (US 2020/0111019 A1).
Regarding Claim 6, the rejection of Claim 5 is incorporated herein. Furthermore, Wong teaches
the plurality of data records comprise transaction records, ([0002] states “The present invention relates to systems and methods for applying machine learning systems to transaction data”.)
The combination of Wong and Bremer does not appear to explicitly teach
and wherein the predicted labels comprise an indication whether the plurality of data records contain sensitive data.
However, Goodsitt—directed to analogous art—teaches
and wherein the predicted labels comprise an indication whether the plurality of data records contain sensitive data. ([0057] states “The recurrent neural network can be configured to predict whether a character of a training sequence is part of a sensitive data portion”.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of this application to combine the methods of Wong, Bremer, and Reinders with the indication of sensitive data from Goodsitt because, as Goodsitt states in [0004], “Even though the data in the original dataset can be anonymized, the use of original datasets has significant privacy implications.” Due to the use of transaction data including SSNs as stated in the original disclosure, it would have been obvious to a person having ordinary skill in the art to consider indicating whether the data contains sensitive data.
Claims 7-8, 15-16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wong et al. (US 2023/0118240 A1) in view of Bremer et al. (US 2021/0374525), Reinders et al. (NPL: “Neural Random Forest Imitation”), and further in view of Carrasco (US 2021/0012236 A1).
Regarding Claim 7, the rejection of Claim 1 is incorporated herein. Furthermore, the combination of Wong and Bremer teach
the metadata further comprises data sources, table names. ([0053] teaches that the ancillary data comprise secondary data linked to one or more entities identified in the primary data associated with the transaction. The ancillary data could be a representation of data retrieved from a database or other data storage. Thus, they can pertain to any data-source or table-related information of the data.)
The combination of Wong and Bremer does not appear to explicitly teach
and correlation between features that are associated with the plurality of data records.
However, Carrasco—directed to analogous art—teaches
correlation between features that are associated with the plurality of data records. ([0064] states “On the other hand, feature meta-data can include… the features' relationships with other features and models.”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of this application to combine the methods of Wong, Bremer, and Reinders with Carrasco because in both, metadata are used to represent information about the features. Said person would be motivated to use the features’ relationships with other features and models, as this could help determine which data are relevant and which are not.
Regarding Claim 8, the rejection of Claim 1 is incorporated herein. Furthermore, the combination of Wong and Bremer does not appear to explicitly teach
the metadata comprises a mean, a variance, a range or a length associated with a column in the plurality of data records.
However, Carrasco—directed to analogous art—teaches
the metadata comprises a mean, a variance, a range or a length associated with a column in the plurality of data records. ([0064] states “On the other hand, feature meta-data can include standard statistical metrics (mean, average, maximum, minimum, and standard deviation) and the features' relationships with other features and models.”)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of this application to combine the methods of Wong and Bremer with the standard statistical metrics of metadata taught by Carrasco because the limitations listed in the original disclosure are commonly used statistical metrics. Said person could utilize the statistical metrics from the metadata as input for model training and inference in standard algorithms, such as regression and classification.
Claims 15-16 recite substantially similar subject matter as claims 7-8, respectively, and are rejected with the same rationale, mutatis mutandis.
Claim 20 recites substantially similar subject matter as claims 7 and 8 combined and is rejected with the same rationale, mutatis mutandis.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYUNGKWON HAN, whose telephone number is (571) 272-5294. The examiner can normally be reached M-F, 9:00 AM-6:00 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BYUNGKWON HAN/ Examiner, Art Unit 2121
/Li B. Zhen/ Supervisory Patent Examiner, Art Unit 2121