Prosecution Insights
Last updated: April 19, 2026
Application No. 18/314,880

ENHANCING NEAREST NEIGHBOR ALGORITHM USING A SET OF PARALLEL MODELS

Non-Final OA (§101, §103)

Filed: May 10, 2023
Examiner: HADDAD, MAJD MAHER
Art Unit: 2125
Tech Center: 2100 — Computer Architecture & Software
Assignee: Oracle International Corporation
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Expected OA Rounds: 1-2
Estimated Time to Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal; based on resolved cases with interview)
Typical Timeline: 3y 3m average prosecution
Career History: 21 total applications across all art units, 21 currently pending

Statute-Specific Performance

§101: 36.1% (-3.9% vs TC avg)
§103: 44.6% (+4.6% vs TC avg)
§102: 1.2% (-38.8% vs TC avg)
§112: 10.8% (-29.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 0 resolved cases.

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are presented for examination.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on April 15, 2023 was filed. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections

Claims 1, 8, 15, and their respective dependent claims are objected to because of the following informalities:
- Change “the first number” to “the first number of models”
- Change “the second number” to “the second number of training subsets”
- Change “the third number” to “the third number of data subsets”
Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1

Step 1: The claim recites a method; therefore, it is directed to the statutory category of processes.

Step 2A Prong One: The claim recites, inter alia:

[G]enerating… a training dataset by preprocessing a first set of data into the training dataset comprising a plurality of training data subsets: This limitation is seen as a mental process because it involves organizing and preparing data into subsets, which can be performed in the human mind.

…partitioning the training dataset into a plurality of training subsets, wherein the plurality of training subsets comprises a second number of training subsets, wherein the second number is the same as the first number: This is seen as a mental process because it describes dividing data into groups based on a numerical determination that can be made in the human mind.

wherein a first training data subset of the plurality of training subsets corresponds to a first feature space of a plurality of feature spaces determined… and a second training data subset of the plurality of training data subsets corresponds to a second feature space of the plurality of feature spaces: This is seen as a mental process because it involves associating sets of data with conceptual feature spaces.

and wherein the first feature space is compatible with the second feature space such that features from the first feature space are combinable with features from the second feature space: This is seen as a mental process because it defines a conceptual relationship of compatibility and combinability between abstract feature spaces, which can be evaluated by human judgment and reasoning.

[D]etermining, for each training subset of the plurality of training subsets, a plurality of projections of data points into a corresponding feature space, the data points included in the training subset: This limitation recites a mental process because it involves analytically mapping data points into feature spaces.
[E]xtracting, from the plurality of training subsets, a plurality of training features comprising one or more training feature subsets, the one or more training feature subsets each comprising one or more indications of similarity between the projections of data points included in a common training subset of the plurality of training subsets: This limitation is seen as a mental process because it involves extracting features from training subset data and grouping similarities among the representations of data.

[P]artitioning… a second set of data into a plurality of interaction data subsets comprising a third number of data subsets, wherein the third number is the same as the first number: This limitation is seen as a mental process because it involves dividing data into groups based on a numerical criterion.

[A]llocating… each interaction data subset of the plurality of interaction data subsets to a different computer model of the set of computer models: This is seen as a mental process because it involves assigning groups of data to different models.

[G]enerating… a plurality of projections of data points… using a corresponding interaction data subset of the plurality of interaction data subsets, wherein the plurality of projections of data points comprises a plurality of projection subsets: This is seen as a mental process because it describes grouping data into projected representations.

each projection subset of the plurality of projection subsets corresponding to a different computer model of the set of computer models: This limitation is seen as a mental process because it involves determining that each subset of projection data points corresponds to a different model.

each projection subset having a corresponding feature space of the plurality of feature spaces: This is a mental process because it defines a mapping between data groups and feature spaces.

[D]etermining… for each plurality of projection subsets, a plurality of relative differences between each projection of the plurality of projection subsets: This is a mental process because it involves comparing projected data representations.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows: by a computing device… training, by the computing device and using the training dataset, a set of computer models comprising a first number of models by: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). and training each computer model of the set of computer models using a different training feature subset of the one or more training feature subsets… by executing each computer model of the set of computer models: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). and providing… an output of the set of computer models by aggregating the plurality of projections of data points: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: by a computing device… training, by the computing device and using the training dataset, a set of computer models comprising a first number of models by: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)). and training each computer model of the set of computer models using a different training feature subset of the one or more training feature subsets… by executing each computer model of the set of computer models: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)). and providing… an output of the set of computer models by aggregating the plurality of projections of data points: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity (see MPEP 2106.05(d)(II)(vi)).

The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts). The claim merely describes applying known data organization and mathematical operations to machine learning training data, including preprocessing and partitioning datasets, associating data with conceptual feature spaces, projecting data points, extracting similarities, and determining relative differences between projections. The recitation of training a set of computer models, executing the models using different training feature subsets, and providing an aggregated output merely describes generic computer implementation and routine data gathering and output, without improving the functioning of a computer or machine learning models themselves.

Claim 2

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia: determining… a number of computer models to include in the set of computer models based on a number of available computational resources: This is seen as a mental process because it involves determining how many models to include and evaluating the available computational resources.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows: by the computing device: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: by the computing device: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)).
Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 3

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia: [D]etermining… a number of computer models to include in the set of computer models based on a size of the training dataset: This is a mental process, since it deals with determining how many models to train based on the size of the dataset.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows: by the computing device: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: by the computing device: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)). Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 4

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia: [D]etermining the number of computer models to include in the set of computer models includes: determining an optimal size for each training data subset of the plurality of training data subsets: This is seen as a mental process because it involves evaluating data and selecting an optimal size. and determining the number of computer models to include in the set of computer models to correspond to the optimal size for each training data subset of the plurality of training data subsets: This is seen as a mental process because it involves correlating data subset sizes with the number of models, which is a conceptual calculation.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 5

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia: [D]etermining a set of weights, each weight of the set of weights corresponding to a different projection of the plurality of projections of data points: This limitation is a mental process because it involves determining each weight corresponding to a projection. and applying the set of weights to the plurality of projections of data points to resolve relative differences included in the plurality of projections: This is seen as a mathematical concept because it deals with using a mathematical equation to calculate the differences between the projections.
See Paragraph 91 of the specification, which states “…determining the set of relative differences between each projection of the plurality of projection subsets can include determining a cosine similarity, a Minkowski distance, or a Jaccard similarity between each projection…”

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows: the set of computer models is a set of nearest neighbor models: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). and wherein providing the output of the set of nearest neighbor models includes: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: the set of computer models is a set of nearest neighbor models: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)). and wherein providing the output of the set of nearest neighbor models includes: Insignificant extra-solution activity, as the limitation amounts to necessary data outputting (MPEP 2106.05(g)(3)). This falls under well-understood, routine, conventional activity (see MPEP 2106.05(d)(II)(vi)). Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 6

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia: the one or more indications of similarity between the projections of data points included in a common training subset of the plurality of training subsets includes one or more cosine similarities, one or more Minkowski distances, or one or more Jaccard similarities between the projections of data points included in a common training subset of the plurality of training subsets: This limitation is a mathematical concept because it deals with known mathematical functions for calculating similarities.

Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 7

Step 1: A process, as above.

Step 2A Prong One: The claim recites, inter alia: [D]etermining the plurality of relative differences between each projection of the plurality of projection subsets includes determining a cosine similarity, a Minkowski distance, or a Jaccard similarity between each projection included in a respective projection subset of the plurality of projection subsets: This limitation is a mathematical concept because it deals with known mathematical functions for calculating similarities.
Step 2A Prong Two and Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong Two. Even when considered in combination, these additional elements represent mere instructions to apply an exception and therefore do not provide an inventive concept. The claim is ineligible.

Claim 8

Step 1: The claim recites a non-transitory machine-readable medium; therefore, it is directed to the statutory category of manufacture.

Step 2A Prong One: The claim recites, inter alia: generating a training dataset by preprocessing a first set of data into the training dataset comprising a plurality of training data subsets: This limitation is seen as a mental process because it involves organizing and preparing data into subsets, which can be performed in the human mind.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows: A non-transitory machine-readable storage medium comprising a computer-program product that includes instructions configured to cause a data processing apparatus to perform operations: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: A non-transitory machine-readable storage medium comprising a computer-program product that includes instructions configured to cause a data processing apparatus to perform operations: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)).

The remainder of the claim recites identical limitations to claim 1. Therefore, claim 8 is rejected using the same rationale as claim 1. The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts). The claim merely describes applying known data organization and mathematical operations to machine learning training data, including preprocessing and partitioning datasets, associating data with conceptual feature spaces, projecting data points, extracting similarities, and determining relative differences between projections. The recitation of training a set of computer models, executing the models using different training feature subsets, and providing an aggregated output merely describes generic computer implementation and routine data gathering and output, without improving the functioning of a computer or machine learning models themselves.

Claims 9-14 are manufacture claims that recite identical limitations to claims 2-7. Therefore, claims 9-14 are rejected using the same rationale as claims 2-7.

Claim 15

Step 1: The claim recites a system; therefore, it is directed to the statutory category of machine.
Step 2A Prong One: The claim recites, inter alia: generating a training dataset by preprocessing a first set of data into the training dataset comprising a plurality of training data subsets: This limitation is seen as a mental process because it involves organizing and preparing data into subsets, which can be performed in the human mind.

Step 2A Prong Two: This judicial exception is not integrated into a practical application because the additional elements are as follows: A system, comprising: one or more data processors; and a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements are as follows: A system, comprising: one or more data processors; and a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations: This amounts to adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea, and cannot provide an inventive concept (MPEP 2106.05(f)).

The remainder of the claim recites identical limitations to claim 1. Therefore, claim 15 is rejected using the same rationale as claim 1. The elements in combination as an ordered whole still do not amount to significantly more than the judicial exception (i.e., the abstract ideas of mental processes and mathematical concepts). The claim merely describes applying known data organization and mathematical operations to machine learning training data, including preprocessing and partitioning datasets, associating data with conceptual feature spaces, projecting data points, extracting similarities, and determining relative differences between projections. The recitation of training a set of computer models, executing the models using different training feature subsets, and providing an aggregated output merely describes generic computer implementation and routine data gathering and output, without improving the functioning of a computer or machine learning models themselves.

Claims 16-20 are machine claims that recite identical limitations to claims 2-7. Therefore, claims 16-20 are rejected using the same rationale as claims 2-7.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Deegalla (“Random subspace and random projection nearest neighbor ensembles for high dimensional data”, 2022) in view of García-Pedrajas (“Boosting k-nearest neighbor classifier by means of input space projection”, 2009).
Regarding claim 1, Deegalla teaches [a] computer-implemented method comprising: generating, by a computing device, a training dataset by preprocessing a first set of data into the training dataset comprising a plurality of training data subsets (Page 1 Introduction, “The random subspace and random projection methods have been used to create ensembles; in the first case by choosing a random subset of features from the original feature set and in the second case by projecting the original features into lower dimensions using a random projection matrix.”, Page 2 Under Section 2.3, “For each base classifier from 1 to 𝑠 ∗ Select 𝑓 features randomly from the original dimension (or transform the original dimension into 𝑓 new components using random projection) in the training”, Page 3 Section 3.2.1, “To reduce the computational burden, we have considered ten linearly spaced features from the original dimension.” Deegalla teaches generating a training dataset and creating multiple subsets by either randomly selecting features (random subspace) or projecting features into lower dimensions (random projection) for each base classifier in the ensemble.); training… and using the training dataset, a set of computer models comprising a first number of models by: partitioning the training dataset into a plurality of training subsets, wherein the plurality of training subsets comprises a second number of training subsets, wherein the second number is the same as the first number (Page 2 Section 2.3, “Nearest neighbor algorithms are known to be stable for the variation of the dataset… The steps of forming nearest neighbor ensembles are as follows: Input:– the dataset– the ensemble size 𝑠– the number of features in each subset 𝑓– number of nearest neighbors Procedure:– For each base classifier from 1 to 𝑠 ∗ Select 𝑓 features randomly from the original dimension (or transform the original dimension into 𝑓 new components using random projection) in the training…” Deegalla teaches training an ensemble of s models where each model consists of partitioning the dataset by selecting f random features or projections. 
Each model has its own subset of features/projections, making a one-to-one correspondence between the models and the subsets.), wherein a first training data subset of the plurality of training subsets corresponds to a first feature space of a plurality of feature spaces determined… and a second training data subset of the plurality of training data subsets corresponds to a second feature space of the plurality of feature spaces (Page 2 Section 2.1, “The random subspace method… selecting a subset of features 𝑑 from the original feature set 𝑜,”, Page 2 Section 2.2, “In random projection, the original dataset is transformed into a lower dimensional subspace by using a random matrix… For each base classifier from 1 to 𝑠 ∗ Select 𝑓 features randomly from the original dimension (or transform the original dimension into 𝑓 new components using random projection) in the training” Deegalla teaches that each classifier uses a unique subset, which is either a random selection of original features or a random linear projection, and which defines the feature space for that classifier’s training data.), and wherein the first feature space is compatible with the second feature space such that features from the first feature space are combinable with features from the second feature space (Page 2 Section 2.2, “The transformation can be given by: 𝑍 = 𝑋𝑅 (1), where 𝑋 (an n×o matrix) is the original data matrix, 𝑅 (o×d) is the random matrix and 𝑍 is the transformed data matrix… Select the same features (or transform the original dimension using the same random matrix) in the testing… Using class labels among 𝑠 base classifiers, find the final class of the ensemble using majority voting”, Page 2 Section 2.3 Procedure, “For each base classifier from 1 to 𝑠… According to the Euclidean distance, find the class labels of 𝑘 closest instances and predict the final class using majority voting – Using class labels among 𝑠 base classifiers, find the final class of the ensemble using majority voting” Deegalla teaches that the feature spaces are compatible because every classifier in the ensemble is designed to perform the same operation, which is using Euclidean distance to produce a class label.
These labels are then directly combined via majority voting, making the outputs (and the feature spaces that generated them) combinable through the aggregation process.); determining, for each training subset of the plurality of training subsets, a plurality of projections of data points into a corresponding feature space, the data points included in the training subset (“The transformation can be given by: 𝑍 = 𝑋𝑅 (1), where 𝑋 (n×o) is the original data matrix, 𝑅 (o×d) is the random matrix and 𝑍 is the transformed data matrix… manipulating the different feature space was considered to form nearest ensembles… The steps of forming nearest neighbor ensembles are as follows… For each base classifier from 1 to 𝑠 ∗ Select 𝑓 features randomly from the original dimension (or transform the original dimension into 𝑓 new components using random projection) in the training…” Deegalla teaches projecting the original data matrix 𝑋 with a random matrix 𝑅 (𝑍 = 𝑋𝑅), which is the act of determining projections of data points into a new, lower-dimensional feature space for each subset.); extracting, from the plurality of training subsets, a plurality of training features comprising one or more training feature subsets, the one or more training feature subsets each comprising one or more indications of similarity between the projections of data points included in a common training subset of the plurality of training subsets (Page 2 Section 2.3, “For each base classifier from 1 to 𝑠… According to the Euclidean distance, find the class labels of 𝑘 closest instances and predict the final class using majority voting” Deegalla teaches extracting an “indication of similarity” by calculating the Euclidean distance between projected data points within a classifier’s training subset. The set of these distances for the 𝑘 nearest data points constitutes the “training feature subset” used by that classifier to make its prediction.); and training each computer model of the set of computer models using a different training feature subset of the one or more training feature subsets (Page 2 Section 2.3, “For each base classifier from 1 to 𝑠 ∗ Select 𝑓 features randomly…”, Page 1 Introduction, “The random subspace and random projection methods… (or transform the original dimension into 𝑓 new components using random projection) in the training” Deegalla teaches training each base classifier in the ensemble on a different feature subset, created by random subspace selection or random projection, as part of the procedure.).

Deegalla does not teach partitioning, by the computing device, a second set of data into a plurality of interaction data subsets comprising a third number of data subsets, wherein the third number is the same as the first number; allocating, by the computing device, each interaction data subset of the plurality of interaction data subsets to a different computer model of the set of computer models; generating, by the computing device, a plurality of projections of data points by executing each computer model of the set of computer models using a corresponding interaction data subset of the plurality of interaction data subsets, wherein the plurality of projections of data points comprises a plurality of projection subsets, each projection subset of the plurality of projection subsets corresponding to a different computer model of the set of computer models, each projection subset having a corresponding feature space of the plurality of feature spaces; determining, by the computing device and for each plurality of projection
subsets, a plurality of relative differences between each projection of the plurality of projection subsets; and providing, by the computing device, an output of the set of computer models by aggregating the plurality of projections of data points.

García-Pedrajas, in the same field of endeavor, teaches partitioning… a second set of data into a plurality of interaction data subsets comprising a third number of data subsets, wherein the third number is the same as the first number (Page 2 of the Introduction, “Both methods are based on changing the view that each classifier of the ensemble has of the instances, modifying the input space… (i) selecting a subset of the inputs; and (ii) transforming the inputs by means of a non-linear projection.”, Page 3 Section 3, “We must modify how the k-NN classifier at step t views the training data, to bias that view for focusing on missclassified instances. Thus, we present here two methods that act on the columns of the training data, by means of feature selection and feature transformation…. The first of the two methods is based on searching for subspaces of the original input space where missclassified instances are more likely to be correctly classified.”, Pages 3 and 4, Section 3.1.1, “Thus, our approach is based on using different subspaces for each k-NN classifier, but these subspaces are not randomly chosen. Instead, we select the subspace that minimizes the weighted error for each boosting step… The basic idea underlying the method is that for each boosting iteration, as the distribution of the instances is different, a different subsets of inputs should be relevant…”, Page 4, Algorithm 2 [figure omitted]. García-Pedrajas teaches partitioning a dataset into classifier-specific interaction subsets. In Algorithm 2, for each classifier F_t in the ensemble, the method obtains subspace R^D_t and then applies it to create F_t = L(projection of S into R^D_t). This projected dataset is a unique interaction data subset allocated to classifier F_t. The process iterates for all classifiers T, thus partitioning the second set of data (S) into a plurality of subsets, where the number of subsets (third number) is exactly equal to the number of classifiers (first number).
Partitioning data into multiple subspaces or projections constitutes dividing the dataset into multiple data subsets, where each model represents a different interaction of the original data.); allocating… each interaction data subset of the plurality of interaction data subsets to a different computer model of the set of computer models (Page 1 Introduction, “An ensemble of classifiers consists of a combination of different classifiers…”, Page 2 Introduction, “Thus, the basic idea of the two methods is modifying the view of the data each classifier sees in a way that improves the weighted accuracy over the instances.”, Page 3 Section 3.1, “…the new classifier added to the ensemble is trained using the obtained subspace and a new boosting step is performed.” Each partitioned subspace or projection of the data is allocated to a different k-NN classifier so that each classifier model trains on its own allocated subspace data.); generating… a plurality of projections of data points by executing each computer model of the set of computer models using a corresponding interaction data subset of the plurality of interaction data subsets, wherein the plurality of projections of data points comprises a plurality of projection subsets, each projection subset of the plurality of projection subsets corresponding to a different computer model of the set of computer models, each projection subset having a corresponding feature space of the plurality of feature spaces (Page 4, Section 3.1 and Algorithm 2, “Our algorithm is aimed at finding that relevant subset of inputs… Obtain subspace R^D_t, D_t < D, using weight vector w_t… F_t = L (Projection of S into R^D_t)”, Page 4 Section 3.2, “As in standard boosting methods, we construct an additive model: [equation figure omitted] where z_i = P_i(x) and P_i is a non-linear projection constructed using the weights of the instances given by the boosting algorithm. In this way, the i-th k-NN classifier is constructed using the original instances projected using P_i and all of them equally weighted.” García-Pedrajas teaches generating distinct projection subsets for each classifier in the ensemble, either by selecting a unique subspace (R^D_t) or by applying a unique non-linear projection function (P_i) to the data.); determining… and for each plurality of projection subsets, a plurality of relative differences between each projection of the plurality of projection subsets (Page 4, under Algorithm 3, “To evaluate an individual we apply a k-NN classifier using the projected training set S^P = {z_i, z_i = Px_i}. Then the weighted accuracy… of this classifier is obtained and this value is assigned as fitness to the individual.” Applying a k-NN classifier requires calculating distances between data points in the projected feature space. Calculating a distance metric is the definition of determining a “relative difference” between two projections.); and providing… an output of the set of computer models by aggregating the plurality of projections of data points (Page 5, under Algorithm 5, “The combination of the individual k-NN classifiers in both methods is made using a weighted voting approach. The weight of each vote is given by standard ADABOOST.”, Page 5, Algorithm 4 [figure omitted]. García-Pedrajas teaches providing the ensemble’s final output through aggregation, specifying a weighted voting approach.
The aggregation step is performed on the outputs (votes) of each classifier, which are direct results of each classifier’s unique projection (R^D_t or P_i).).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Deegalla’s method of learning nearest neighbor ensembles using multiple feature subspaces and random projections with García-Pedrajas’s use of classifier-specific input space projections and aggregation of distance-based outputs in order to enable each model to operate within a projected subset and improve the classification accuracy of the k-nearest neighbor algorithm (Introduction of García-Pedrajas).

Regarding claim 8, Deegalla teaches [a] non-transitory machine-readable storage medium comprising a computer-program product that includes instructions configured to cause a data processing apparatus to perform operations comprising (Page 4 Section 3.2.2, “For random feature subspace, random projection, nearest neighbor classification, decision trees (J48), random forest and support vector classifier (SMO), the WEKA data mining toolkit (version 3.6.12) is used. WEKA generates pruned decision tree using reduced error pruning technique”, Page 2 Section 2.2, “In random projection, the original dataset is transformed into a lower dimensional subspace by using a random matrix”, Page 2 Section 2.3 of the Procedure, “Procedure: – For each base classifier from 1 to 𝑠 ∗ Select 𝑓 features randomly… transform the original dimension find the class labels…”, Page 3 Section 3.2.1, “To reduce the computational burden, we have considered ten linearly spaced features…” Deegalla teaches this non-transitory storage medium and computer-program product by utilizing the WEKA data mining toolkit, which contains a pre-defined set of instructions and software code that must be stored on a physical medium to enable the data processing apparatus to execute the specific algorithmic procedure.): The remainder of the claim recites similar limitations to claim 1. Therefore, claim 8 is rejected using the same rationale as claim 1.

Regarding claim 15, Deegalla teaches [a] system, comprising: one or more data processors; and a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations comprising (Page 5, under Figure 3, “Table 6 shows the classification accuracies for the 34 datasets considered in the study based on experiment #2.”, Page 3 Section 3.2.1, “To reduce the computational burden, we have considered ten linearly spaced features…”, Page 4 Section 3.2.2, “For random feature subspace, random projection, nearest neighbor classification, decision trees (J48), random forest and support vector classifier (SMO), the WEKA data mining toolkit (version 3.6.12) is used.
WEKA generates pruned decision tree using reduced error pruning technique”, Page 2 Section 2.2, “In random projection, the original dataset is transformed into a lower dimensional subspace by using a random matrix”, Page 2 Section 2.3 of the Procedure, “Procedure: – For each base classifier from 1 to 𝑠 ∗ Select 𝑓 features randomly… transform the original dimension find the class labels…” Deegalla defines a specific algorithmic procedure (the nearest neighbor ensemble) and utilizes a software toolkit (WEKA), which functions as a set of instructions that must be stored on a medium and executed by processors to perform the described data transformations and classification operations.): The remainder of the claim recites identical limitations to claim 1. Therefore, claim 15 is rejected using the same rationale as claim 1.

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Deegalla (“Random subspace and random projection nearest neighbor ensembles for high dimensional data”, 2022) in view of García-Pedrajas (“Boosting k-nearest neighbor classifier by means of input space projection”, 2009) and Guttmann (US 20180336479 A1).

Regarding claim 2, Deegalla does not teach determining, by the computing device, a number of computer models to include in the set of computer models based on a number of available computational resources. Guttmann, in the same field of endeavor, teaches determining, by the computing device, a number of computer models to include in the set of computer models based on a number of available computational resources (Paragraph 153 of Guttmann, “For example, the inference model may comprise an ensemble model… and the at least one rule may select the number of inference models in the ensemble and/or the types of the internal inference models according to the available processing resources information.” A computing device uses rules to select the number of models in an ensemble based on available processing resources.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Deegalla’s method of learning nearest neighbor ensembles using random subspaces and random projections with Guttmann’s use of rules for selecting the number of models in an ensemble based on available processing resources in order to improve the practical deployability and efficiency of the ensemble on resource-limited systems (Paragraph 153 of Guttmann).

As per claim 9, this claim is similar in scope to the limitations recited in claim 2, and thus is rejected under the same rationale. As per claim 16, this claim is similar in scope to the limitations recited in claim 2, and thus is rejected under the same rationale.

Claims 3-4, 10-11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Deegalla (“Random subspace and random projection nearest neighbor ensembles for high dimensional data”, 2022) in view of García-Pedrajas (“Boosting k-nearest neighbor classifier by means of input space projection”, 2009) and Kida (US 20190065989 A1).

Regarding claim 3, Deegalla does not teach determining, by the computing device, a number of computer models to include in the set of computer models based on a size of the training dataset.
Kida, in the same field of endeavor, teaches determining, by the computing device, a number of computer models to include in the set of computer models based on a size of the training dataset (Paragraph 49, “…composition and size of training set with feature properties chosen to achieve 85% sensitivity… to train 50 different models… fifty models were trained with 1350 training samples sampled from a larger available data set…” Kida links the decision to train a specific quantity of models (50) to the pre-calculated number of training samples (1350) required to hit a performance target of 85%.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Deegalla’s method of learning nearest neighbor ensembles using random subspaces and random projections with Kida’s use of selecting the number of trained models based on the size of the training dataset in order to determine an appropriate number of nearest neighbor models to use to reduce the complexity of the models while still producing high accuracy in the classifications (Paragraph 18 of Kida). Regarding claim 4, Deegalla does not teach determining the number of computer models to include in the set of computer models includes: determining an optimal size for each training data subset of the plurality of training data subsets, and determining the number of computer models to include in the set of computer models to correspond to the optimal size for each training data subset of the plurality of training data subsets. Kida, in the same field of endeavor, teaches determining the number of computer models to include in the set of computer models includes: …and determining the number of computer models to include in the set of computer models to correspond to the optimal size for each training data subset of the plurality of training data subsets (Paragraph 81, “determines a selected model with a smallest number of samples in the selected class training set where other models with larger select class training sets have a sensitivity that is not greater than some threshold, wherein the size of the finalize selected class training set is the size of the selected model.”, Paragraph 71, “…provide a final training set that includes samples from each of the plurality of classes that includes for each class a number of samples based upon the size of the final selected class training set for each class.” Kida teaches that the number of models trained is selected in view of the available and selected training data size because each model is trained on a subset of the dataset sampled to meet a predefined sensitivity target. The computing device determines the number of models to include based on the size of the training dataset available for model training.). 
determining an optimal size for each training data subset of the plurality of training data subsets (Paragraph 40, “A knee 718, however, may be seen where adding additional samples does not significantly increase the sensitivity… the knee 718 corresponds to a training data set with 3,000 working class samples.” Kida configures the final set of models and their associated training sets to specifically match the optimized sample count, which is the knee.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Deegalla’s method of learning nearest neighbor ensembles using random subspaces and random projections with Kida’s determination of an optimal training subset size based on the number of trained models in order to reduce model complexity and improve the efficiency of the models (Paragraph 18 of Kida).

As per claims 10-11, these claims are similar in scope to the limitations recited in claims 3-4, and thus are rejected under the same rationale. As per claim 17, this claim is similar in scope to the limitations recited in claims 3-4, and thus is rejected under the same rationale.

Claims 5-7, 12-14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Deegalla (“Random subspace and random projection nearest neighbor ensembles for high dimensional data”, 2022) in view of García-Pedrajas (“Boosting k-nearest neighbor classifier by means of input space projection”, 2009) and Li (“Distance Weighted Cosine Similarity Measure for Text Classification”, 2013).

Regarding claim 5, Deegalla teaches the set of computer models is a set of nearest neighbor models, and wherein providing the output of the set of nearest neighbor models includes (Page 2 Section 2.3, “Input: – the dataset – the ensemble size 𝑠 – the number of features in each subset 𝑓 – number of nearest neighbors 𝑘 • Procedure: – For each base classifier from 1 to 𝑠… ∗ According to the Euclidean distance, find the class labels of 𝑘 closest instances and predict the final class using majority voting – Using class labels among 𝑠 base classifiers, find the final class of the ensemble using majority voting • Output: Nearest neighbor ensemble”): a different projection of the plurality of projections (Page 2 Section 2.3, “For each base classifier from 1 to 𝑠 Select 𝑓 features randomly from the original dimension (or transform the original dimension into 𝑓 new components using random projection) in the training”). Deegalla does not teach determining a set of weights, each weight of the set of weights corresponding to… data points; and applying the set of weights to the… data points to resolve relative differences included….

Li, in the same field of endeavor, teaches determining a set of weights, each weight of the set of weights corresponding to… data points (“Each document is then represented as a space vector where the words in the document are mapped onto the corresponding coordinates… The weight of a feature is given as follows: [formula figure omitted] which is the same as the standard representation “ltc”…” Li defines a formula to calculate a numerical weight for every individual word/feature (data point) within the document vectors); and applying the set of weights to the… data points to resolve relative differences included…
(Page 5, Experimental Studies, “With k-Nearest Neighbor algorithm, the category prediction of a test sample is made according to the category distribution among the top k most similar samples in the training set, where a similarity metric is used to find these k Nearest Neighbors. We evaluate different similarity metrics, including our proposed distance weighted cosine similarity,”, Page 3, Weighted Hamming Distance, “Weighted Hamming Distance: the former measure takes each feature equally important, which is not ideal. A simple improvement is to weight counts with features’ values…”, Page 2 Introduction, “It can thus be derived from the above figure that cosine similarity tends to be overly biased by the features of higher values, but it doesn’t care much about how many features two vectors share.” Li teaches calculating importance weights for each feature and applying those weights within a weighted similarity metric to ensure the k-Nearest Neighbor algorithm identifies neighbors based on shared feature significance rather than being misled by raw feature magnitudes.). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Deegalla’s method of learning nearest neighbor ensembles using random subspaces and random projections with Li’s use of feature weighting applied within similarity metrics in order to reduce the influence of less relevant feature differences between data points and enhance the prediction capability of each model (Sections 2 and 3, and Experimental Studies of Li).

Regarding claim 6, Deegalla teaches the one or more indications of similarity between the projections of data points included in a common training subset of the plurality of training subsets includes… between the projections of data points included in a common training subset of the plurality of training subsets (Page 2 Section 2.3, “For each base classifier from 1 to 𝑠 Select 𝑓 features randomly from the original dimension (or transform the original dimension into 𝑓 new components using random projection) in the training ∗ Select the same features (or transform the original dimension using the same random matrix) in the testing ∗ According to the Euclidean distance, find the class labels of 𝑘 closest instances and predict the final class using majority voting.” For each base classifier, Deegalla’s procedure calculates the Euclidean distance between projected/selected data points within that classifier’s own training subset. This Euclidean distance is the “indication of similarity” computed on a training subset.). Deegalla does not teach one or more cosine similarities, one or more Minkowski distances, or one or more Jaccard similarities. Li, in the same field of endeavor, teaches one or more cosine similarities, one or more Minkowski distances, or one or more Jaccard similarities (Page 5 Section 3.2 of Li, “With k-Nearest Neighbor algorithm, the category prediction of a test sample is made according to the category distribution among the top k most similar samples in the training set, where a similarity metric is used to find these k Nearest Neighbors.
We evaluate different similarity metrics, including our proposed distance weighted cosine similarity, the original cosine similarity, Jaccard, and others.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Deegalla’s method of learning nearest neighbor ensembles using random subspaces and random projections with Li’s use of similarity metrics, including cosine and Jaccard similarities, between data points in order to enhance the ensemble’s learning and prediction capability (Page 5, Section 3.2 of Li).

Regarding claim 7, Deegalla teaches determining the plurality of relative differences between each projection of the plurality of projection subsets includes… between each projection included in a respective projection subset of the plurality of projection subsets (Page 2 Section 2.3, “For each base classifier from 1 to 𝑠 Select 𝑓 features randomly from the original dimension (or transform the original dimension into 𝑓 new components using random projection) in the training ∗ Select the same features (or transform the original dimension using the same random matrix) in the testing ∗ According to the Euclidean distance, find the class labels of 𝑘 closest instances and predict the final class using majority voting.” When classifying a test point, Deegalla projects it into a classifier’s specific feature space, creating a new projection. The method then determines the Euclidean distance between this test projection and the projections of the training points that belong to that classifier’s respective subset.). Deegalla does not teach determining a cosine similarity, a Minkowski distance, or a Jaccard similarity…. Li, in the same field of endeavor, teaches determining a cosine similarity, a Minkowski distance, or a Jaccard similarity… (Page 5 Section 3.2 of Li, “With k-Nearest Neighbor algorithm, the category prediction of a test sample is made according to the category distribution among the top k most similar samples in the training set, where a similarity metric is used to find these k Nearest Neighbors. We evaluate different similarity metrics, including our proposed distance weighted cosine similarity, the original cosine similarity, Jaccard, and others.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Deegalla’s method of learning nearest neighbor ensembles using random subspaces and random projections with Li’s use of similarity metrics, including cosine and Jaccard similarities, between data points in order to enhance the ensemble’s learning and prediction capability (Page 5, Section 3.2 of Li).

As per claims 12-14 and 18-20, these claims are similar in scope to the limitations recited in claims 5-7, and thus are rejected under the same rationale.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAJD MAHER HADDAD, whose telephone number is (571) 272-2265. The examiner can normally be reached Monday through Friday, 8 am to 5 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.M.H./ Examiner, Art Unit 2125
/KAMRAN AFSHAR/ Supervisory Patent Examiner, Art Unit 2125
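
Technical Notes on the Cited Art

The claim 1 mapping above turns on Deegalla's ensemble procedure: give each base classifier its own random projection of the training data (Z = XR), classify by the Euclidean distance to the k closest projected instances, and combine the per-classifier votes by majority. The sketch below is an illustrative reconstruction in Python/numpy, not code from the reference; the Gaussian random matrix, the integer-label assumption, and all names and defaults are our own.

import numpy as np

def train_ensemble(X, y, s=10, f=5, seed=None):
    # One (R, Z, y) triple per base classifier: R is that classifier's
    # random projection matrix and Z = X @ R is its projected training data.
    rng = np.random.default_rng(seed)
    o = X.shape[1]                       # original dimensionality
    models = []
    for _ in range(s):                   # one model per subset (1:1 correspondence)
        R = rng.standard_normal((o, f))  # random o x f projection matrix
        models.append((R, X @ R, y))
    return models

def predict(models, x, k=3):
    # y must hold non-negative integer class labels for np.bincount.
    votes = []
    for R, Z, y in models:
        z = x @ R                              # project the query with the same R
        d = np.linalg.norm(Z - z, axis=1)      # Euclidean distance to each instance
        nearest = y[np.argsort(d)[:k]]         # labels of the k closest instances
        votes.append(np.bincount(nearest).argmax())  # per-classifier majority vote
    return int(np.bincount(votes).argmax())    # ensemble majority vote

García-Pedrajas differs mainly in how the per-classifier views are chosen (boosting-driven subspaces rather than random ones) and in replacing the final majority vote with AdaBoost-weighted voting.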
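
The claims 2-4 rejections cite Guttmann for sizing the ensemble by available computational resources and Kida for sizing it from the training set via the sensitivity "knee". The following is a hypothetical sketch of both rules; the 0.005 cutoff and the core-count bound are invented for illustration and appear in neither reference.

import os

def models_by_resources(max_models=50):
    # Guttmann-style rule: cap the ensemble size by available processing resources.
    return min(max_models, os.cpu_count() or 1)

def models_by_training_size(sample_counts, sensitivities, dataset_size):
    # Kida-style rule: find the "knee" where extra samples stop improving
    # sensitivity, then fit as many models as the dataset supports at that size.
    knee = sample_counts[-1]
    for prev, cur, n in zip(sensitivities, sensitivities[1:], sample_counts[1:]):
        if cur - prev < 0.005:           # marginal sensitivity gain below threshold
            knee = n
            break
    return max(1, dataset_size // knee)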
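
Claims 5-7 recite the three similarity measures named in the specification (cosine, Minkowski, Jaccard) and, for claim 5, weights applied to projections; Li supplies the weighted-similarity teaching. A minimal sketch of the measures follows, again in numpy; the per-feature weight interface is a generic stand-in, not Li's exact "ltc" weighting formula.

import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def minkowski_distance(a, b, p=2):
    # p=2 is the Euclidean distance used throughout Deegalla's procedure;
    # p=1 gives the Manhattan distance.
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

def jaccard_similarity(a, b):
    # For binary feature vectors: |intersection| / |union|.
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

def weighted_cosine_similarity(a, b, w):
    # Claim 5's idea in miniature: apply per-feature weights before comparing,
    # so high-magnitude but low-importance features do not dominate the score.
    return cosine_similarity(a * w, b * w)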

Prosecution Timeline

May 10, 2023: Application Filed
Feb 10, 2026: Non-Final Rejection — §101, §103 (current)

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
