Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on February 18, 2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 11 is objected to because of the following informalities: “the first query” should read “wherein the first query.” Appropriate correction is required.
Claim 13 is objected to because of the following informalities: “and wherein the initial training” should read “wherein the initial training.” Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-14 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cunningham (PG Pub. No. 2002/0129038 A1) in view of Wang (PG Pub. No. 2014/0143251 A1).
Regarding Claim 1, Cunningham discloses a database system comprising:
at least one processor (see Cunningham, paragraph [0174], where any type of computer could be used to implement the present invention), and
at least one memory that stores operations instructions (see Cunningham, paragraph [0033], where the data servers 110A-110E, OLAP Client 114, Analysis Client 116, Analysis Interface 118, OLAP Server 120, Analysis Server 122, Learning Engine 124, Inference Engine 126, Data Mining View 128, Model Results Table 130, and/or RDBMS 132 each comprise logic and/or data tangibly embodied in and/or accessible from a device, media, carrier, or signal, such as RAM, ROM, one or more of the data storage devices) that, when executed by the at least one processor, cause the database system to:
determine a first query that indicates a first request to generate a Gaussian mixture model (see Cunningham, Claim 1, where the method for analyzing data in a computer-implemented data mining system comprises (a) accessing data from a database in a computer-implemented data mining system; and (b) performing an Expectation-Maximization (EM) algorithm in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data);
execute the first query to generate Gaussian mixture model data for the Gaussian mixture model (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data) based on:
generating a training set of rows based on accessing a plurality of rows of a relational database table of a relational database (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data);
performing an initial training function upon the training set of rows to group the training set of rows into an initial set of clusters (see Cunningham, paragraph [0094], where a comparable solution is to compute the k-means model as an initialization of the full Gaussian Mixture Model);
generating initial cluster parameter data indicating, for each cluster of the initial set of clusters, a set of initial cluster parameters characterizing the each cluster (see Cunningham, paragraphs [0047], [0048], where the goal of the EM algorithm is to estimate the means (C), the covariances (R), and the mixture weights (W) of the Gaussian mixture probability function described in the previous subsection; this algorithm starts from an approximation to the solution; see also paragraph [0057], where block 204 represents setting of initial values for C, R, and W); and
performing an iterative process to generate final cluster parameter data indicating, for each cluster of a final set of clusters, a set of final cluster parameters characterizing the each cluster by updating the initial cluster parameter data for the each cluster (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
determine a second query that indicates a second request to apply the Gaussian mixture model to input data (see Cunningham, paragraph [0102], where model selection involves deciding which of the various possible Gaussian Mixture Models are suitable for use with a given data set … the present invention eases these requirements with a set of pragmatic choices in model selection; see also paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database); and
execute the second query to generate model output of the Gaussian mixture model for the input data (see Cunningham, paragraph [0102], where model selection involves deciding which of the various possible Gaussian Mixture Models are suitable for use with a given data set … the present invention eases these requirements with a set of pragmatic choices in model selection; see also paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database),
Cunningham does not explicitly disclose based on, for each row in the input data, identifying a classification label for an identified one of the final set of clusters that includes the each row based on the final cluster parameter data. Wang discloses for each row in the input data, identifying a classification label for an identified one of the final set of clusters that includes the each row based on the final cluster parameter data (see Wang, paragraph [0060], where Algorithm 2 includes … initialize k centroids and … allocate a cluster label to each data point vi by selecting the nearest centroid).
Cunningham and Wang are both directed toward clustering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cunningham with Wang as it amounts to combining prior art elements according to known techniques to yield predictable results.
Regarding Claim 2, Cunningham in view of Wang discloses the database system of Claim 1, wherein the initial cluster parameter data includes, for each cluster, an initial mixture weight for the each cluster, an initial mean for the each cluster, and an initial covariance matrix for the each cluster (see Cunningham, paragraphs [0047], [0048], where the goal of the EM algorithm is to estimate the means (C), the covariances (R), and the mixture weights (W) of the Gaussian mixture probability function described in the previous subsection; this algorithm starts from an approximation to the solution; see also paragraph [0057], where block 204 represents setting of initial values for C, R, and W), wherein the final cluster parameter data includes, for each cluster: a final mixture weight for the each cluster generated based on updating the initial mixture weight for the each cluster via the iterative process, a final mean for the each cluster generated based on updating the initial mean for the each cluster via the iterative process, and a final covariance matrix for the each cluster generated based on updating the initial covariance matrix for the each cluster via the iterative process (see Cunningham, paragraph [0056], where upon completion of the loop, control transfers to Block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood).
Regarding Claim 3, Cunningham in view of Wang discloses the database system of Claim 2, wherein generating the initial cluster parameter data includes: computing the initial mixture weight for the each cluster as a proportion of rows of the training set of rows grouped in the each cluster via performance of the initial training function (see Cunningham, paragraph [0045], where W is described as cluster weights; see also paragraph [0057], where block 204 represents the setting of initial values for C, R, and W), wherein a set of initial mixture weights corresponding to the initial set of clusters sum to one, and wherein a set of final mixture weights corresponding to the final set of clusters sum to one (see Cunningham, paragraph [0053], where ΣW = 1, that is, the sum of the weights across all clusters equals one).
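Purely for illustration (no such code appears in Cunningham or Wang, and the function and variable names are hypothetical), the proportion-based initial mixture weights recited in Claim 3 can be sketched as:

```python
from collections import Counter

def initial_mixture_weights(cluster_assignments):
    """Initial mixture weight of each cluster: the proportion of training rows
    grouped into that cluster by the initial training function, so that the
    weights across all clusters sum to one."""
    counts = Counter(cluster_assignments)
    n = len(cluster_assignments)
    return {c: counts[c] / n for c in sorted(counts)}

# Five training rows: four grouped into cluster 0, one into cluster 1.
weights = initial_mixture_weights([0, 0, 1, 0, 0])
```

By construction the returned proportions sum to one, matching the constraint quoted from Cunningham's paragraph [0053].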
Regarding Claim 4, Cunningham in view of Wang discloses the database system of Claim 2, wherein generating the initial cluster parameter data includes: computing the initial mean for the each cluster as a mean computed for rows of the training set of rows grouped in the each cluster via performance of the initial training function (see Cunningham, paragraph [0045], where C is described as k-cluster centroids; see also paragraph [0057], where block 204 represents setting of initial values of C, R, and W), wherein the initial mean is defined as an ordered set of mean values corresponding to an ordered set of columns of the training set of rows, wherein each mean value of the ordered set of mean values is computed as a mean column value for a corresponding column of the ordered set of columns (see Cunningham, paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database [it is the position of the Examiner that generating k-cluster centroids on relational database data, being comprised of an ordered set of columns, is not patentably distinguishable from the mean being defined as an ordered set of mean values corresponding to an ordered set of columns of the training set of rows, wherein each mean value of the ordered set of mean values is computed as a mean column value for a corresponding column of the ordered set of columns]).
Regarding Claim 5, Cunningham in view of Wang discloses the database system of Claim 2, wherein generating the initial cluster parameter data includes: computing the initial covariance matrix for the each cluster from rows of the training set of rows grouped in the each cluster via performance of the initial training function (see Cunningham, paragraph [0045], where R is described as covariances; see also paragraph [0057], where block 204 represents the setting of initial values for C, R, and W).
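For illustration of the per-cluster initial mean (an ordered set of column means) and initial covariance matrix addressed in Claims 4 and 5, a minimal sketch follows; the function name and the sample rows are hypothetical, not drawn from the references:

```python
import numpy as np

def initial_cluster_params(rows, assignments, k):
    """For each cluster, compute the initial mean as an ordered set of column
    means over the rows grouped into that cluster, and the initial covariance
    matrix from those same rows."""
    params = []
    for c in range(k):
        members = rows[assignments == c]
        mean = members.mean(axis=0)          # one mean value per column
        cov = np.cov(members, rowvar=False)  # covariance across the columns
        params.append((mean, cov))
    return params

# Invented 4-row, 2-column "training set" grouped into two clusters.
rows = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]])
assignments = np.array([0, 0, 1, 1])
params = initial_cluster_params(rows, assignments, 2)
```

Each mean value corresponds positionally to one column of the training rows, consistent with the "ordered set of columns" language of Claim 4.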
Regarding Claim 6, Cunningham in view of Wang discloses the database system of Claim 2, wherein:
performing each iteration of a plurality of iterations of the iterative process includes performing a first step and a second step (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein a given iteration of the plurality of iterations is performed immediately after a prior iteration of the plurality of iterations and immediately before a subsequent iteration of the plurality of iterations (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein performing the first step for the given iteration of the plurality of iterations includes computing, for each row in the training set of rows, a corresponding set of current membership values corresponding to a current set of clusters updated from the initial set of clusters via prior iterations of the plurality of iterations, wherein each current membership value of the corresponding set of current membership values is generated for a corresponding cluster of the current set of clusters for the each row as a function of a previous mixture weight for the corresponding cluster, a previous mean of the corresponding cluster, and a previous covariance matrix for the corresponding cluster, wherein the previous mixture weight for the corresponding cluster, the previous mean of the corresponding cluster, and the previous covariance matrix for the corresponding cluster were generated in the prior iteration (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein performing the second step for the given iteration of the plurality of iterations includes updating each cluster of the current set of clusters to render an updated set of clusters each having an updated mixture weight, an updated mean, and an updated covariance matrix (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters) based on:
generating the updated mixture weight for the each cluster as a mean of a plurality of current membership values computed for the each cluster across all rows in the training set of rows in performing the first step (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters); and
generating the updated mean and the updated covariance matrix for the each cluster from the training set of rows based on each of the training set of rows being weighted utilizing the current membership value for the each row and the each cluster (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the subsequent iteration is performed based on processing the updated mixture weight for the each cluster, the updated mean for the each cluster, and the updated covariance matrix for the each cluster (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters).
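Purely for illustration (no such code appears in Cunningham or Wang; the function name and the toy data are hypothetical), the two-step iteration recited in Claim 6 can be sketched in one dimension:

```python
import numpy as np

def em_iteration(x, weights, means, variances):
    """One iteration of the two-step process (1-D case). The first step
    computes a membership value for every row and every cluster from the
    previous mixture weights, means, and variances; the second step updates
    those parameters for use in the subsequent iteration."""
    # First step: membership of each row in each current cluster.
    dens = np.stack([
        w * np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
        for w, m, v in zip(weights, means, variances)
    ], axis=1)                                       # shape (n_rows, k)
    memb = dens / dens.sum(axis=1, keepdims=True)

    # Second step: updated mixture weight is the mean membership across all
    # rows; updated mean and variance are membership-weighted row statistics.
    nk = memb.sum(axis=0)
    new_weights = nk / len(x)
    new_means = (memb * x[:, None]).sum(axis=0) / nk
    new_vars = (memb * (x[:, None] - new_means) ** 2).sum(axis=0) / nk
    return new_weights, new_means, new_vars

x = np.array([0.0, 0.2, 5.0, 5.1])                   # invented training rows
w, m, v = em_iteration(x, np.array([0.5, 0.5]),
                       np.array([0.0, 5.0]), np.array([1.0, 1.0]))
```

Feeding the returned parameters back into `em_iteration` makes each iteration's updated parameters the subsequent iteration's previous parameters, which is the parameter hand-off recited in Claim 7.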
Regarding Claim 7, Cunningham in view of Wang discloses the database system of Claim 2, wherein:
the updated mixture weight generated in the given iteration for the each cluster is utilized as a corresponding previous mixture weight for the each cluster in performing the first step in the subsequent iteration (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the updated mean generated in the given iteration for the each cluster is utilized as a corresponding previous mean for the each cluster in performing the first step in the subsequent iteration (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the updated covariance matrix generated in the given iteration for the each cluster is utilized as a corresponding previous covariance matrix for the each cluster in performing the first step in the subsequent iteration (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the initial mixture weight for the each cluster is utilized as a first previous mixture weight for the each cluster in performing a first iteration of the plurality of iterations (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the initial mean for the each cluster is utilized as a first previous mean for the each cluster in performing the first iteration of the plurality of iterations (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the initial covariance matrix for the each cluster is utilized as a first covariance matrix for the each cluster in performing the first iteration of the plurality of iterations (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the final mixture weight for the each cluster is set as the updated mixture weight generated for the each cluster in performing the second step in a final iteration of the plurality of iterations (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
wherein the final mean for the each cluster is set as the updated mean generated for the each cluster in performing the second step in the final iteration of the plurality of iterations (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters); and
wherein the final covariance matrix for the each cluster is set as the updated covariance matrix generated for the each cluster in performing the second step in the final iteration of the plurality of iterations (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters).
Regarding Claim 8, Cunningham in view of Wang discloses the database system of Claim 2, wherein identifying the classification label for each row in the input data in generating the model output by executing the second query is based on:
computing, for the each row in the input data, a corresponding set of membership values corresponding to the final set of clusters (see Cunningham, paragraph [0056], where upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters (the X matrix is helpful in classifying the data according to the clusters)), wherein each membership value of the corresponding set of membership values is generated for a corresponding cluster of the set of clusters for the each row in the input data as a function of the final mixture weight for the corresponding cluster, the final mean of the corresponding cluster, and the final covariance matrix for the corresponding cluster (see Cunningham, paragraph [0056], where upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters (the X matrix is helpful in classifying the data according to the clusters)).
Cunningham does not explicitly disclose wherein the classification label identified for the each row in the input data corresponds to one of the final set of clusters for which the each row in the input data has a highest valued membership value of the corresponding set of membership values. Wang discloses this limitation (see Wang, paragraph [0060], where Algorithm 2 includes … initialize k centroids and … allocate a cluster label to each data point vi by selecting the nearest centroid).
Cunningham and Wang are both directed toward clustering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cunningham with Wang as it amounts to combining prior art elements according to known techniques to yield predictable results.
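For illustration of the highest-membership labeling addressed in Claim 8, a one-dimensional sketch follows; the parameter values and function name are invented for illustration and appear in neither reference:

```python
import math

def classify(row, final_params):
    """Label a row with the cluster whose membership value is highest, where
    each membership value is computed from that cluster's final mixture
    weight, mean, and (here, scalar 1-D) variance."""
    memberships = [
        w * math.exp(-0.5 * (row - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
        for (w, mu, var) in final_params
    ]
    return max(range(len(memberships)), key=memberships.__getitem__)

# Invented final parameters: (mixture weight, mean, variance) per cluster.
final_params = [(0.6, 0.0, 1.0), (0.4, 5.0, 1.0)]
label = classify(4.8, final_params)
```

With isotropic, equal-variance clusters this reduces to Wang's nearest-centroid labeling, which is the basis of the combination stated above.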
Regarding Claim 9, Cunningham in view of Wang discloses the database system of Claim 2, wherein the initial covariance matrix and the final covariance matrix are implemented via generation of an object having a matrix data type, and wherein the matrix data type is implemented as a first class data type (see Cunningham, paragraph [0045], where Table 2 explicitly lists covariances as matrix R).
Regarding Claim 10, Cunningham in view of Wang discloses the database system of Claim 1, wherein the operations instructions, when executed by the at least one processor, further cause the database system to:
generate a query operator execution flow for the first query that includes a first subset of operators that include at least one relational operator (see Cunningham, paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database) and a second subset of operators that include at least one non-relational linear algebra operator (see Cunningham, paragraph [0041], where formulas are the basic ingredient to implementing EM in SQL);
wherein executing the first query includes executing the query operator execution flow for the first query based on executing the first subset of operators (see Cunningham, paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database) and executing the second subset of operators (see Cunningham, paragraph [0041], where formulas are the basic ingredient to implementing EM in SQL), wherein the training set of rows is generated based on accessing the plurality of rows of the relational database table of the relational database by executing the first subset of operators (see Cunningham, paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database), and wherein the initial training function is performed upon the training set of rows, the initial cluster parameter data is generated, and the iterative process is performed by executing the second subset of operators (see Cunningham, paragraph [0041], where formulas are the basic ingredient to implementing EM in SQL).
Regarding Claim 11, Cunningham in view of Wang discloses the database system of Claim 1, wherein the first query is determined based on a first query expression that includes a call to a Gaussian mixture model training function indicating a configured number of clusters, wherein the initial set of clusters includes exactly the configured number of clusters, and wherein the final set of clusters includes exactly the configured number of clusters (see Cunningham, paragraph [0055], where block 200 represents the input of several variables, including (1) k, which is the number of clusters).
Regarding Claim 12, Cunningham in view of Wang discloses the database system of Claim 1, wherein the first query is determined based on a first query expression that includes a call to a Gaussian mixture model training function selecting a name for the Gaussian mixture model, and wherein the second query is determined based on a second query expression that includes a call to the Gaussian mixture model by indicating the name for the Gaussian mixture model (see Cunningham, paragraphs [0123]-[0125], where import/export uses a standard format text file with C, R, W and their flags; problem: model parameters must be input and output in standard formats; this ensures that the results may be saved and reused; solution: block 204 in Fig. 2A creates a standard output for the Gaussian Mixture Model, which can be easily exported to other programs for viewing, analysis, or editing).
Regarding Claim 13, Cunningham in view of Wang discloses the database system of Claim 1, wherein the initial training function is configured to train models of a corresponding model type having a non-mixture model type, wherein the initial training function is performed to generate non-mixture model data of the non-mixture model type indicating grouping of the training set of rows into the initial set of clusters, and wherein the initial cluster parameter data is generated based on the non-mixture model data (see Cunningham, paragraph [0094], where a comparable solution is to compute the k-means model as an initialization of the full Gaussian Mixture Model).
Regarding Claim 14, Cunningham in view of Wang discloses the database system of Claim 13, wherein the non-mixture model type is a K means model type, wherein the non-mixture model data corresponds to K means model data generated by performing a K means model training process, wherein the K means model indicates a set of K means centroids, and wherein the mixture model data is implemented as non-K means model data (see Cunningham, paragraph [0094], where a comparable solution is to compute the k-means model as an initialization of the full Gaussian Mixture Model).
Regarding Claim 17, Cunningham in view of Wang discloses the database system of Claim 1, wherein the iterative process implements an expectation-maximization algorithm (see Cunningham, paragraph [0014], where the Expectation-Maximization (EM) algorithm is a well-established algorithm to cluster data).
Regarding Claim 18, Cunningham discloses a method, comprising:
determine a first query that indicates a first request to generate a Gaussian mixture model (see Cunningham, Claim 1, where the method for creating and analyzing data in a computer-implemented data mining system comprises (a) accessing data from a database in a computer-implemented data mining system; and (b) performing an Expectation-Maximization (EM) algorithm in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data);
execute the first query to generate Gaussian mixture model data for the Gaussian mixture model (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data) based on:
generating a training set of rows based on accessing a plurality of rows of a relational database table of a relational database (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data);
performing an initial training function upon the training set of rows to group the training set of rows into an initial set of clusters (see Cunningham, paragraph [0094], where a comparable solution is to compute the k-means model as an initialization of the full Gaussian Mixture Model);
generating initial cluster parameter data indicating, for each cluster of the initial set of clusters, a set of initial cluster parameters characterizing the each cluster (see Cunningham, paragraphs [0047], [0048], where the goal of the EM algorithm is to estimate the means (C), the covariances (R), and the mixture weights (W) of the Gaussian mixture probability function described in the previous subsection; this algorithm starts from an approximation to the solution; see also paragraph [0057], where block 204 represents setting of initial values for C, R, and W); and
performing an iterative process to generate final cluster parameter data indicating, for each cluster of a final set of clusters, a set of final cluster parameters characterizing the each cluster by updating the initial cluster parameter data for the each cluster (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
determine a second query that indicates a second request to apply the Gaussian mixture model to input data (see Cunningham, paragraph [0102], where model selection involves deciding which of the various possible Gaussian Mixture Models are suitable for use with a given data set … the present invention eases these requirements with a set of pragmatic choices in model selection; see also paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database); and
execute the second query to generate model output of the Gaussian mixture model for the input data (see Cunningham, paragraph [0102], where model selection involves deciding which of the various possible Gaussian Mixture Models are suitable for use with a given data set … the present invention eases these requirements with a set of pragmatic choices in model selection; see also paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database),
Cunningham does not explicitly disclose based on, for each row in the input data, identifying a classification label for an identified one of the final set of clusters that includes the each row based on the final cluster parameter data. Wang discloses for each row in the input data, identifying a classification label for an identified one of the final set of clusters that includes the each row based on the final cluster parameter data (see Wang, paragraph [0060], where Algorithm 2 includes … initialize k centroids and … allocate a cluster label to each data point vi by selecting the nearest centroid).
Cunningham and Wang are both directed toward clustering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cunningham with Wang as it amounts to combining prior art elements according to known techniques to yield predictable results.
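For illustration only (and forming no part of the rejection), the procedure the examiner maps above, i.e., a k-means initialization per Cunningham paragraph [0094], followed by the iterative EM loop of blocks 200-206 that updates the means C, variances R, and mixture weights W until the change in log-likelihood falls below a threshold, can be sketched in a simplified one-dimensional form. All function names (kmeans_seed, em_gmm_1d) are hypothetical and chosen for this sketch, and the deterministic seeding is an assumption, not Cunningham's disclosed initialization:

```python
import math

def kmeans_seed(xs, k, iters=10):
    # Crude 1-D k-means used only to seed the mixture (the "initial
    # training function" / k-means initialization of paragraph [0094]).
    # Assumes k >= 2; starting centroids are spread evenly for determinism.
    lo, hi = min(xs), max(xs)
    cents = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - cents[j]))].append(x)
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    return cents

def em_gmm_1d(xs, k=2, eps=1e-6, max_iters=200):
    # C = means, R = (co)variances, W = mixture weights, per Cunningham's notation.
    C = kmeans_seed(xs, k)
    R = [1.0] * k
    W = [1.0 / k] * k
    prev_llh = -math.inf
    for _ in range(max_iters):
        # E-step: per-row cluster probabilities (the matrix X of Fig. 2A).
        resp, llh = [], 0.0
        for x in xs:
            ps = [W[j] / math.sqrt(2 * math.pi * R[j]) *
                  math.exp(-(x - C[j]) ** 2 / (2 * R[j])) for j in range(k)]
            s = sum(ps)
            llh += math.log(s)
            resp.append([p / s for p in ps])
        # M-step: update C, R, W from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            C[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            R[j] = sum(r[j] * (x - C[j]) ** 2 for r, x in zip(resp, xs)) / nj + 1e-9
            W[j] = nj / len(xs)
        # WHILE-loop guard of block 202: stop once the llh change is small.
        if abs(llh - prev_llh) < eps:
            break
        prev_llh = llh
    return C, R, W
```

Applying the trained model to new rows (the second-query limitation) then amounts to assigning each row the label of the cluster with the largest responsibility, which is the step for which Wang is relied upon.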
Regarding Claim 19, Cunningham in view of Wang discloses the method of Claim 18, wherein the initial cluster parameter data includes, for each cluster, an initial mixture weight for the each cluster, an initial mean for the each cluster, and an initial covariance matrix for the each cluster (see Cunningham, paragraphs [0047], [0048], where the goal of the EM algorithm is to estimate the means (C), the covariances (R), and the mixture weights (W) of the Gaussian mixture probability function described in the previous subsection; this algorithm starts from an approximation to the solution; see also paragraph [0057], where block 204 represents setting of initial values for C, R, and W), wherein the final cluster parameter data includes, for each cluster: a final mixture weight for the each cluster generated based on updating the initial mixture weight for the each cluster via the iterative process, a final mean for the each cluster generated based on updating the initial mean for the each cluster via the iterative process, and a final covariance matrix for the each cluster generated based on updating the initial covariance matrix for the each cluster via the iterative process (see Cunningham, paragraph [0056], where upon completion of the loop, control transfers to Block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood).
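For reference only, the per-cluster updates recited in Claim 19 (mixture weight, mean, and covariance matrix) correspond to the textbook EM updates for a Gaussian mixture, written here in Cunningham's C/R/W notation; this is the standard form of the algorithm, not a quotation of Cunningham's SQL formulas in paragraph [0041], though those formulas compute quantities of this kind:

```latex
% E-step: responsibility of cluster j for row x_i, and effective cluster size
r_{ij} = \frac{W_j \,\mathcal{N}(x_i \mid C_j, R_j)}
              {\sum_{l=1}^{k} W_l \,\mathcal{N}(x_i \mid C_l, R_l)}, \qquad
N_j = \sum_{i=1}^{n} r_{ij}
% M-step: updated mixture weight, mean, and covariance per cluster
W_j^{\text{new}} = \frac{N_j}{n}, \quad
C_j^{\text{new}} = \frac{1}{N_j}\sum_{i=1}^{n} r_{ij}\, x_i, \quad
R_j^{\text{new}} = \frac{1}{N_j}\sum_{i=1}^{n} r_{ij}\,
  (x_i - C_j^{\text{new}})(x_i - C_j^{\text{new}})^{\mathsf T}
```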
Regarding Claim 20, Cunningham discloses a non-transitory computer readable storage medium comprising:
at least one memory section that stores operational instructions that, when executed by the at least one processing module that includes a processor and a memory (see Cunningham, paragraph [0033], where the data servers 110A-110E, OLAP Client 114, Analysis Client 116, Analysis Interface 118, OLAP Server 120, Analysis Server 122, Learning Engine 124, Inference Engine 126, Data Mining View 128, Model Results Table 130, and/or RDBMS 132 each comprise logic and/or data tangibly embodied in and/or accessible from a device, media, carrier, or signal, such as RAM, ROM, one or more of the data storage devices), cause the at least one processing module to:
determine a first query that indicates a first request to generate a Gaussian mixture model (see Cunningham, Claim 1, where the method for creating and analyzing data in a computer-implemented data mining system comprises (a) accessing data from a database in a computer-implemented data mining system; and (b) performing an Expectation-Maximization (EM) algorithm in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data);
execute the first query to generate Gaussian mixture model data for the Gaussian mixture model (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data) based on:
generating a training set of rows based on accessing a plurality of rows of a relational database table of a relational database (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data);
performing an initial training function upon the training set of rows to group the training set of rows into an initial set of clusters (see Cunningham, paragraph [0094], where a comparable solution is to compute the k-means model as an initialization of the full Gaussian Mixture Model);
generating initial cluster parameter data indicating, for each cluster of the initial set of clusters, a set of initial cluster parameters characterizing the each cluster (see Cunningham, paragraphs [0047], [0048], where the goal of the EM algorithm is to estimate the means (C), the covariances (R), and the mixture weights (W) of the Gaussian mixture probability function described in the previous subsection; this algorithm starts from an approximation to the solution; see also paragraph [0057], where block 204 represents setting of initial values for C, R, and W); and
performing an iterative process to generate final cluster parameter data indicating, for each cluster of a final set of clusters, a set of final cluster parameters characterizing the each cluster by updating the initial cluster parameter data for the each cluster (see Cunningham, paragraph [0056], where block 202 is a decision block that represents a WHILE loop, which is performed while the change in log-likelihood llh is greater than E; for every iteration of the loop, control transfers to block 204; upon completion of the loop, control transfers to block 206 that produces the output, including (1) C, R, W, which are matrices containing the updated mixture parameters with the highest log-likelihood, and (2) X, which is a matrix storing the probabilities for each point belonging to each of the clusters);
determine a second query that indicates a second request to apply the Gaussian mixture model to input data (see Cunningham, paragraph [0102], where model selection involves deciding which of the various possible Gaussian Mixture Models are suitable for use with a given data set … the present invention eases these requirements with a set of pragmatic choices in model selection; see also paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database); and
execute the second query to generate model output of the Gaussian mixture model for the input data (see Cunningham, paragraph [0102], where model selection involves deciding which of the various possible Gaussian Mixture Models are suitable for use with a given data set … the present invention eases these requirements with a set of pragmatic choices in model selection; see also paragraph [0031], where server tier 106 comprises a Database Tier for storing and managing the databases; wherein the Database Tier includes an Inference Engine 126 that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) 132 that performs the SQL statements against a Data Mining View 128 to retrieve the data from the database),
Cunningham does not explicitly disclose based on, for each row in the input data, identifying a classification label for an identified one of the final set of clusters that includes the each row based on the final cluster parameter data. Wang discloses for each row in the input data, identifying a classification label for an identified one of the final set of clusters that includes the each row based on the final cluster parameter data (see Wang, paragraph [0060], where Algorithm 2 includes … initialize k centroids and … allocate a cluster label to each data point vi by selecting the nearest centroid).
Cunningham and Wang are both directed toward clustering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cunningham with Wang as it amounts to combining prior art elements according to known techniques to yield predictable results.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Cunningham and Wang as applied to Claims 1-14 and 17-20 above, and further in view of Alexe (PG Pub. No. 2008/0313135 A1).
Regarding Claim 15, Cunningham in view of Wang discloses the database system of Claim 14, wherein performing the K means model training process includes:
Cunningham does not disclose:
generating a plurality of training subsets from the training set of rows;
processing the plurality of training subsets via a corresponding plurality of parallelized processes to generate a plurality of sets of centroids corresponding to a plurality of different K means models based on performing a K means training operation via each of the corresponding plurality of parallelized processes upon a corresponding one of the plurality of training subsets; and
generating a final set of centroids corresponding to a final K means model based on performing the K means training operation upon the plurality of sets of centroids, wherein the non-mixture model data indicates the final set of centroids as the set of K means centroids.
Wang discloses:
generating a plurality of training subsets from the training set of rows (see Wang, paragraph [0027], where we perform an initial segmentation of data points).
Cunningham and Wang are both directed toward clustering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cunningham and Wang as it amounts to combining prior art elements according to known techniques to yield predictable results.
Cunningham in view of Wang does not disclose:
processing the plurality of training subsets via a corresponding plurality of parallelized processes to generate a plurality of sets of centroids corresponding to a plurality of different K means models based on performing a K means training operation via each of the corresponding plurality of parallelized processes upon a corresponding one of the plurality of training subsets; and
generating a final set of centroids corresponding to a final K means model based on performing the K means training operation upon the plurality of sets of centroids, wherein the non-mixture model data indicates the final set of centroids as the set of K means centroids.
Cunningham in view of Wang and Alexe discloses:
processing the plurality of training subsets via a corresponding plurality of parallelized processes to generate a plurality of sets of centroids (see Wang, paragraph [0027], where we perform … a series of discrete distribution (D2) clustering operations and determine the local centroids of each segment; see also paragraph [0028], where the D2 clustering operations may be performed by parallel processors) corresponding to a plurality of different K means models based on performing a K means training operation via each of the corresponding plurality of parallelized processes upon a corresponding one of the plurality of training subsets (see Alexe, paragraph [0037], where the consensus ensemble clustering may comprise: (1) generating a collection of clustering solutions using different methods applied to many perturbations of the data; and (2) implementing a consensus function for combining the clusters found to produce a single output clustering of the data); and
generating a final set of centroids corresponding to a final K means model based on performing the K means training operation upon the plurality of sets of centroids, wherein the non-mixture model data indicates the final set of centroids as the set of K means centroids (see Wang, paragraph [0025], where we parallelize the centroid update in D2-clustering by a similar strategy to K-means' parallelization: dividing the data into segments based on their adjacency, computing some local centroids for each segment in parallel, and combining the local centroids to a global centroid).
Cunningham, Wang, and Alexe are all directed toward clustering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cunningham and Wang with Alexe as it amounts to combining prior art elements according to known techniques to yield predictable results.
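For illustration only (and forming no part of the rejection), the Claim 15 arrangement mapped above, i.e., splitting the training rows into subsets, training a k-means model per subset (per Wang, each subset's centroid computation may run on a parallel processor), and then running the same k-means training operation on the pooled local centroids to obtain the final centroids, can be sketched as follows. The function names (kmeans_1d, parallel_style_kmeans) are hypothetical, the sketch is one-dimensional, and the calls run sequentially here; in the claimed arrangement each per-subset call would be dispatched to its own parallelized process:

```python
def kmeans_1d(xs, k, iters=20):
    # Plain 1-D Lloyd's k-means with evenly spread starting centroids
    # (deterministic for the sketch; assumes k >= 2).
    lo, hi = min(xs), max(xs)
    cents = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - cents[j]))].append(x)
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    return cents

def parallel_style_kmeans(xs, k, n_subsets=4):
    # 1) Generate a plurality of training subsets from the training rows.
    subsets = [xs[i::n_subsets] for i in range(n_subsets)]
    # 2) Train an independent k-means model per subset; in the claimed
    #    arrangement each call would execute on its own parallelized process.
    local_centroids = [kmeans_1d(s, k) for s in subsets]
    # 3) Perform the same k-means training operation on the pooled local
    #    centroids to obtain the final set of centroids.
    pooled = [c for cs in local_centroids for c in cs]
    return kmeans_1d(pooled, k)
```

This mirrors Wang's strategy of computing local centroids per segment and combining them into global centroids (paragraph [0025]), with the final combination step itself being a k-means pass over the local centroids.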
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Cunningham and Wang as applied to Claims 1-14 and 17-20 above, and further in view of Masuzaki (PG Pub. No. 2021/0247751 A1).
Regarding Claim 16, Cunningham in view of Wang discloses the database system of Claim 13, wherein:
a function library includes a Gaussian mixture model training function (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data) and a K means model training function (see Cunningham, paragraph [0094], where a comparable solution is to compute the k-means model as an initialization of the full Gaussian Mixture Model), wherein the Gaussian mixture model training function is performed via a first function call to the Gaussian mixture model training function (see Cunningham, paragraph [0017], where the invention is directed to a computer-implemented data mining system that analyzes data using Gaussian Mixture Models; the data is accessed from a database, and then an Expectation-Maximization (EM) algorithm is performed in the computer-implemented data mining system to create the Gaussian Mixture Model for the accessed data), wherein performing the Gaussian mixture model training function includes performing the K means model training function via a second function call to the K means model training function (see Cunningham, paragraph [0094], where a comparable solution is to compute the k-means model as an initialization of the full Gaussian Mixture Model), and wherein the operational instructions, when executed by the at least one processor, further cause the database system to:
Cunningham does not disclose:
execute the third query to generate corresponding K means model data for the K means model based on executing the K means model training function, wherein the Gaussian mixture model training function is not executed when executing the third query based on the third query not indicating a request to generate a corresponding Gaussian mixture model;
determine a fourth query that indicates a fourth request to apply the K means model to second input data; and
execute the fourth query to generate model output of the K means model for the second input data.
Masuzaki discloses:
execute the third query to generate corresponding K means model data for the K means model based on executing the K means model training function, wherein the Gaussian mixture model training function is not executed when executing the third query based on the third query not indicating a request to generate a corresponding Gaussian mixture model (see Masuzaki, paragraph [0063], where a freely-selected clustering method may be used, for example, a k-means method or a Gaussian mixture model (GMM) may be employed [it is the position of the Examiner that selecting between k-means and Gaussian mixture model contemplates selecting K-means and not Gaussian mixture model]);
determine a fourth query that indicates a fourth request to apply the K means model to second input data (see Masuzaki, paragraph [0063], where a freely-selected clustering method may be used, for example, a k-means method or a Gaussian mixture model (GMM) may be employed [it is the position of the Examiner that selecting between k-means and Gaussian mixture model contemplates selecting K-means and not Gaussian mixture model]); and
execute the fourth query to generate model output of the K means model for the second input data (see Masuzaki, paragraph [0063], where a freely-selected clustering method may be used, for example, a k-means method or a Gaussian mixture model (GMM) may be employed [it is the position of the Examiner that selecting between k-means and Gaussian mixture model contemplates selecting K-means and not Gaussian mixture model]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cunningham with Masuzaki for the benefit of allowing selection of the clustering method (see Masuzaki, paragraph [0063]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARHAD AGHARAHIMI whose telephone number is (571)272-9864. The examiner can normally be reached M-F 9am - 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz can be reached at 571-272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FARHAD AGHARAHIMI/Examiner, Art Unit 2161
/APU M MOFIZ/Supervisory Patent Examiner, Art Unit 2161