Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
The following NON-FINAL Office Action is in response to application 18/358,504 filed on 09/19/2023. This communication is the first action on the merits.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/25/2023 has been considered by the examiner.
Drawings
The drawings were received on 07/25/2023. These drawings are acceptable.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more. A subject matter eligibility analysis is set forth below. See MPEP 2106.
Specifically, representative Claim 1 recites:
A computer-implemented method comprising:
receiving, by one or more processing devices, a dataset;
generating, by the one or more processing devices, a histogram distribution of the dataset;
identifying, by the one or more processing devices, an elbow/knee point by iteratively analyzing the histogram distribution based on y-axis values;
determining, by the one or more processing devices, histogram bin significance of the histogram distribution using a central tendency value;
determining, by the one or more processing devices, a most extreme difference histogram bin of the histogram distribution based on the histogram distribution, the elbow/knee point, and the histogram bin significance;
mapping, by the one or more processing devices, the most extreme difference histogram bin to the dataset;
splitting, by the one or more processing devices, the dataset into a head dataset and a tail dataset at the most extreme difference histogram bin; and
outputting, by the one or more processing devices, an indication of the head dataset and the tail dataset.
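For illustrative context only, the sequence of steps recited above can be sketched as a short program. The function and variable names below are the examiner's hypothetical constructions for purposes of discussion and do not appear in Applicant's disclosure; the sketch is one possible reading of the claim language, not Applicant's implementation.

```python
import numpy as np

def split_head_tail(data, bins=20):
    """Hypothetical sketch of the recited head/tail split; names are illustrative."""
    # "generating ... a histogram distribution of the dataset"
    counts, edges = np.histogram(data, bins=bins)
    # "identifying ... an elbow/knee point ... based on y-axis values":
    # here read as the largest drop between adjacent bin counts
    diffs = np.diff(counts.astype(float))
    knee = int(np.argmin(diffs))
    # "determining ... histogram bin significance ... using a central tendency value":
    # here read as comparing each bin count to the mean count
    significant = counts > counts.mean()
    # "determining ... a most extreme difference histogram bin": here read as the
    # last significant bin at or after the knee
    split_bin = knee
    for i in range(knee, len(counts) - 1):
        if significant[i]:
            split_bin = i
    # "mapping ... to the dataset" and "splitting ... into a head dataset and a
    # tail dataset at the most extreme difference histogram bin"
    threshold = edges[split_bin + 1]
    head = data[data <= threshold]
    tail = data[data > threshold]
    return head, tail
```

The particular choices above (mean as the central tendency value, steepest decline as the elbow/knee) are assumptions made solely to render the claim language concrete; the claim itself does not fix them.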
The abstract-idea limitations of the claim are the “generating,” “identifying,” “determining,” “mapping,” and “splitting” steps recited above; the remaining limitations are “additional elements.”
Claim 11 includes limitations corresponding to the same abstract idea recited in Claim 1, expressed as a computer-readable storage medium.
System Claim 16 also includes limitations corresponding to the same abstract idea recited in Claim 1 and comprises:
a processor set, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media.
Under Step 1 of the analysis, claim 1 falls within a statutory category, namely, a process (method) claim. Likewise, claim 11 is a computer-readable storage medium claim and claim 16 is a system claim.
Under Step 2A, prong 1: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim.
In the instant case, claim 1 is found to recite at least one judicial exception (i.e., abstract idea), namely a Mental Process and a Mathematical Concept. This can be seen in the claim limitations of “generating, by the one or more processing devices, a histogram distribution of the dataset”; “identifying, by the one or more processing devices, an elbow/knee point by iteratively analyzing the histogram distribution based on y-axis values”; “determining, by the one or more processing devices, histogram bin significance of the histogram distribution using a central tendency value”; “determining, by the one or more processing devices, a most extreme difference histogram bin of the histogram distribution based on the histogram distribution, the elbow/knee point, and the histogram bin significance”; “mapping, by the one or more processing devices, the most extreme difference histogram bin to the dataset”; and “splitting, by the one or more processing devices, the dataset into a head dataset and a tail dataset at the most extreme difference histogram bin.” These limitations fall within the judicial exception of a mental process because they are merely data observations, evaluations, and/or judgments made in order to analyze a dataset through statistical evaluation and mathematical calculation, which are capable of being performed mentally and/or with the aid of pen and paper. Additionally, the aforementioned limitations recite mathematical calculations; see, e.g., Spec. [0045], describing the use of mathematical calculations and optimization techniques, such as the Freedman-Diaconis method, to analyze histogram points, identify potential elbow or knee points, and test bin significance.
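The characterization of the Freedman-Diaconis method as a conventional mathematical calculation can be illustrated concretely: it is a closed-form rule for histogram bin width, h = 2·IQR·n^(−1/3). The minimal sketch below is provided for context only; the function name is the examiner's illustrative choice and is not drawn from Applicant's disclosure.

```python
import numpy as np

def freedman_diaconis_bin_width(data):
    """Freedman-Diaconis rule: bin width h = 2 * IQR * n^(-1/3)."""
    q75, q25 = np.percentile(data, [75, 25])
    iqr = q75 - q25  # interquartile range of the dataset
    n = len(data)
    return 2.0 * iqr * n ** (-1.0 / 3.0)
```

That the rule reduces to a single arithmetic expression over readily observed quantities underscores that the limitation is a mathematical calculation capable of being performed with pen and paper.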
Similar limitations comprise the abstract ideas of Claim 11 and Claim 16.
Step 2A, prong 2 of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception(s) into a practical application of the exception. This evaluation is performed by (a) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (b) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application.
In addition to the abstract ideas recited in claim 1, the claimed method recites additional elements, including “a computer-implemented method comprising: receiving, by one or more processing devices, a dataset” and “outputting, by the one or more processing devices, an indication of the head dataset and the tail dataset.” However, these elements are found to be data gathering and output steps, which are recited at a high level of generality and thus merely amount to insignificant extra-solution activities. See MPEP 2106.05(g), “Insignificant Extra-Solution Activity.” Furthermore, the claim recites that the steps, e.g., “outputting,” are performed “by one or more processing devices”; however, this is found to be equivalent to adding the words “apply it,” and mere instructions to apply a judicial exception on a general purpose computer do not integrate the abstract idea into a practical application. See MPEP 2106.05(f).
Claim 11 recites the same additional elements as claim 1.
Claim 16 recites the same additional elements as claim 1 and also recites “a processor set, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media.” See MPEP 2106.05(h): “For instance, a data gathering step that is limited to a particular data source (such as the Internet) or a particular type of data (such as power grid data or XML tags) could be considered to be both insignificant extra-solution activity and a field of use limitation.” With respect to the recited “processor set” and “one or more computer readable storage media,” these are merely general purpose computer hardware and/or software components used as a tool to “apply” the abstract idea in a technological environment.
The generic data gathering, processing, and output steps are recited at such a high level of generality (e.g., using “one or more processing devices”) that they represent no more than mere instructions to apply the judicial exceptions on a computer. They can also be viewed as nothing more than an attempt to generally link the use of the judicial exceptions to the technological environment of a computer. Noting MPEP 2106.04(d)(I): “It is notable that mere physicality or tangibility of an additional element or elements is not a relevant consideration in Step 2A Prong Two. As the Supreme Court explained in Alice Corp., mere physical or tangible implementation of an exception does not guarantee eligibility. Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 224, 110 USPQ2d 1976, 1983-84 (2014) ("The fact that a computer ‘necessarily exist[s] in the physical, rather than purely conceptual, realm,’ is beside the point")”.
Thus, under Step 2A, prong 2 of the analysis, even when viewed in combination, these additional elements do not integrate the recited judicial exception into a practical application, and the claim is directed to the judicial exception. No specific practical application is associated with the claimed invention. For instance, nothing is done with the result of the statistical analysis beyond identifying a most extreme difference histogram bin, splitting the dataset into a head dataset and a tail dataset, and outputting an indication of the split.
Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, as described above with respect to Step 2A, prong 2, merely amount to a general purpose computer system that attempts to apply the abstract idea in a technological environment, limit the abstract idea to a particular field of use, and/or merely perform insignificant extra-solution activities (claims 1, 11, and 16). Such insignificant extra-solution activity, e.g., data gathering and output, when re-evaluated under Step 2B, is further found to be well-understood, routine, and conventional, as evidenced by MPEP 2106.05(d)(II) (describing conventional activities that include transmitting and receiving data over a network, electronic recordkeeping, storing and retrieving information from memory, and electronically scanning or extracting data from a physical document).
Similarly, the combination and arrangement of the above-identified additional elements, when analyzed under Step 2B, also fails to support a conclusion that claims 1, 11, and 16 amount to significantly more than the abstract idea.
With regard to the dependent claims, claims 2-10, 12-15, and 17-20 merely further expand upon the algorithm/abstract idea and do not set forth further additional elements that integrate the recited abstract idea into a practical application or amount to significantly more. Therefore, these claims are found ineligible for the reasons described for claims 1, 11, and 16. Specifically:
With respect to dependent claims 2, 8, 12, and 17 specifically, the claims further recite determining fit parameters for the head dataset and the tail dataset, storing information from the dataset splits and data transformations, receiving a new dataset, and applying previously determined splits and transformations. These limitations merely perform additional mathematical calculations and data evaluation on already-received data and involve generic data storage, retrieval, and output. Such steps do not improve the functioning of a computer or provide a technological improvement. Instead, they amount to insignificant extra-solution activity and post-processing of data using a general purpose computer. See MPEP 2106.05(f) and (g).
With respect to dependent claims 4, 9, 10, 14, and 19 specifically, the claims further recite performing data transformations, calibrating across different distribution types, and iteratively splitting the dataset at successive levels. These limitations merely refine or repeat the abstract data analysis and specify conditions used during the calculations. Such limitations represent refinements of the abstract idea itself or field of use limitations and do not integrate the judicial exception into a practical application. No improvement to computer functionality is recited. See MPEP 2106.05(g).
With respect to dependent claims 5, 6, 7, 15, and 20 specifically, the claims further recite applying multiple data transformations, performing multiple dataset splits, evaluating goodness of fit scores, and applying a training model to identify splits and transformations. These limitations merely add additional mathematical analysis, optimization, and evaluation to the abstract idea. The recited training model and goodness of fit metric constitute abstract data processing and do not improve a technological process. Accordingly, these limitations amount to insignificant extra-solution activity. See MPEP 2106.05(g).
With respect to dependent claims 3, 14, and 18 specifically, the claims further recite analyzing dependent and independent variable pairs and performing analysis on multivariate datasets. These limitations merely expand the abstract idea to additional datasets and dimensions and recite mathematical relationships and statistical analysis. Such limitations do not impose meaningful limits on the judicial exception and do not integrate the abstract idea into a practical application. See MPEP 2106.05(g) and (h).
Accordingly, for the reasons above and those discussed in relation to independent claims 1, 11, and 16, the dependent claims are insufficient to integrate the claimed abstract ideas into a practical application or to amount to significantly more.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20200311559 A1, Chattopadhyay et al. (hereinafter Chattopadhyay) in view of US 20080167889 A1, Kagarlis et al. (hereinafter Kagarlis).
Regarding Claims 1, 11, and 16, Chattopadhyay discloses a computer-implemented method (Chattopadhyay, [0223] the circuitry of processor circuitry 1902 may comprise logic blocks or logic fabric including and other interconnected resources that may be programmed to perform various functions, such as the procedures, methods, functions, etc. of the various embodiments discussed herein) comprising:
receiving, by one or more processing devices (Chattopadhyay, [0047] functionality of data discretizer 300 may be implemented using any type or combination of hardware and/or software logic, such as a processor (e.g., a microprocessor), application specific integrated circuit (ASIC), field programmable gate array (FPGA), or another type of integrated circuit or computing device or data processing device, and/or any associated software logic, instructions, or code), a dataset (Chattopadhyay, [0026] Data analytics has a wide range of applications in computing systems, from data mining to machine learning and artificial intelligence, and has become an increasingly important aspect of large-scale computing applications. Data preprocessing, an important initial step in data analytics, involves transforming raw data into a suitable format for further processing and analysis. For example, real-world or raw data is often incomplete, inconsistent, and/or error prone. Accordingly, raw data may go through a series of preprocessing steps, such as data cleaning, integration, transformation, reduction, and/or discretization or quantization. Data discretization, for example, may involve converting or partitioning a range of continuous raw data into a smaller number of intervals or values, [0276] receive training data corresponding to a plurality of labeled instances of a feature set);
generating, by the one or more processing devices, a histogram distribution of the dataset (Chattopadhyay, [0045] FIG. 2 illustrates an example 200 of data discretization. In the illustrated example, a histogram 204 is created for a dataset 202 by performing data discretization using an arbitrary bin size of 4, [0061] A histogram can then be created for the particular bin size, for example, by counting the number of data elements of dataset that fall into each bin. The histogram can then be used to compute the differences in bin count for adjacent bins);
determining, by the one or more processing devices, histogram bin significance of the histogram distribution using a central tendency value (Chattopadhyay, [0051] Bin optimizer 310 first identifies a dense range 311 of the dataset 302. In some embodiments, for example, the mean and standard deviation of the dataset 302 may be computed, and then dense range 311 may be identified as a range that is within a particular number of standard deviations from the mean);
determining, by the one or more processing devices, a most extreme difference histogram bin of the histogram distribution based on the histogram distribution (Chattopadhyay, [0027] the bin size should not be so small that the histogram loses its purpose, but should not be so large that the histogram significantly deviates from the original data distribution. Accordingly, determining the optimal bin size or bin width for performing data discretization and binning may be challenging), and histogram bin significance (Chattopadhyay, [0051] Bin optimizer 310 first identifies a dense range 311 of the dataset 302. In some embodiments, for example, the mean and standard deviation of the dataset 302 may be computed, and then dense range 311 may be identified as a range that is within a particular number of standard deviations from the mean);
mapping, by the one or more processing devices, the most extreme difference histogram bin to the dataset (Chattopadhyay, [0051] in some embodiments (e.g., for datasets with Gaussian distributions), the dense range 311 may be +−2 standard deviations from the mean. Accordingly, identifying the dense data range in this manner ensures that outliers or data with long tails do not impact the optimal bin size);
splitting, by the one or more processing devices, the dataset into a head dataset and a tail dataset at the most extreme difference histogram bin (Chattopadhyay, [0051] identifying the dense data range in this manner ensures that outliers or data with long tails do not impact the optimal bin size, [0064] the optimal bin size may then be used to identify a binned dataset or histogram, for example, by partitioning or binning the original dataset based on the optimal bin size); and
outputting, by the one or more processing devices, an indication of the head dataset and the tail dataset (Chattopadhyay, [0087] the decision tree ML engine can detect or predict failures 709 associated with the edge device 700 in real time based on its current health as determined based on the decision tree ML model, [0256] Output device circuitry 1984 may include any number and/or combinations of audio or visual display, including, inter alia, one or more simple visual outputs/indicators, [0286] the decision tree model is to be trained to predict failures associated with an edge computing device; and the target variable is to indicate whether a failure is predicted for the edge computing device).
Chattopadhyay does not disclose identifying, by the one or more processing devices, an elbow/knee point by iteratively analyzing the histogram distribution based on y-axis values;
determining, by the one or more processing devices, a most extreme difference histogram bin of the histogram distribution based on the histogram distribution, and the elbow/knee point.
However, Kagarlis teaches identifying, by the one or more processing devices, an elbow/knee point by iteratively analyzing the histogram distribution based on y-axis values (Kagarlis, [0029] maintain reliability in the presence of error, manipulation and statistical outliers, [0064] There are various ways to deal with outliers. They can be omitted from the dataset, a practice we do not favor, or analyzed to have their origin understood. Some implementations will carefully preserve outliers for the useful information that they contain. They may be cross checked against other sources, and, to the extent they are due to human error, have their bad fields recovered from those complementary sources (e.g. false low price or large area inducing improbably low ppsf));
determining, by the one or more processing devices, a most extreme difference histogram bin of the histogram distribution based on the histogram distribution (Kagarlis, [0105] Whether one weights the data in a histogram or not, as a practical matter one has to decide what bin size 68 to use. In the extreme of infinitesimally narrow bins (high resolution) one recovers the unbinned spectrum comprising all the individual data points. In the opposite low-resolution extreme, one can bunch all the ppsf values in a single bin and suppress all the features of the distribution), and the elbow/knee point (Kagarlis, [0010] Implementations may include one or more of the following features. The data points that are excluded are associated with values that are outside defined cutoffs. The defined cutoffs include a lower cutoff that is a function of a minimum value of any of the data points. The defined cutoffs include an upper cutoff that corresponds to a maximum value of any of the data points).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Chattopadhyay and Kagarlis because both references relate to histogram-based data discretization and threshold determination. Chattopadhyay teaches generating histograms and splitting datasets based on statistically significant bins, while Kagarlis teaches identifying an elbow or knee point in a histogram to determine meaningful cutoff values. A person of ordinary skill in the art would have been motivated to integrate the elbow/knee point identification of Kagarlis into the histogram-based splitting approach of Chattopadhyay to improve identification of optimal split points and the robustness of data preprocessing.
Regarding Claims 2, 12, and 17, Chattopadhyay in view of Kagarlis teaches the computer-implemented method of claim 1, further comprising:
determining fit parameters for the head dataset and the tail dataset (Chattopadhyay, [0052] the range of bin resolutions 312 may be identified based on configurable parameters, such as a start resolution, stop resolution, and step, [0134] The configuration and status registers 1104 may further be used to store any other configurable parameters or status information associated with training the decision tree ML model 1114 and/or performing inference/classification using the model 1114, [0279] compute a plurality of performance costs for the plurality of possible bin sizes; and select the optimal bin size from the plurality of possible bin sizes, [0281] compute a plurality of impurity values for the subset of feature value checkpoints; select, from the subset of feature value checkpoints, a corresponding feature value for splitting the root node, wherein the corresponding feature value is selected based on the plurality of impurity values; and split the root node into a set of child nodes based on the corresponding feature value); and
outputting the fit parameters for the head dataset and the tail dataset (Chattopadhyay, [0097] At block 816, the random forest outputs a final prediction based on the underlying predictions from the respective trees. In the illustrated example, the random forest outputs the prediction generated by trees 1 and 3 since it received the most votes, [0136] the ensemble output module 1116 may use the trained decision tree model 1114 to classify or label the new data. For example, with respect to a random forest model 1114, the ensemble output module 1116 obtains a prediction regarding the class or label of the new data from each decision tree in the random forest, and the ensemble output module 1116 then determines a final prediction based on the collective predictions from the various trees. The final prediction from the ensemble output module 1116 is then stored in the predicted output buffer 1110 for subsequent retrieval by the host processor 1120 via the host interface 1102).
Regarding Claims 3, 13, and 18, Chattopadhyay in view of Kagarlis teaches the computer-implemented method of claim 1, further comprising analyzing sets of dependent/independent variable pairs of a dataset (Chattopadhyay, [0097] the input data may include newly captured, previously unseen, and/or unlabeled sensor data that needs to be classified or labeled, [0276] receive training data corresponding to a plurality of labeled instances of a feature set, wherein the training data is captured at least partially by one or more sensors; and receive inference data corresponding to an unlabeled instance of the feature set, [0112] by using the binning algorithm, the number of Gini computations is very minimal and can be further sped up by parallelism, as each computation of Gini is independent), and the dataset comprising a multi-variate dataset (Chattopadhyay, [0092] The decision tree ML engine 710 may be implemented using any type or combination of decision tree machine learning algorithms, including random forests, centered forests, uniform forests, rotation forests, ensemble decision trees, boosting trees, bagging trees, classification and regression trees (CART), conditional inference trees, fuzzy decision trees (FDT), decision lists, iterative dichotomiser 3 (ID3), C4.5, chi-square automatic interaction detection (CHAID), and multivariate adaptive regression splines (MARS)).
Regarding Claims 4, 14, and 19, Chattopadhyay in view of Kagarlis teaches the computer-implemented method of claim 1, further comprising performing a data transformation of the dataset (Chattopadhyay, [0026] raw data may go through a series of preprocessing steps, such as data cleaning, integration, transformation, reduction, and/or discretization or quantization).
Regarding Claims 5, 15, and 20, Chattopadhyay in view of Kagarlis teaches the computer-implemented method of claim 1, further comprising applying multiple data transformations (Chattopadhyay, [0111] this disclosure presents an efficient training algorithm for a decision tree (e.g., a random forest) that optimizes the best cutoff point selection process. In particular, only a few key predetermined points are examined for the cutoff point selection process, which are determined using the automated data discretization and binning algorithm described above in connection with FIGS. 1-6) and multiple splits of the dataset to determine best combined splits with best combined data transformations (Chattopadhyay, [0110] The traditional random forest method described above sorts each attribute and computes Gini indexes (children Gini and Total Gini) for each value of the attribute to identify the best cutoff point. For example, if there are 12,000 samples in a training set, the traditional method computes Gini for each of the 12,000 values, irrespective of its value, to identify the best cutoff value which divides the data best into two different classes or labels).
Regarding Claim 6, Chattopadhyay discloses the computer-implemented method of claim 5, further comprising analyzing the dataset using goodness of fit scores as a reference point (Chattopadhyay, [0092] Moreover, the decision tree ML engine 710 can be used to implement any machine learning application or use case that relies on decision tree machine learning (e.g., fault detection, medical diagnostics, etc.). The decision tree ML engine 710 may be implemented using any type or combination of decision tree machine learning algorithms, including random forests, centered forests, uniform forests, rotation forests, ensemble decision trees, boosting trees, bagging trees, classification and regression trees (CART), conditional inference trees, fuzzy decision trees (FDT), decision lists, iterative dichotomiser 3 (ID3), C4.5, chi-square automatic interaction detection (CHAID)).
However, Kagarlis teaches analyzing the dataset using goodness of fit scores as a reference point (Kagarlis, [0005] Power laws have been widely observed in nature, and particularly in such phenomena as financial market movements and income distribution. Pareto's Law in particular was proposed as an empirical description of an apparent "80/20" distribution of wealth).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Chattopadhyay and Kagarlis because both references analyze dataset distributions using statistical techniques to evaluate data behavior. Chattopadhyay teaches analyzing datasets within a machine learning framework, including evaluating binning configurations and dataset splits using statistical metrics. Kagarlis further teaches analyzing datasets using goodness-of-fit concepts to evaluate how well observed data conforms to expected statistical distributions in order to identify important characteristics within the data. A person of ordinary skill in the art would have been motivated to incorporate the goodness-of-fit-based analysis of Kagarlis into the dataset analysis of Chattopadhyay to provide additional evaluation of the data and improve the reliability of data preprocessing and training.
Regarding Claim 7, Chattopadhyay discloses the computer-implemented method of claim 6, further comprising applying a training model that gathers descriptive statistics of the data (Chattopadhyay, [0026] the raw data values are aggregated and the size of the dataset is reduced, and the resulting binned dataset may then be used for further analysis and processing, such as for data mining or machine learning and artificial intelligence (e.g., computer vision, autonomous navigation, computer or processor optimizations, speech and audio recognition, natural language processing)) and identifies the splits (Chattopadhyay, [0026] Data discretization, for example, may involve converting or partitioning a range of continuous raw data into a smaller number of intervals or values. For example, data binning is a form of data discretization that involves grouping a collection of continuous values into a smaller number of “bins” that each represent a particular interval or range. The original data values may each be grouped into a defined interval or bin, and thus may be replaced by a value representative of that interval or bin, such as a center or boundary value of the interval), the data transformations (Chattopadhyay, [0026] Data preprocessing, an important initial step in data analytics, involves transforming raw data into a suitable format for further processing and analysis), and the goodness of fit scores (Chattopadhyay, [0092] Moreover, the decision tree ML engine 710 can be used to implement any machine learning application or use case that relies on decision tree machine learning (e.g., fault detection, medical diagnostics, etc.). The decision tree ML engine 710 may be implemented using any type or combination of decision tree machine learning algorithms, including random forests, centered forests, uniform forests, rotation forests, ensemble decision trees, boosting trees, bagging trees, classification and regression trees (CART), conditional inference trees, fuzzy decision trees (FDT), decision lists, iterative dichotomiser 3 (ID3), C4.5, chi-square automatic interaction detection (CHAID)).
However, Kagarlis teaches the use of goodness of fit scores (Kagarlis, [0005] Power laws have been widely observed in nature, and particularly in such phenomena as financial market movements and income distribution. Pareto's Law in particular was proposed as an empirical description of an apparent "80/20" distribution of wealth).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Chattopadhyay and Kagarlis because both references analyze datasets using statistical measures in the context of machine learning to characterize data behavior. Chattopadhyay teaches applying a training model, such as a decision tree machine learning model, that gathers statistics of the data and identifies dataset splits and data transformations based on statistical criteria. Kagarlis further teaches evaluating data distributions using goodness-of-fit concepts to assess how well observed data conforms to expected statistical distributions. A person of ordinary skill in the art would have been motivated to integrate the goodness-of-fit analysis of Kagarlis into the training model of Chattopadhyay to provide additional details for identifying splits and transformations during training.
Regarding Claim 8, Chattopadhyay in view of Kagarlis teaches the computer-implemented method of claim 5, further comprising:
storing information from the splits and the data transformations in a training model (Chattopadhyay, [0291] one non-transitory machine-readable storage medium having instructions stored thereon, wherein the instructions, when executed on processing circuitry, cause the processing circuitry to: receive, via interface circuitry, training data corresponding to a plurality of labeled instances of a feature set, wherein the training data is captured at least partially by one or more sensors; compute, based on the training data);
receiving a new dataset (Chattopadhyay, [0136] The trained decision tree model 1114 can then be used to perform inference and/or generate predictions for new data supplied by the host processor 1120 via the inference input buffers 1108. In some embodiments, for example, the new data may include newly captured and/or previously unseen data that has not yet been classified or labeled);
determining if the new dataset has similarity to a first dataset (Chattopadhyay, [0076] data discretizer 640 performs data binning to reduce the size and/or compress the first dataset 602 into a second “binned” dataset 604); and
in response to determining that the new dataset has similarity to the first dataset (Chattopadhyay, [0112] the binning algorithm just needs a comparator to generate a histogram), applying the splits (Chattopadhyay, [0287] set of feature value checkpoints is computed by the host processor based on binning a set of feature values for each feature of the feature set using an optimal bin size) and the data transformations to the new dataset (Chattopadhyay, [0283] the set of feature value checkpoints for training the decision tree model; and an artificial intelligence accelerator to: train the decision tree model based on the training data and the set of feature value checkpoints; and perform inference using the decision tree model to predict the target variable for the unlabeled instance of the feature set),
wherein the dataset is the first dataset (Chattopadhyay, [0074] a first dataset 602 is obtained initially).
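The reuse of learned splits on a similar new dataset, as mapped above, can be pictured with a minimal sketch; the helper name, the representation of stored splits as numeric boundary points, and the partitioning rule are all hypothetical assumptions for illustration, not taken from the claims or references:

```python
def apply_stored_splits(new_data, split_points):
    """Partition `new_data` using split points stored from a prior,
    similar dataset (hypothetical helper for illustration only)."""
    bounds = sorted(split_points)
    segments = [[] for _ in range(len(bounds) + 1)]
    for v in new_data:
        # Count how many stored boundaries the value exceeds; that count
        # is the index of the segment it belongs to.
        idx = sum(1 for b in bounds if v > b)
        segments[idx].append(v)
    return segments
```

For example, with stored split points [2, 6], the values [1, 3, 5, 7] would be partitioned into [[1], [3, 5], [7]].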
Regarding Claim 9, Chattopadhyay in view of Kagarlis teaches the computer-implemented method of claim 1, further comprising calibrating across a
broad range of distribution types (Kagarlis, [0005] Power laws have been widely observed in nature, and particularly in such phenomena as financial market movements and income distribution. Pareto's Law in particular was proposed as an empirical description of an apparent "80/20" distribution of wealth).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Chattopadhyay and Kagarlis because both references analyze datasets having statistical distribution characteristics. Chattopadhyay teaches histogram data discretization, bin optimization, and analysis techniques. Kagarlis further teaches that real-world datasets often follow different distribution models, such as the Pareto power-law distribution, and discloses analyzing such data distributions to characterize a given dataset. A person of ordinary skill in the art would have been motivated to integrate the distribution type analysis of Kagarlis into the data processing framework of Chattopadhyay to calibrate the analysis across a broad range of distribution types and thereby improve the robustness of data processing.
Regarding Claim 10, Chattopadhyay in view of Kagarlis teaches the computer-implemented method of claim 1, further comprising iteratively splitting the dataset at successive levels of the head dataset and the tail dataset (Chattopadhyay, [0278] wherein the set of feature value checkpoints is computed by the host processor based on binning a set of feature values for each feature of the feature set using an optimal bin size, [0152] The flowchart repeats blocks 1308-1312 in this manner to continue performing inference using the trained decision tree model as new inference data becomes available, [0279] identify a plurality of possible bin sizes for binning the set of feature values; compute a plurality of performance costs for the plurality of possible bin sizes; and select the optimal bin size from the plurality of possible bin sizes, wherein the optimal bin size corresponds to a lowest performance cost of the plurality of performance costs, [0148] splitting the root node into a set of child nodes based on the corresponding feature value; and repeating the process for the child nodes in a recursive manner until each remaining child node is a leaf node (e.g., a node with either a single data point or multiple data points that all share the same label)).
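For context, the recursive head/tail partitioning recited in Claim 10 resembles well-known head/tail-style classification; a minimal Python sketch follows, in which the use of the mean as the central tendency, the 0.4 head-fraction stopping threshold, and all names are assumptions for illustration, not drawn from the claims or the cited references:

```python
def head_tail_splits(values, max_depth=5):
    """Recursively split a dataset at its mean into a 'tail' (values at or
    below the central tendency) and a 'head' (values above it), collecting
    the split points. Illustrative sketch only, not the claimed method."""
    splits = []
    current = list(values)
    for _ in range(max_depth):
        if len(current) < 2:
            break
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        # Stop when the head is no longer a small minority, i.e. there is
        # no heavy tail left to split.
        if not head or len(head) / len(current) > 0.4:
            break
        splits.append(mean)
        current = head  # descend into the head at the next level
    return splits
```

On heavily skewed data (e.g. ninety 1s, nine 10s, and one 100) this sketch yields successive split points at each level's mean, illustrating the idea of splitting at successive levels of the head dataset.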
Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
-US 20160110497 A1, which describes methods, processes, apparatuses, and machines for non-invasive assessment of genetic variations. While the reference broadly involves data discretization, it does not specifically disclose identification of elbow or knee points in histogram distributions, determination of histogram bin significance, or splitting datasets into head and tail datasets as claimed.
-US 20240077529 A1, which describes systems and methods for calculating and using noise distributions in an electric power system. While the reference broadly involves generating and analyzing distributions of measured data, it does not specifically disclose histogram based data discretization techniques involving identification of elbow or knee points, determination of histogram bin significance using central tendency values, or splitting datasets into head and tail datasets as claimed.
-US 20220012525 A1, which describes methods and systems for histogram generation, including acquiring histogram bins from data points and merging data points into bins based on distance thresholds and bin widths. While the reference relates to histogram construction and data distributions, it does not specifically disclose identification of an elbow or knee point, determination of histogram bin significance using a central tendency value, or splitting of a dataset into head and tail datasets as claimed.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IBRAHIM NAGI SHOHATEE whose telephone number is (571)272-6612. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Shelby Turner can be reached at (571) 272-6334. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IBRAHIM NAGI SHOHATEE/ Examiner, Art Unit 2857
/SHELBY A TURNER/Supervisory Patent Examiner, Art Unit 2857