Abstract
Classification is an important data mining problem. Although it is well studied, most current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the memory restrictions and is fast and scalable. The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model. This parallelization, also presented here, exhibits excellent scalability as well. The combination of these characteristics makes the proposed algorithm an ideal tool for data mining.
Related Publications
AMBERCUBE MD, parallelization of Amber's molecular dynamics module for distributed‐memory hypercube computers
A fully functional parallel version of the molecular dynamics (MD) module of AMBER3a has been implemented. Procedures parallelized include the calculation of the long‐r...
Stability-Based Validation of Clustering Solutions
Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract “natural” group structure in data. Such groupings need to be validated ...
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently ...
Statistical pattern recognition: a review
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated...
Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versi...
Publication Info
- Year: 1996
- Type: article
- Pages: 544-555
- Citations: 781
- Access: Closed