Abstract

Classification is an important data mining problem. Although classification is a well-studied problem, most current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, which removes all of the memory restrictions and is fast and scalable. The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model. This parallelization, also presented here, exhibits excellent scalability as well. The combination of these characteristics makes the proposed algorithm an ideal tool for data mining.
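The abstract does not describe how split points are chosen. As a hedged illustration of the general technique that SPRINT-style decision-tree builders rely on, the sketch below scores binary tests of the form `value <= threshold` on a numeric attribute using the gini index. It makes one sequential pass over a presorted attribute list of (value, class, record id) entries and maintains class histograms for the records below and above the candidate split. This is a simplified, fully in-memory sketch, not the authors' implementation. The names `gini`, `best_split`, `Entry`, and the toy `age_list` data are hypothetical. It omits disk-resident lists, partitioning lists by record id after a split, and any parallel exchange of histograms.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical attribute-list entry: (attribute value, class label, record id).
Entry = Tuple[float, str, int]


def gini(counts: Counter, total: int) -> float:
    """Gini index of a class histogram: 1 - sum over classes of p_j^2."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(attr_list: List[Entry]) -> Tuple[Optional[float], float]:
    """Scan a list presorted by value once and return (threshold, weighted gini)
    for the best test `value <= threshold`; (None, inf) if no split exists."""
    n = len(attr_list)
    below: Counter = Counter()  # class histogram of entries already scanned
    above: Counter = Counter(label for _, label, _ in attr_list)  # entries not yet scanned
    best_threshold: Optional[float] = None
    best_score = float("inf")
    for i, (value, label, _rid) in enumerate(attr_list):
        below[label] += 1
        above[label] -= 1
        n_below = i + 1
        n_above = n - n_below
        # Only split between two distinct values; skip the final entry.
        if n_above == 0 or attr_list[i + 1][0] == value:
            continue
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(above, n_above)
        if score < best_score:
            best_threshold, best_score = value, score
    return best_threshold, best_score


# Hypothetical toy data: an "age" attribute list already sorted by value.
age_list: List[Entry] = [
    (17, "high", 1), (20, "high", 5), (23, "high", 0),
    (32, "low", 4), (43, "high", 2), (68, "low", 3),
]
print(best_split(age_list))  # best threshold is 23
```

In principle, the scan keeps only the two class histograms as working state, so the attribute list itself could be streamed from disk rather than held in memory; this sketch stores it in a Python list for simplicity.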

Keywords

Computer science; Scalability; Decision tree; Data mining; Decision tree learning; Statistical classification; Classifier (UML); Machine learning; Artificial intelligence; Database


Publication Info

Year: 1996
Type: article
Pages: 544-555
Citations: 781
Access: Closed



Cite This

John Shafer, Rakesh Agrawal, Manish Mehta (1996). SPRINT: A Scalable Parallel Classifier for Data Mining, pp. 544-555.