Best-first Decision Tree Learning

2007 · Research Commons (University of Waikato) · 229 citations

Abstract

Decision trees are potentially powerful predictors that explicitly represent the structure of a dataset. Standard decision tree learners such as C4.5 expand nodes in depth-first order (Quinlan, 1993), whereas best-first decision tree learners expand the "best" node first: the node whose split yields the maximum reduction of impurity (e.g. Gini index or information, the measures used in this thesis) among all nodes available for splitting. The fully grown tree is the same under either strategy; only the order in which it is built differs. In practice, some branches of a fully expanded tree do not truly reflect the underlying structure of the domain. This problem, known as overfitting, is mainly caused by noisy data. Pruning is necessary to avoid overfitting the training data; it discards those parts of the tree that are not predictive of future data. Best-first node expansion enables new pruning techniques that determine the number of expansions to perform by cross-validation. This thesis first introduces the algorithm for building binary best-first decision trees for classification problems. It then investigates two new pruning methods that …
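The best-first expansion strategy described in the abstract can be sketched as follows. This is a minimal illustration only, not the thesis's actual implementation: the function names, the dict-based tree representation, and the exhaustive threshold search are assumptions. A max-priority queue (here, Python's `heapq` with negated keys) always expands the frontier node whose best binary split gives the largest Gini impurity reduction, and growth stops after a fixed number of expansions — the quantity the thesis proposes to select by cross-validation.

```python
import heapq
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Find the binary split (feature, threshold) that maximizes the
    reduction in Gini impurity over this node's data.
    Returns (reduction, feature, threshold, left_idx, right_idx) or None."""
    n, parent = len(y), gini(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i in range(n) if X[i][f] <= t]
            right = [i for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # degenerate split
            child = (len(left) * gini([y[i] for i in left]) +
                     len(right) * gini([y[i] for i in right])) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t, left, right)
    return best


def best_first_tree(X, y, max_expansions):
    """Grow a binary tree best-first: repeatedly expand the frontier node
    with the largest impurity reduction, for at most max_expansions splits."""
    root = {"idx": list(range(len(y)))}
    heap, counter = [], 0  # counter breaks ties deterministically

    def enqueue(node):
        nonlocal counter
        sub = node["idx"]
        s = best_split([X[i] for i in sub], [y[i] for i in sub])
        if s is not None and s[0] > 0:
            heapq.heappush(heap, (-s[0], counter, node, s))  # max-heap via negation
            counter += 1

    enqueue(root)
    for _ in range(max_expansions):
        if not heap:
            break  # no node on the frontier improves impurity
        _, _, node, (red, f, t, left, right) = heapq.heappop(heap)
        node["feature"], node["threshold"] = f, t
        # left/right index into node["idx"]; map back to global row indices
        node["left"] = {"idx": [node["idx"][i] for i in left]}
        node["right"] = {"idx": [node["idx"][i] for i in right]}
        enqueue(node["left"])
        enqueue(node["right"])
    return root
```

For example, `best_first_tree([[0], [1], [2], [3]], [0, 0, 1, 1], 1)` spends its single expansion on the root, splitting on feature 0 at threshold 1, since that split drops the Gini impurity from 0.5 to 0. With depth-first growth the final fully grown tree would be identical; only the order of expansion (and hence where early stopping cuts off) differs.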

Keywords

Pruning, Overfitting, Decision tree, Computer science, Node (physics), Machine learning, Tree (set theory), Artificial intelligence, Incremental decision tree, Decision stump, Data mining, Tree traversal, Decision tree learning, Mathematics, Algorithm, Artificial neural network, Engineering

Publication Info

Year
2007
Type
dissertation
Citations
229
Access
Closed

Citation Metrics

229 citations (OpenAlex)

Cite This

Haijian Shi (2007). Best-first Decision Tree Learning. Research Commons (University of Waikato).