Abstract

We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these classifiers construct their decision surface from strongly overlapping small (≈ 4%) subsets of the data base. This finding opens up the possibility of compressing data bases significantly by disposing of the data which is not important for the solution of a given task. In addition, we show that the theory allows us to predict the classifier that will have the best generalization ability, based solely on performance on the training set and characteristics of the learning machines. This finding is important for cases where the amount of available data is limited.
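The key property behind the abstract's compression claim is that an SVM's decision surface is determined entirely by its support vectors, so retraining on the support data alone reproduces the original decision function. A minimal sketch of this on toy 2-D data using scikit-learn's SVC (an illustration only, not the paper's handwritten-digit experiment):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two toy Gaussian clusters standing in for a classification "data base".
X = np.vstack([rng.randn(100, 2) - 2, rng.randn(100, 2) + 2])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
sv = clf.support_  # indices of the support data
print(f"support data: {len(sv)} of {len(X)} examples")

# Retrain on the support data alone: the decision surface is (numerically)
# unchanged, so the rest of the data base is redundant for this task.
clf_sv = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])
grid = rng.randn(50, 2) * 3
assert np.allclose(clf.decision_function(grid),
                   clf_sv.decision_function(grid), atol=1e-2)
```

On well-separated data like this the support set is a small fraction of the training set, mirroring the strongly overlapping ≈ 4% subsets reported in the abstract.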

Keywords

Support vector machine, Classifier, Machine learning, Pattern recognition, Digit recognition, Training set, Generalization, Data mining, Data set, Artificial neural network, Artificial intelligence, Computer science, Mathematics

Publication Info

Year: 1995
Type: article
Pages: 252-257
Citations: 542 (OpenAlex)
Access: Closed

Cite This

Bernhard Schölkopf, Chris Burges, Vladimir Vapnik (1995). Extracting support data for a given task. 252-257.