Abstract

We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these classifiers construct their decision surface from strongly overlapping small (≈ 4%) subsets of the data base. This finding opens up the possibility of compressing data bases significantly by disposing of the data which is not important for the solution of a given task. In addition, we show that the theory allows us to predict the classifier that will have the best generalization ability, based solely on performance on the training set and characteristics of the learning machines. This finding is important for cases where the amount of available data is limited.
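The key property behind the abstract's compression claim is that an SVM's decision surface is determined entirely by its support vectors, so retraining on the support data alone reproduces the original decision function. A minimal sketch of this on toy 2-D data using scikit-learn's SVC (an illustration only, not the paper's handwritten-digit experiment):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two toy Gaussian clusters standing in for a classification "data base".
X = np.vstack([rng.randn(100, 2) - 2, rng.randn(100, 2) + 2])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
sv = clf.support_  # indices of the support data
print(f"support data: {len(sv)} of {len(X)} examples")

# Retrain on the support data alone: the decision surface is (numerically)
# unchanged, so the rest of the data base is redundant for this task.
clf_sv = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])
grid = rng.randn(50, 2) * 3
assert np.allclose(clf.decision_function(grid),
                   clf_sv.decision_function(grid), atol=1e-2)
```

On well-separated data like this the support set is a small fraction of the training set, mirroring the strongly overlapping ≈ 4% subsets reported in the abstract.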

Keywords

Support vector machine, Classifier, Machine learning, Pattern recognition, Digit recognition, Training set, Generalization, Data mining, Data set, Artificial neural network, Artificial intelligence, Computer science, Mathematics

Publication Info

Year: 1995
Type: article
Pages: 252-257
Citations: 542 (OpenAlex)
Access: Closed

Cite This

Bernhard Schölkopf, Chris Burges, Vladimir Vapnik (1995). Extracting support data for a given task. 252-257.