On Dimensionality, Sample Size, Classification Error, and Complexity of Classification Algorithm in Pattern Recognition

Šarūnas Raudys; Vitalijus Pikelis

doi:10.1109/tpami.1980.4767011

Abstract

This paper compares four classification algorithms (discriminant functions) for classifying individuals into two multivariate populations. The discriminant functions (DFs) compared are derived according to the Bayes rule for normal populations and differ in their assumptions about the structure of the covariance matrices. Analytical formulas for the expected probability of misclassification EP_N are derived. They show that the classification error EP_N depends on the structure of the classification algorithm, the asymptotic probability of misclassification P_∞, and the ratio of learning sample size N to dimensionality p: N/p for all linear DFs discussed and N²/p for quadratic DFs. Tables of the learning quantity H = EP_N/P_∞ as a function of the parameters P_∞, N, and p are presented for the four classification algorithms analyzed. They may be used for estimating the necessary learning sample size, determining the optimal number of features, and choosing the type of classification algorithm when the learning sample size is limited.
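The learning quantity H = EP_N/P_∞ named in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's analytical derivation: it assumes two spherical Gaussian populations with identity covariance and equal priors, means separated by a distance `delta`, N learning vectors per class, and a standard linear DF with a pooled sample covariance. The function name `learning_quantity` and all of its parameters are hypothetical and chosen only for this illustration.

```python
import math
import numpy as np


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def learning_quantity(p, n_per_class, delta, trials=200, seed=0):
    """Estimate EP_N, P_inf and H = EP_N / P_inf by simulation.

    Assumed setup (not taken from the paper): classes N(+mu, I) and
    N(-mu, I) with |2*mu| = delta, equal priors, n_per_class learning
    vectors per class, linear DF with pooled sample covariance.
    """
    if 2 * n_per_class - 2 < p:
        raise ValueError("pooled covariance is singular: need 2N - 2 >= p")
    rng = np.random.default_rng(seed)
    mu1 = np.zeros(p)
    mu1[0] = delta / 2.0
    mu2 = -mu1
    # Asymptotic (Bayes) error for this setup.
    p_inf = phi(-delta / 2.0)

    errors = []
    for _ in range(trials):
        x1 = rng.standard_normal((n_per_class, p)) + mu1
        x2 = rng.standard_normal((n_per_class, p)) + mu2
        m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
        # Pooled sample covariance of the two learning samples.
        scatter = (x1 - m1).T @ (x1 - m1) + (x2 - m2).T @ (x2 - m2)
        s = scatter / (2 * n_per_class - 2)
        # Linear DF g(x) = w^T (x - c); assign to class 1 when g(x) > 0.
        w = np.linalg.solve(s, m1 - m2)
        c = 0.5 * (m1 + m2)
        norm_w = np.linalg.norm(w)
        # Exact conditional error given this learning sample, using the
        # known true distributions (identity covariance).
        e1 = phi(-(w @ (mu1 - c)) / norm_w)
        e2 = phi((w @ (mu2 - c)) / norm_w)
        errors.append(0.5 * (e1 + e2))

    ep_n = float(np.mean(errors))
    return ep_n, p_inf, ep_n / p_inf


if __name__ == "__main__":
    for n in (10, 50, 200):
        ep_n, p_inf, h = learning_quantity(p=10, n_per_class=n, delta=2.0)
        print(f"N={n:4d}  EP_N={ep_n:.4f}  P_inf={p_inf:.4f}  H={h:.3f}")
```

Under these assumptions H is always at least 1, because no trained rule can beat the Bayes error, and it shrinks toward 1 as N grows relative to p. Running the loop for several values of p at fixed N gives a rough, simulation-based counterpart to the kind of table the abstract describes.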

Keywords

Pattern recognition (psychology); Curse of dimensionality; Artificial intelligence; Computer science; Statistical classification; Sample (material); Algorithm

Related Publications

Feature Extraction and Uncorrelated Discriminant Analysis for High-Dimensional Data

Wenhui Yang, Dao‐Qing Dai, Hong Yan

High-dimensional data and the small sample size problem occur in many modern pattern classification applications such as face recognition and gene expression data analysis. To d...

2008 IEEE Transactions on Knowledge and Da... 56 citations

The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network

Peter L. Bartlett

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization p...

1998 IEEE Transactions on Information Theory 1185 citations

Estimation of Error Rates in Discriminant Analysis

Peter A. Lachenbruch, M. R. Mickey

Several methods of estimating error rates in Discriminant Analysis are evaluated by sampling methods. Multivariate normal samples are generated on a computer which have various ...

1968 Technometrics 1480 citations

Machine learning algorithm validation with a limited sample size

Andrius Vabalas, Emma Gowen, Ellen Poliakoff +1 more

Advances in neuroimaging, genomic, motion tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high dimensional datasets, whic...

2019 PLoS ONE 1443 citations

Classification of cervical cell nuclei using morphological segmentation and textural feature extraction

Ross F. Walker, Paul Jackway, Brian C. Lovell +1 more

This paper presents preliminary results for the classification of Pap Smear cell nuclei, using gray level co-occurrence matrix (GLCM) textural features. We outline a method of n...

2002 37 citations

Publication Info

Year: 1980
Type: article
Volume: PAMI-2
Issue: 3
Pages: 242-252
Citations: 166
Access: Closed


Citation Metrics

166 (OpenAlex)

Cite This

APA Style

                            
Raudys, Š., & Pikelis, V. (1980). On Dimensionality, Sample Size, Classification Error, and Complexity of Classification Algorithm in Pattern Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(3), 242-252. https://doi.org/10.1109/tpami.1980.4767011

Identifiers

DOI: 10.1109/tpami.1980.4767011