Abstract

Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization (Bengio et al., 2007). Our results build on the work of Erhan et al. (2009b), showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.
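For readers unfamiliar with the procedure the abstract refers to, the sketch below illustrates the two-phase scheme: greedy layer-wise unsupervised pre-training (here with denoising autoencoders) followed by supervised fine-tuning of the stacked layers. It is a minimal illustration assuming PyTorch, synthetic data, and arbitrary layer sizes and hyperparameters; it is not the paper's experimental setup, which used Deep Belief Networks and stacked autoencoder variants on vision and language datasets at much larger scale.

# Minimal sketch: layer-wise denoising-autoencoder pre-training, then supervised fine-tuning.
# Layer sizes, noise level, learning rates, and the synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

def pretrain_layer(encoder, data, epochs=5, noise_std=0.3, lr=1e-3):
    """Train one encoder layer as a denoising autoencoder on `data`."""
    decoder = nn.Linear(encoder.out_features, encoder.in_features)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        corrupted = data + noise_std * torch.randn_like(data)  # corrupt the input
        recon = decoder(torch.sigmoid(encoder(corrupted)))      # encode, then decode
        loss = loss_fn(recon, data)                             # reconstruct the clean input
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return torch.sigmoid(encoder(data))                     # representation fed to the next layer

# Illustrative synthetic data: 1000 examples, 50 features, 3 classes.
X = torch.randn(1000, 50)
y = torch.randint(0, 3, (1000,))

# Phase 1: unsupervised pre-training, each layer trained on the previous layer's output.
sizes = [50, 64, 32]
encoders = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]
h = X
for enc in encoders:
    h = pretrain_layer(enc, h)

# Phase 2: supervised fine-tuning, with the pre-trained layers initializing a classifier.
layers = []
for enc in encoders:
    layers += [enc, nn.Sigmoid()]
classifier = nn.Sequential(*layers, nn.Linear(sizes[-1], 3))
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    loss = loss_fn(classifier(X), y)
    opt.zero_grad(); loss.backward(); opt.step()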

Keywords

Artificial intelligence, Unsupervised learning, Computer science, Machine learning, Deep learning, Regularization, Generalization, Autoencoder, Deep belief network, Semi-supervised learning, Competitive learning, Training, Mathematics

Publication Info

Year: 2010
Type: Article
Volume: 11
Issue: 19
Pages: 625-660
Citations: 2107
Access: Closed

Citation Metrics

2107 citations (source: OpenAlex)

Cite This

Dumitru Erhan, Yoshua Bengio, Aaron Courville et al. (2010). Why Does Unsupervised Pre-training Help Deep Learning? Journal of Machine Learning Research, 11(19), 625-660.