Abstract

Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization (Bengio et al., 2007). Our results build on the work of Erhan et al. (2009b), showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.
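For readers unfamiliar with the procedure the abstract refers to, the sketch below illustrates the two-phase scheme: greedy layer-wise unsupervised pre-training (here with denoising autoencoders) followed by supervised fine-tuning of the stacked layers. It is a minimal illustration assuming PyTorch, synthetic data, and arbitrary layer sizes and hyperparameters; it is not the paper's experimental setup, which used Deep Belief Networks and stacked autoencoder variants on vision and language datasets at much larger scale.

# Minimal sketch: layer-wise denoising-autoencoder pre-training, then supervised fine-tuning.
# Layer sizes, noise level, learning rates, and the synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

def pretrain_layer(encoder, data, epochs=5, noise_std=0.3, lr=1e-3):
    """Train one encoder layer as a denoising autoencoder on `data`."""
    decoder = nn.Linear(encoder.out_features, encoder.in_features)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        corrupted = data + noise_std * torch.randn_like(data)  # corrupt the input
        recon = decoder(torch.sigmoid(encoder(corrupted)))      # encode, then decode
        loss = loss_fn(recon, data)                             # reconstruct the clean input
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return torch.sigmoid(encoder(data))                     # representation fed to the next layer

# Illustrative synthetic data: 1000 examples, 50 features, 3 classes.
X = torch.randn(1000, 50)
y = torch.randint(0, 3, (1000,))

# Phase 1: unsupervised pre-training, each layer trained on the previous layer's output.
sizes = [50, 64, 32]
encoders = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]
h = X
for enc in encoders:
    h = pretrain_layer(enc, h)

# Phase 2: supervised fine-tuning, with the pre-trained layers initializing a classifier.
layers = []
for enc in encoders:
    layers += [enc, nn.Sigmoid()]
classifier = nn.Sequential(*layers, nn.Linear(sizes[-1], 3))
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    loss = loss_fn(classifier(X), y)
    opt.zero_grad(); loss.backward(); opt.step()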

Keywords

Artificial intelligence, Unsupervised learning, Computer science, Machine learning, Deep learning, Regularization, Generalization, Autoencoder, Deep belief network, Semi-supervised learning, Competitive learning, Training, Mathematics

Publication Info

Year: 2010
Type: Article
Volume: 11
Issue: 19
Pages: 625-660
Citations: 2107
Access: Closed

Citation Metrics

2107 citations (source: OpenAlex)

Cite This

Dumitru Erhan, Yoshua Bengio, Aaron Courville et al. (2010). Why Does Unsupervised Pre-training Help Deep Learning? Journal of Machine Learning Research, 11(19), 625-660.