Abstract

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
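As a rough illustration (not the authors' own code), here is a minimal NumPy sketch of the behaviour the abstract describes; the function name, the example activations, and the p_drop parameter are illustrative assumptions. During training a fresh random mask omits each hidden unit with probability 0.5 on every case; at test time the paper uses a "mean network" with halved outgoing weights, which scaling the activations by the keep probability reproduces for the next layer's inputs.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(h, p_drop=0.5, train=True):
        # Training: each hidden unit is omitted independently with
        # probability p_drop (0.5 in the paper), so a different
        # "thinned" network is sampled for every training case.
        if train:
            mask = (rng.random(h.shape) >= p_drop).astype(h.dtype)
            return h * mask
        # Test: the paper's "mean network" keeps all units but halves
        # their outgoing weights; scaling activations by (1 - p_drop)
        # has the same effect on the next layer's inputs.
        return h * (1.0 - p_drop)

    # Hypothetical hidden activations for one training case.
    h = np.array([0.7, 1.2, 0.1, 2.3])
    print(dropout(h, train=True))   # about half the units zeroed
    print(dropout(h, train=False))  # all units, scaled by 0.5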

Keywords

Overfitting, Feature detectors, Dropout (neural networks), Co-adaptation, Feedforward neural networks, Artificial neural networks, Pattern recognition, Machine learning, Artificial intelligence, Computer science, Benchmark tasks

Related Publications

Fractional Max-Pooling

Convolutional networks almost always incorporate some form of spatial pooling, and very often it is α × α max-pooling with α = 2. Max-pooling acts on the hidden lay... (a minimal pooling sketch follows the citation below)

2014, arXiv (Cornell University), 335 citations
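For context on the α × α pooling mentioned in the teaser above, here is a minimal NumPy sketch of ordinary non-overlapping max-pooling with α = 2; the function name and test array are illustrative assumptions, and this does not implement the fractional (non-integer α) variant that paper proposes.

    import numpy as np

    def max_pool(x, alpha=2):
        # Non-overlapping alpha x alpha max-pooling over a 2-D feature
        # map; alpha = 2 halves each spatial dimension.
        h, w = x.shape
        h, w = h - h % alpha, w - w % alpha   # crop to a multiple of alpha
        blocks = x[:h, :w].reshape(h // alpha, alpha, w // alpha, alpha)
        return blocks.max(axis=(1, 3))

    x = np.arange(16, dtype=float).reshape(4, 4)
    print(max_pool(x))   # [[ 5.  7.] [13. 15.]]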

Publication Info

Year: 2012
Type: Preprint
Citations: 6630
Access: Closed


Citation Metrics

6630 citations (OpenAlex)

Cite This

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky et al. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1207.0580

Identifiers

DOI
10.48550/arxiv.1207.0580