Abstract

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
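As a rough illustration (not the authors' own code), here is a minimal NumPy sketch of the behaviour the abstract describes; the function name, the example activations, and the p_drop parameter are illustrative assumptions. During training a fresh random mask omits each hidden unit with probability 0.5 on every case; at test time the paper uses a "mean network" with halved outgoing weights, which scaling the activations by the keep probability reproduces for the next layer's inputs.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(h, p_drop=0.5, train=True):
        # Training: each hidden unit is omitted independently with
        # probability p_drop (0.5 in the paper), so a different
        # "thinned" network is sampled for every training case.
        if train:
            mask = (rng.random(h.shape) >= p_drop).astype(h.dtype)
            return h * mask
        # Test: the paper's "mean network" keeps all units but halves
        # their outgoing weights; scaling activations by (1 - p_drop)
        # has the same effect on the next layer's inputs.
        return h * (1.0 - p_drop)

    # Hypothetical hidden activations for one training case.
    h = np.array([0.7, 1.2, 0.1, 2.3])
    print(dropout(h, train=True))   # about half the units zeroed
    print(dropout(h, train=False))  # all units, scaled by 0.5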

Keywords

Overfitting, Feature detectors, Dropout (neural networks), Co-adaptation, Feedforward neural networks, Artificial neural networks, Pattern recognition, Machine learning, Artificial intelligence, Computer science, Benchmark tasks

Related Publications

Fractional Max-Pooling

Convolutional networks almost always incorporate some form of spatial pooling, and very often it is α × α max-pooling with α = 2. Max-pooling acts on the hidden lay... (a minimal pooling sketch follows the citation below)

2014, arXiv (Cornell University), 335 citations
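For context on the α × α pooling mentioned in the teaser above, here is a minimal NumPy sketch of ordinary non-overlapping max-pooling with α = 2; the function name and test array are illustrative assumptions, and this does not implement the fractional (non-integer α) variant that paper proposes.

    import numpy as np

    def max_pool(x, alpha=2):
        # Non-overlapping alpha x alpha max-pooling over a 2-D feature
        # map; alpha = 2 halves each spatial dimension.
        h, w = x.shape
        h, w = h - h % alpha, w - w % alpha   # crop to a multiple of alpha
        blocks = x[:h, :w].reshape(h // alpha, alpha, w // alpha, alpha)
        return blocks.max(axis=(1, 3))

    x = np.arange(16, dtype=float).reshape(4, 4)
    print(max_pool(x))   # [[ 5.  7.] [13. 15.]]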

Publication Info

Year: 2012
Type: Preprint
Citations: 6630
Access: Closed


Citation Metrics

6630 citations (OpenAlex)

Cite This

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky et al. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1207.0580

Identifiers

DOI
10.48550/arxiv.1207.0580