Abstract
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
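The abstract refers to an algorithm for identifying winning tickets; in the paper this is iterative magnitude pruning followed by rewinding the surviving weights to their original initialization. The sketch below is a minimal illustration of that procedure, assuming PyTorch, a toy synthetic dataset, and illustrative helper names (`train`, `prune_by_magnitude`); it is not the paper's exact experimental setup.

```python
# Minimal sketch of winning-ticket identification: iterative magnitude pruning
# with rewinding to the original initialization. Model, data, and training
# hyperparameters here are illustrative stand-ins, not the paper's setup.
import copy
import torch
import torch.nn as nn

def train(model, masks, data, targets, steps=100, lr=0.1):
    """Train the network while keeping pruned weights fixed at zero."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(data), targets)
        loss.backward()
        opt.step()
        with torch.no_grad():  # re-apply masks so pruned weights stay zero
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
    return model

def prune_by_magnitude(model, masks, fraction=0.2):
    """Prune the smallest-magnitude surviving weights in each layer."""
    for name, p in model.named_parameters():
        if name not in masks:
            continue
        w = (p.detach().abs() * masks[name]).flatten()
        surviving = int(masks[name].sum().item())
        k = int(fraction * surviving)  # prune a fraction of what remains
        if k == 0:
            continue
        threshold = torch.kthvalue(w[w > 0], k).values
        masks[name] = (p.detach().abs() > threshold).float() * masks[name]
    return masks

# Toy data and model (stand-ins for the MNIST-scale experiments).
torch.manual_seed(0)
data, targets = torch.randn(256, 20), torch.randint(0, 2, (256,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

initial_state = copy.deepcopy(model.state_dict())  # original initialization
masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

for round_idx in range(5):                     # iterative pruning rounds
    train(model, masks, data, targets)
    masks = prune_by_magnitude(model, masks)   # drop smallest surviving weights
    model.load_state_dict(initial_state)       # rewind to the original init
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])            # winning-ticket candidate

train(model, masks, data, targets)             # train the ticket in isolation
```

The key step is the rewind: after each pruning round the surviving connections are reset to their original initial values rather than kept at their trained values, which is what makes the resulting subnetwork a "winning ticket" in the paper's sense.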
Publication Info
- Year: 2018
- Type: Preprint
- Citations: 1317
- Access: Closed
Identifiers
- DOI: 10.48550/arxiv.1803.03635