Abstract

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth-two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models. We supplement this republication with a new section at the end summarizing recent progress in the field since the original version of this paper.
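The randomization test described in the abstract can be illustrated with a minimal sketch, assuming PyTorch: train a small convolutional network with SGD on inputs whose labels (and, in the extreme case, even pixels) are pure noise, and observe training accuracy climb toward 100%. The architecture, dataset size, and hyperparameters below are illustrative assumptions, not the authors' exact setup (the paper uses state-of-the-art architectures on CIFAR-10 and ImageNet).

```python
# Sketch of a randomization test: fit random labels on random "images".
import torch
import torch.nn as nn

torch.manual_seed(0)

# Unstructured random inputs and uniformly random labels: nothing to generalize from.
n, num_classes = 2048, 10
images = torch.randn(n, 3, 32, 32)
labels = torch.randint(0, num_classes, (n,))

# A small convolutional network; far fewer parameters than modern architectures,
# but still enough to memorize this sample.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
    nn.Linear(256, num_classes),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    perm = torch.randperm(n)
    for i in range(0, n, 128):
        idx = perm[i:i + 128]
        optimizer.zero_grad()
        loss = loss_fn(model(images[idx]), labels[idx])
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        acc = (model(images).argmax(dim=1) == labels).float().mean().item()
    print(f"epoch {epoch:3d}  training accuracy on random labels: {acc:.3f}")
    if acc == 1.0:
        break
```

Because the labels carry no signal, any model that reaches near-perfect training accuracy here has effectively memorized the sample, which is the sense in which explicit regularization and model-class arguments alone cannot explain generalization.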

Keywords

Regularization, Computer science, Generalization, Artificial intelligence, Artificial neural network, Deep neural networks, Deep learning, Early stopping, Convolutional neural network, Machine learning, Stochastic gradient descent, Algorithm, Mathematics

Publication Info

Year: 2021
Type: article
Volume: 64
Issue: 3
Pages: 107-115
Citations: 2043
Access: Closed

Citation Metrics

OpenAlex citations: 2043
Influential citations: 91

Cite This

Chiyuan Zhang, Samy Bengio, Moritz Hardt et al. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107-115. https://doi.org/10.1145/3446776

Identifiers

DOI
10.1145/3446776
