Do Better ImageNet Models Transfer Better?

Simon Kornblith; Jonathon Shlens; Quoc V. Le

doi:10.1109/cvpr.2019.00277

Abstract

Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance of 16 classification networks on 12 image classification datasets. We find that, when networks are used as fixed feature extractors or fine-tuned, there is a strong correlation between ImageNet accuracy and transfer accuracy (r = 0.99 and 0.96, respectively). In the former setting, we find that this relationship is very sensitive to the way in which networks are trained on ImageNet; many common forms of regularization slightly improve ImageNet accuracy but yield penultimate layer features that are much worse for transfer learning. Additionally, we find that, on two small fine-grained image classification datasets, pretraining on ImageNet provides minimal benefits, indicating the learned features from ImageNet do not transfer well to fine-grained tasks. Together, our results show that ImageNet architectures generalize well across datasets, but ImageNet features are less general than previously suggested.
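The headline result is a correlation, across 16 networks, between ImageNet accuracy and transfer accuracy. The following is a minimal sketch of how such a correlation could be computed. The abstract does not say how accuracies on the 12 datasets are combined into one number per network, so the sketch *assumes* each accuracy is mapped to log odds and then averaged across datasets. The function names (`log_odds`, `pearson_r`, `imagenet_transfer_correlation`) are hypothetical, and any inputs you pass in are your own measurements, not results from the paper.

```python
import math
from statistics import fmean


def log_odds(p: float) -> float:
    """Logit transform of an accuracy strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def imagenet_transfer_correlation(imagenet_acc, transfer_acc) -> float:
    """Correlate ImageNet accuracy with average transfer accuracy.

    imagenet_acc: dict mapping model name -> ImageNet top-1 accuracy.
    transfer_acc: dict mapping model name -> list of accuracies, one per
        downstream dataset (same dataset order for every model).
    ASSUMPTION: both accuracies are mapped to log odds, and transfer
    accuracy is averaged over datasets in log-odds space.
    """
    models = sorted(imagenet_acc)  # fixed order so xs and ys line up
    xs = [log_odds(imagenet_acc[m]) for m in models]
    ys = [fmean(log_odds(a) for a in transfer_acc[m]) for m in models]
    return pearson_r(xs, ys)
```

The log-odds transform stretches accuracies near 0 and 1, so a one-point gain at 98% counts for more than the same gain at 60%. Averaging raw accuracies would instead let datasets with low, widely spread accuracies dominate the mean. Computing the correlation separately for the fixed-feature and fine-tuned settings would then yield one coefficient per setting.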

Keywords

Computer science; Transfer of learning; Artificial intelligence; Machine learning; Regularization (linguistics); Feature (linguistics); Contextual image classification; Pattern recognition (psychology); Image (mathematics)

Affiliated Institutions

Google (United States)

Publication Info

Year: 2019
Type: article
Pages: 2656-2666
Citations: 1176
Access: Closed