Abstract

In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images per subject. Images were downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimise label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages that ensure high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 convolutional neural networks (with and without Squeeze-and-Excitation blocks) on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance across pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on the IJB-A and IJB-B face recognition benchmarks, exceeding the previous state of the art by a large margin. The dataset and models are publicly available.
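The abstract mentions ResNet-50 variants with Squeeze-and-Excitation (SE) blocks. As a rough illustration of what such a block computes, the sketch below applies the standard SE recalibration to a single feature map: a "squeeze" (global average pooling per channel), an "excitation" (two fully connected layers with ReLU, then a sigmoid gate), and a per-channel rescaling. This is not the authors' code; the function name `squeeze_excite`, the nested-list tensor layout, the reduction ratio `r` and the random weights are all hypothetical choices made for a dependency-free example. A real network would use framework layers on batched tensors.

```python
import math
import random


def squeeze_excite(x, w1, b1, w2, b2):
    """Hypothetical pure-Python SE recalibration of one feature map.

    x  : list of C channels, each an H x W list of lists of floats
    w1 : (C // r) x C weights of the reduction layer, b1 its biases
    w2 : C x (C // r) weights of the expansion layer, b2 its biases
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b)
         for row, b in zip(w1, b1)]
    # Excitation, step 2: fully connected expansion followed by a sigmoid gate.
    s = [1.0 / (1.0 + math.exp(-(sum(w * hc for w, hc in zip(row, h)) + b)))
         for row, b in zip(w2, b2)]
    # Scale: multiply every activation in channel c by its gate s[c].
    y = [[[v * sc for v in row] for row in ch] for ch, sc in zip(x, s)]
    return y, s


# Usage with small, randomly initialised (hypothetical) weights.
random.seed(0)
C, r, H, W = 4, 2, 3, 3
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
b1 = [0.0] * (C // r)
w2 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
b2 = [0.0] * C
y, gates = squeeze_excite(x, w1, b1, w2, b2)
print([round(g, 3) for g in gates])
```

Because each gate lies strictly between 0 and 1, the block can only reweight channels relative to one another; it never changes the spatial size or the number of channels of the feature map it is applied to.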

Keywords

Computer science, Convolutional neural network, Artificial intelligence, Facial recognition system, Face (sociological concept), Pattern recognition (psychology), Identity (music), Margin (machine learning), Feature extraction, Machine learning

Publication Info

Year
2018
Type
article
Pages
67-74
Citations
2751
Access
Closed

Citation Metrics

OpenAlex
2751
Influential
506
CrossRef
2015

Cite This

Qiong Cao, Li Shen, Weidi Xie et al. (2018). VGGFace2: A Dataset for Recognising Faces across Pose and Age. 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 67-74. https://doi.org/10.1109/fg.2018.00020

Identifiers

DOI
10.1109/fg.2018.00020
arXiv
1710.08092
