Abstract

We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles. We achieve this by extracting and filtering image caption annotations from billions of webpages. We also present quantitative evaluations of a number of image captioning models and show that a model architecture based on Inception-ResNet-v2 (Szegedy et al., 2016) for image-feature extraction and Transformer (Vaswani et al., 2017) for sequence modeling achieves the best performance when trained on the Conceptual Captions dataset.

Keywords

Closed captioning · Computer science · Image (mathematics) · Artificial intelligence · Feature extraction · Feature (linguistics) · Information retrieval · Pattern recognition (psychology) · Natural language processing

Publication Info

Year
2018
Type
article
Citations
1687
Access
Closed

Citation Metrics

OpenAlex: 1687
Influential: 420
CrossRef: 1137

Cite This

Piyush Sharma, Nan Ding, Sebastian Goodman et al. (2018). Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). https://doi.org/10.18653/v1/p18-1238

Identifiers

DOI
10.18653/v1/p18-1238

Data Quality

Data completeness: 81%