Abstract

We posit that visually descriptive language offers computer vision researchers both information about the world and information about how people describe the world. The potential benefit from this source is all the more significant given the enormous amount of language data easily available today. We present a system that automatically generates natural language descriptions of images by exploiting both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision. The system is very effective at producing relevant sentences for images. It also generates descriptions that are notably truer to the specific image content than those of previous work.

Keywords

Computer science, Exploit, Parsing, Natural language, Simple (philosophy), Artificial intelligence, Image (mathematics), Natural language processing, Factor (programming language), Human–computer interaction, Information retrieval, Programming language, Computer security

Publication Info

Year: 2011
Type: article
Pages: 1601-1608
Citations: 529
Access: Closed

Citation Metrics

529 (OpenAlex)

Cite This

Girish Kulkarni, Visruth Premraj, Sagnik Dhar et al. (2011). Baby talk: Understanding and generating simple image descriptions, pp. 1601-1608. https://doi.org/10.1109/cvpr.2011.5995466

Identifiers

DOI: 10.1109/cvpr.2011.5995466