Baby talk: Understanding and generating simple image descriptions

Girish Kulkarni; Visruth Premraj; Sagnik Dhar; Siming Li; Yejin Choi; Alexander C. Berg; Tamara L. Berg

doi:10.1109/cvpr.2011.5995466

Abstract

We posit that visually descriptive language offers computer vision researchers both information about the world and information about how people describe the world. The potential benefit from this source is made more significant by the enormous amount of language data easily available today. We present a system to automatically generate natural language descriptions from images that exploits both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision. The system is very effective at producing relevant sentences for images. It also generates descriptions that are notably more true to the specific image content than previous work.

Keywords

Computer science; Exploit; Parsing; Natural language; Simple (philosophy); Artificial intelligence; Image (mathematics); Natural language processing; Factor (programming language); Human–computer interaction; Information retrieval; Programming language; Computer security

Affiliated Institutions

Stony Brook University (US)

Publication Info

Year: 2011
Type: article
Pages: 1601-1608
Citations: 529
Access: Closed

Cite This

APA Style

Girish Kulkarni, Visruth Premraj, Sagnik Dhar, et al. (2011). Baby talk: Understanding and generating simple image descriptions, 1601-1608. https://doi.org/10.1109/cvpr.2011.5995466
