Abstract

Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite the community's efforts in data collection, there are still few image datasets that cover a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work we present ADE20K, a densely annotated dataset that spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. In total there are 25K images of complex everyday scenes containing a variety of objects in their natural spatial context, with on average 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation, provide baseline performances on both, and re-implement state-of-the-art models, which we release as open source. We further evaluate the effect of synchronized batch normalization and find that a reasonably large batch size is crucial for semantic segmentation performance. We show that networks trained on ADE20K are able to segment a wide variety of scenes and objects.
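The synchronized batch normalization finding is worth unpacking: with ordinary BatchNorm, each GPU normalizes over only its local slice of the mini-batch, so multi-GPU segmentation training with a few crops per device yields noisy statistics; synchronizing the statistics across devices restores an effectively large batch. As a minimal illustrative sketch (not the authors' released code), PyTorch can convert a model's BatchNorm layers to their synchronized counterparts in one call; the toy network and its 150-class output below are assumptions for illustration (150 matches the ADE20K scene-parsing benchmark's label set).

    import torch.nn as nn

    # Toy fully convolutional head with ordinary per-GPU BatchNorm layers.
    model = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        # 150 output channels = ADE20K scene-parsing classes (illustrative choice).
        nn.Conv2d(64, 150, kernel_size=1),
    )

    # Replace every BatchNorm layer with SyncBatchNorm, which reduces the
    # mean/variance across all GPUs so statistics reflect the global batch.
    # The conversion itself runs anywhere; training the converted model
    # requires an initialized torch.distributed process group (e.g., under
    # DistributedDataParallel).
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

Under this setup, the effective normalization batch is the sum of all per-GPU batches, which is how a "reasonably large batch size" can be reached even when per-GPU memory limits each device to a few high-resolution crops.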

Keywords

Computer vision, Computer science, Artificial intelligence, Pattern recognition, Parsing, Segmentation, Normalization, Object, Pixel, Context

Related Publications

A ConvNet for the 2020s

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification...

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5683 citations

Publication Info

Year: 2018
Type: Article
Volume: 127
Issue: 3
Pages: 302-321
Citations: 1504
Access: Closed

Citation Metrics

OpenAlex citations: 1504
Influential citations: 246

Cite This

Bolei Zhou, Hang Zhao, Xavier Puig et al. (2018). Semantic Understanding of Scenes Through the ADE20K Dataset. International Journal of Computer Vision, 127(3), 302-321. https://doi.org/10.1007/s11263-018-1140-0

Identifiers

DOI: 10.1007/s11263-018-1140-0
arXiv: 1608.05442
