Abstract

Progress on object detection is enabled by datasets that focus the research community’s attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation. We plan to collect 2.2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images. Due to the Zipfian distribution of categories in natural images, LVIS naturally has a long tail of categories with few training samples. Given that state-of-the-art deep learning methods for object detection perform poorly in the low-sample regime, we believe that our dataset poses an important and exciting new scientific challenge. LVIS is available at http://www.lvisdataset.org.
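
As a rough illustration of the long-tail effect described in the abstract, the sketch below spreads the planned 2.2 million masks over 1000 categories under an idealized Zipf law, where a category's frequency is proportional to 1 / rank. The exponent of 1.0, the threshold of 500 instances, and the function names are assumptions made for illustration only. They are not taken from the paper, and the real LVIS category counts will differ.

```python
def zipf_expected_counts(num_categories, total_instances, exponent=1.0):
    """Expected instances per category when frequency is proportional to 1 / rank**exponent.

    The exponent is an illustrative assumption, not a value measured on LVIS.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_categories + 1)]
    normalizer = sum(weights)
    return [total_instances * w / normalizer for w in weights]


def count_rare(counts, threshold):
    """Number of categories whose expected count falls below the threshold."""
    return sum(1 for c in counts if c < threshold)


# Category and mask totals come from the abstract; everything else is hypothetical.
counts = zipf_expected_counts(num_categories=1000, total_instances=2_200_000)
print(f"most frequent category:  {counts[0]:.0f} expected instances")
print(f"least frequent category: {counts[-1]:.0f} expected instances")
print(f"categories below 500 instances: {count_rare(counts, 500)}")
```

Under these assumptions the most frequent category receives about a thousand times as many instances as the least frequent one, which is the kind of imbalance that makes the low-sample regime hard for current detectors.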

Keywords

Computer science, Segmentation, Artificial intelligence, Focus (optics), Object (grammar), Vocabulary, Bounding overwatch, Object detection, Deep learning, Pattern recognition (psychology), Information retrieval, Machine learning

Publication Info

Year: 2019
Type: article
Citations: 1091
Access: Closed

Cite This

Agrim Gupta, Piotr Dollár, Ross Girshick (2019). LVIS: A Dataset for Large Vocabulary Instance Segmentation. https://doi.org/10.1109/cvpr.2019.00550

Identifiers

DOI: 10.1109/cvpr.2019.00550