Abstract

Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect objects inside, while also identifying the textures and surfaces of the objects along with their different compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it is able to effectively segment a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes (models are available at https://github.com/CSAILVision/unifiedparsing).
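To make the abstract's phrase "a multi-task framework ... to learn from heterogeneous image annotations" concrete, here is a minimal PyTorch sketch of the idea: a shared backbone feeding task-specific heads (object, part, material, scene; texture is handled analogously), with a loss that skips any task a given data source does not annotate. All class names, layer choices, and label counts below are illustrative assumptions, not the authors' released implementation; see the repository linked above for the real UPerNet.

# A minimal sketch (not the authors' code) of the multi-task structure
# described in the abstract: one shared backbone, several task heads,
# and a loss applied only to the tasks each batch is annotated with.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedParser(nn.Module):
    def __init__(self, num_objects=150, num_parts=86, num_materials=26,
                 num_scenes=365, channels=64):
        super().__init__()
        # Stand-in for the feature backbone; the paper uses an FPN-style network.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Pixel-level heads: object, part, and material segmentation.
        self.object_head = nn.Conv2d(channels, num_objects, 1)
        self.part_head = nn.Conv2d(channels, num_parts, 1)
        self.material_head = nn.Conv2d(channels, num_materials, 1)
        # Image-level head: scene classification from pooled features.
        self.scene_head = nn.Linear(channels, num_scenes)

    def forward(self, x):
        feats = self.backbone(x)
        pooled = feats.mean(dim=(2, 3))
        return {
            "object": self.object_head(feats),
            "part": self.part_head(feats),
            "material": self.material_head(feats),
            "scene": self.scene_head(pooled),
        }

def heterogeneous_loss(outputs, labels):
    """Sum losses only over tasks annotated in this batch.

    `labels` maps a task name to its target tensor; tasks missing from
    the dict (i.e. from that data source) contribute no gradient.
    """
    loss = 0.0
    for task, target in labels.items():
        pred = outputs[task]
        if pred.dim() == 4:  # dense prediction: ignore unlabeled pixels
            loss = loss + F.cross_entropy(pred, target, ignore_index=-1)
        else:                # image-level prediction
            loss = loss + F.cross_entropy(pred, target)
    return loss

# Example: a batch drawn from an object-segmentation-only data source.
model = UnifiedParser()
images = torch.randn(2, 3, 64, 64)
outputs = model(images)
object_labels = torch.randint(0, 150, (2, 16, 16))
loss = heterogeneous_loss(outputs, {"object": object_labels})
loss.backward()

The design point mirrored here is that each training batch back-propagates only through the heads its data source labels, which is what allows a single network to learn from several differently annotated datasets.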

Keywords

Computer science, Parsing, Task, Artificial intelligence, Categorization, Perception, Benchmark, Natural language processing, Computer vision

Publication Info

Year
2018
Type
book-chapter
Pages
432-448
Citations
1776
Access
Closed

Citation Metrics

OpenAlex citations: 1776
Influential citations: 242

Cite This

Tete Xiao, Yingcheng Liu, Bolei Zhou et al. (2018). Unified Perceptual Parsing for Scene Understanding. Lecture Notes in Computer Science, 432-448. https://doi.org/10.1007/978-3-030-01228-1_26

Identifiers

DOI
10.1007/978-3-030-01228-1_26
PMID
40873625
PMCID
PMC12378480
arXiv
1807.10221

Data Quality

Data completeness: 79%