Abstract

Scene parsing is challenging because of the unrestricted open vocabulary and the diversity of scenes. In this paper, we exploit global context information through different-region-based context aggregation, using our pyramid pooling module within the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective in producing high-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It placed first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark and the Cityscapes benchmark. A single PSPNet sets a new record mIoU accuracy of 85.4% on PASCAL VOC 2012 and an accuracy of 80.2% on Cityscapes. © 2017 IEEE.
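As a rough illustration of different-region-based context aggregation, the following is a minimal single-channel sketch in Python of a pyramid pooling step. It rests on several assumptions:

- It uses bin sizes of 1, 2, 3 and 6, which is the configuration commonly associated with PSPNet but is not stated in this abstract.
- It upsamples with nearest-neighbour lookup rather than bilinear interpolation.
- It omits the learned 1x1 convolutions that a real network would apply to each pooled level.

The function names are hypothetical, and the sketch is not the authors' implementation.

```python
from typing import List, Sequence

Grid = List[List[float]]  # a single feature channel: H rows x W columns


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def adaptive_avg_pool(x: Grid, bins: int) -> Grid:
    """Average-pool an H x W grid into a bins x bins grid.

    Cell (i, j) averages rows [floor(i*H/bins), ceil((i+1)*H/bins)) and the
    analogous column range, so every input pixel falls into at least one cell.
    """
    h, w = len(x), len(x[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, _ceil_div((i + 1) * h, bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, _ceil_div((j + 1) * w, bins)
            cells = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(p: Grid, h: int, w: int) -> Grid:
    """Resize a bins x bins grid to h x w by nearest-neighbour lookup.

    Assumption: a real network would typically use bilinear interpolation.
    """
    bins = len(p)
    return [[p[(r * bins) // h][(c * bins) // w] for c in range(w)] for r in range(h)]


def pyramid_pool(x: Grid, bin_sizes: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return [x, level_1, level_2, ...], where each level is x averaged over
    b x b regions and upsampled back to x's size.

    Assumption: the per-level 1x1 convolution used in PSPNet is omitted, so
    each level holds plain regional averages.
    """
    h, w = len(x), len(x[0])
    levels = [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
    # Stacking these along a channel axis gives each pixel its own feature
    # plus context averaged over progressively smaller regions.
    return [x] + levels


if __name__ == "__main__":
    feature = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    for b, level in zip((1, 2, 3, 6), pyramid_pool(feature)[1:]):
        print(b, level[0])
```

Every returned map has the input's height and width. After stacking, a pixel sees its local value alongside averages over the whole image (bin size 1), its quadrant (bin size 2) and finer regions. This is the kind of global prior the abstract refers to.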

Keywords

Pascal (unit), Computer science, Parsing, Artificial intelligence, Pyramid (geometry), Pooling, Exploit, Context (archaeology), Benchmark (surveying), Object detection, Vocabulary, Computer vision, Natural language processing, Pattern recognition (psychology), Programming language, Cartography

Publication Info

Year: 2017
Type: article
Pages: 6230-6239
Citations: 14618 (OpenAlex)
Access: Closed

Cite This

Hengshuang Zhao, Jianping Shi, Xiaojuan Qi et al. (2017). Pyramid Scene Parsing Network. CVPR 2017, 6230-6239. https://doi.org/10.1109/cvpr.2017.660

Identifiers

DOI: 10.1109/cvpr.2017.660