Abstract

Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for tasks such as semantic scene segmentation, object part segmentation, and object classification. Our Point Transformer design improves upon prior work across domains and tasks. For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time. © 2021 IEEE
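
The abstract only outlines the idea of a self-attention layer operating directly on point sets. As a rough illustration, the snippet below is a minimal PyTorch sketch of one plausible such layer: each point attends over its k nearest neighbors, and a small MLP encodes relative coordinates as a positional signal. The class name PointSelfAttention, the neighborhood size, and all layer shapes are assumptions made for this example, not the paper's specification.

```python
# Minimal sketch of a self-attention layer for point clouds, in the spirit of the
# abstract above. NOT the authors' exact formulation; the kNN neighborhood size
# and the MLP shapes are illustrative assumptions.
import torch
import torch.nn as nn


class PointSelfAttention(nn.Module):
    def __init__(self, dim: int, k: int = 16):
        super().__init__()
        self.k = k                      # assumed neighborhood size
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # MLP on relative coordinates, used as a learned positional encoding
        self.pos_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        # MLP producing per-channel attention weights from (query - key + position)
        self.attn_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feats: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim) per-point features; coords: (N, 3) xyz positions
        dists = torch.cdist(coords, coords)                  # (N, N) pairwise distances
        knn_idx = dists.topk(self.k, largest=False).indices  # (N, k) nearest neighbors
        q = self.to_q(feats)                                  # (N, dim)
        k = self.to_k(feats)[knn_idx]                         # (N, k, dim)
        v = self.to_v(feats)[knn_idx]                         # (N, k, dim)
        rel_pos = coords.unsqueeze(1) - coords[knn_idx]       # (N, k, 3)
        pos_enc = self.pos_mlp(rel_pos)                       # (N, k, dim)
        # Attention weights from a subtraction relation between query and keys
        attn = self.attn_mlp(q.unsqueeze(1) - k + pos_enc)    # (N, k, dim)
        attn = attn.softmax(dim=1)                            # normalize over neighbors
        return (attn * (v + pos_enc)).sum(dim=1)              # (N, dim) refined features
```

Given per-point features of shape (N, dim) and coordinates of shape (N, 3), this layer returns refined features of the same shape; stacking such layers, with downsampling between stages, would yield a network for the segmentation and classification tasks described above.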

Keywords

Computer science, Segmentation, Artificial intelligence, Point cloud, Transformer, Image segmentation, Computer vision, Object detection, Point (geometry), Pattern recognition (psychology), Engineering, Mathematics

Publication Info

Year: 2021
Type: article
Pages: 16239-16248
Citations: 1821
Access: Closed

Cite This

Hengshuang Zhao, Li Jiang, Jiaya Jia et al. (2021). Point Transformer. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 16239-16248. https://doi.org/10.1109/iccv48922.2021.01595

Identifiers

DOI
10.1109/iccv48922.2021.01595