Abstract

Transformers have recently shown encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (1) a linear-complexity attention layer, (2) overlapping patch embedding, and (3) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and achieves significant improvements on fundamental vision tasks such as classification, detection, and segmentation. Notably, the proposed PVT v2 achieves comparable or better performance than recent works such as the Swin Transformer. We hope this work will facilitate state-of-the-art Transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
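The abstract's central claim is that PVT v2's attention layer is linear in the number of tokens, whereas standard self-attention is quadratic. A rough back-of-the-envelope sketch of why pooling keys and values to a fixed spatial size makes the cost linear is below. The pooled size `p = 7` and the multiply-add accounting are illustrative assumptions, not the paper's exact figures.

```python
def attention_flops(n: int, d: int) -> int:
    """Standard self-attention: Q.K^T and attn.V each cost ~n*n*d
    multiply-adds, so the total grows quadratically with token count n."""
    return 2 * n * n * d


def linear_attention_flops(n: int, d: int, p: int = 7) -> int:
    """Pooled attention: keys/values are reduced to a fixed p*p tokens
    (p = 7 is an assumed pooled size), so cost is ~2*n*(p*p)*d, i.e.
    linear in n."""
    return 2 * n * (p * p) * d


if __name__ == "__main__":
    d = 64  # per-head channel dimension (illustrative)
    for n in (196, 784, 3136):  # token counts at typical feature-map scales
        print(n, attention_flops(n, d), linear_attention_flops(n, d))
```

Doubling the token count doubles the pooled-attention cost but quadruples the standard-attention cost, which is why the savings matter most on the high-resolution early stages of a pyramid backbone.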

Keywords

Transformer, Computer science, Segmentation, Embedding, Artificial intelligence, Computation, Computer vision, Computer engineering, Algorithm, Engineering, Electrical engineering

Publication Info

Year
2022
Type
article
Volume
8
Issue
3
Pages
415-424
Citations
1821
Access
Closed

Citation Metrics

1821 (OpenAlex)

Cite This

Wenhai Wang, Enze Xie, Xiang Li et al. (2022). PVT v2: Improved baselines with pyramid vision transformer. Computational Visual Media, 8(3), 415-424. https://doi.org/10.1007/s41095-022-0274-8

Identifiers

DOI
10.1007/s41095-022-0274-8