Abstract

Siamese network based trackers formulate tracking as convolutional feature cross-correlation between a target template and a search region. However, Siamese trackers still have an accuracy gap compared with state-of-the-art algorithms, and they cannot take advantage of features from deep networks such as ResNet-50 or deeper. In this work we prove the core reason is the lack of strict translation invariance. Through comprehensive theoretical analysis and experimental validation, we break this restriction with a simple yet effective spatial-aware sampling strategy and successfully train a ResNet-driven Siamese tracker with a significant performance gain. Moreover, we propose a new model architecture to perform depth-wise and layer-wise aggregation, which not only further improves accuracy but also reduces model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which obtains the current best results on four large tracking benchmarks: OTB2015, VOT2018, UAV123, and LaSOT. Our model will be released to facilitate further studies of this problem.
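The cross-correlation the abstract refers to slides the template's feature map over the search-region feature map; the depth-wise variant correlates each channel pair independently. Below is a minimal NumPy sketch of depth-wise cross-correlation, written as explicit loops for clarity. It is an illustrative assumption about the operation's shape semantics, not the authors' released implementation (which realizes this as a grouped convolution inside the network).

```python
import numpy as np

def depthwise_xcorr(z, x):
    """Depth-wise cross-correlation between template and search features.

    z: template features, shape (C, hz, wz)
    x: search-region features, shape (C, hx, wx)
    Returns a response map of shape (C, hx - hz + 1, wx - wz + 1),
    where each channel of z is correlated only with the same channel of x.
    """
    C, hz, wz = z.shape
    _, hx, wx = x.shape
    out = np.zeros((C, hx - hz + 1, wx - wz + 1))
    for c in range(C):                     # channels stay separate (depth-wise)
        for i in range(out.shape[1]):      # slide template vertically
            for j in range(out.shape[2]):  # slide template horizontally
                out[c, i, j] = np.sum(x[c, i:i + hz, j:j + wz] * z[c])
    return out
```

For example, correlating a (2, 2, 2) template of ones against a (2, 4, 4) search map of ones yields a (2, 3, 3) response map filled with 4.0, since each 2x2 window sums four products. A plain (non-depth-wise) cross-correlation would instead sum across channels as well, producing a single-channel map.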

Keywords

Computer science, Artificial intelligence, Computer vision, Deep learning, Network architecture, Residual neural network, Feature extraction, Tracking, Pattern recognition

Publication Info

Year: 2019
Type: article
Pages: 4277-4286
Citations: 2379
Access: Closed


Citation Metrics

OpenAlex: 2379
Influential: 515
CrossRef: 1991

Cite This

Bo Li, Wei Wu, Qiang Wang et al. (2019). SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4277-4286. https://doi.org/10.1109/cvpr.2019.00441

Identifiers

DOI: 10.1109/cvpr.2019.00441
arXiv: 1812.11703

Data Quality

Data completeness: 84%