Abstract

Top-performing object detectors depend heavily on backbone networks, whose advances bring consistent performance gains through the exploration of more effective network structures. In this paper, we propose a novel and flexible backbone framework, namely CBNet, to construct high-performance detectors using existing open-source pre-trained backbones under the pre-training and fine-tuning paradigm. In particular, the CBNet architecture groups multiple identical backbones, which are connected through composite connections. Specifically, it integrates the high- and low-level features of multiple identical backbone networks and gradually expands the receptive field to perform object detection more effectively. We also propose a better training strategy with auxiliary supervision for CBNet-based detectors. CBNet generalizes well across different backbones and detector head designs. Without additional pre-training of the composite backbone, CBNet can be adapted to various backbones (CNN-based or Transformer-based) and to the head designs of most mainstream detectors (one-stage or two-stage, anchor-based or anchor-free). Experiments provide strong evidence that, compared with simply increasing the depth and width of the network, CBNet offers a more efficient, effective, and resource-friendly way to build high-performance backbone networks. In particular, our CB-Swin-L achieves 59.4% box AP and 51.6% mask AP on COCO test-dev under the single-model, single-scale testing protocol, significantly better than the state-of-the-art results (57.7% box AP and 50.2% mask AP) achieved by Swin-L, while reducing the training time by 6×. With multi-scale testing, we push the current best single-model result to a new record of 60.1% box AP and 52.3% mask AP without using extra training data. Code is available at https://github.com/VDIGPKU/CBNetV2.
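
The data flow described in the abstract (several identical backbones linked by composite connections, trained with auxiliary supervision) can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not specify the form of a composite connection, so the sketch assumes the simplest possible one: the output of stage i of one backbone is added to the input of stage i of the next backbone. Features are plain floats rather than tensors, the same stage functions are reused for every backbone, and the names `run_backbone`, `composite_forward`, `total_loss`, and the `aux_weight` parameter are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# A "stage" here is any function mapping a feature to a feature. Real backbone
# stages operate on tensors; plain floats keep this sketch dependency-free.
Stage = Callable[[float], float]


def run_backbone(stages: Sequence[Stage], x: float,
                 injected: Optional[Sequence[float]] = None) -> List[float]:
    """Run the stages in order and return every stage's output.

    If `injected` is given, injected[i] is added to the input of stage i.
    This addition stands in for a composite connection (an assumption).
    """
    feats: List[float] = []
    for i, stage in enumerate(stages):
        if injected is not None:
            x = x + injected[i]
        x = stage(x)
        feats.append(x)
    return feats


def composite_forward(stages: Sequence[Stage], x: float,
                      num_backbones: int = 2) -> List[List[float]]:
    """Chain `num_backbones` identical backbones on the same input.

    Returns one feature list per backbone; the last one belongs to the lead
    backbone, whose features would be passed to the detection head.
    """
    all_feats: List[List[float]] = []
    injected: Optional[List[float]] = None
    for _ in range(num_backbones):
        feats = run_backbone(stages, x, injected)
        all_feats.append(feats)
        injected = feats  # feed this backbone's stage outputs to the next one
    return all_feats


def total_loss(lead_loss: float, assist_losses: Sequence[float],
               aux_weight: float = 0.5) -> float:
    """Lead loss plus weighted auxiliary losses (aux_weight is hypothetical)."""
    return lead_loss + aux_weight * sum(assist_losses)


if __name__ == "__main__":
    toy_stages = [lambda v: v + 1.0, lambda v: v * 2.0]
    print(composite_forward(toy_stages, 1.0, num_backbones=2))
    print(total_loss(1.0, [0.4]))
```

In a real detector each stage would be a convolutional or Transformer block, and the composite connection would likely need to reconcile channel counts and spatial resolutions before features are combined; the sketch omits both.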

Keywords

Backbone network; Computer science; Detector; Object detection; Construct (python library); Artificial intelligence; Pattern recognition (psychology); Computer network; Telecommunications

Publication Info

Year: 2022
Type: article
Volume: 31
Pages: 6893-6906
Citations: 153
Access: Closed

Citation Metrics

153 (OpenAlex)

Cite This

Tingting Liang, Xiaojie Chu, Yudong Liu et al. (2022). CBNet: A Composite Backbone Network Architecture for Object Detection. IEEE Transactions on Image Processing, 31, 6893-6906. https://doi.org/10.1109/tip.2022.3216771

Identifiers

DOI: 10.1109/tip.2022.3216771