Abstract

The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation.
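The two mechanisms named in the abstract, modulation and feature mimicking, can be illustrated with a small sketch. It is not the authors' implementation. It assumes the commonly cited formulation in which each of the K sampling points of a kernel receives a learned 2-D offset and a scalar modulation in [0, 1], so that the output at position p is the sum over k of w_k · x(p + p_k + Δp_k) · Δm_k, and it assumes a mimicking loss of the form 1 − cosine similarity between two feature vectors. All function names (`bilinear_sample`, `modulated_deform_conv_at`, `cosine_mimic_loss`) are hypothetical, and the code handles a single-channel image at a single output location in pure Python for clarity.

```python
import math

def bilinear_sample(image, y, x):
    """Bilinearly interpolate `image` (a list of rows) at fractional (y, x).

    Positions outside the image contribute zero (zero padding).
    """
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * image[yy][xx]
    return value

# Regular 3x3 sampling grid p_k (dilation 1), in row-major order.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def modulated_deform_conv_at(image, weights, offsets, modulation, py, px):
    """Output of an assumed modulated deformable 3x3 convolution at (py, px).

    weights:    9 kernel weights w_k
    offsets:    9 (dy, dx) learned offsets, one per sampling point
    modulation: 9 scalars, assumed to lie in [0, 1], one per sampling point
    """
    out = 0.0
    for (dy, dx), w_k, (off_y, off_x), m_k in zip(GRID, weights, offsets, modulation):
        sample = bilinear_sample(image, py + dy + off_y, px + dx + off_x)
        out += w_k * sample * m_k
    return out

def cosine_mimic_loss(feat_a, feat_b):
    """Assumed mimicking loss: 1 - cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm = math.sqrt(sum(a * a for a in feat_a)) * math.sqrt(sum(b * b for b in feat_b))
    if norm == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / norm

if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ones = [1.0] * 9
    no_offset = [(0.0, 0.0)] * 9
    # With zero offsets and unit modulation this reduces to a regular convolution.
    print(modulated_deform_conv_at(image, ones, no_offset, ones, 1, 1))
```

With zero offsets and all modulation scalars equal to 1, the sketch reduces to a regular 3x3 convolution. Setting a modulation scalar to 0 removes that sampling point's contribution entirely, which is the sense in which modulation lets the layer vary the amplitude of its inputs as well as where it samples them.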

Keywords

Computer science, Focus (optics), Artificial intelligence, Benchmark (surveying), Convolution (computer science), Feature (linguistics), Segmentation, Object detection, Convolutional neural network, Feature extraction, Pattern recognition (psychology), Artificial neural network, Object (grammar), Computer vision, Image segmentation, Image (mathematics)

Publication Info

Year: 2019
Type: preprint
Pages: 9300-9308
Citations: 2431
Access: Closed

Citation Metrics

OpenAlex: 2431

Cite This

Xizhou Zhu, Han Hu, Stephen Lin et al. (2019). Deformable ConvNets V2: More Deformable, Better Results. pp. 9300-9308. https://doi.org/10.1109/cvpr.2019.00953

Identifiers

DOI: 10.1109/cvpr.2019.00953