Abstract

Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Architectural efforts are exploring many dimensions for network backbones, designing deeper or wider architectures, but how to best aggregate layers and blocks across a network deserves further attention. Although skip connections have been incorporated to combine layers, these connections have been "shallow" themselves, and only fuse by simple, one-step operations. We augment standard architectures with deeper aggregation to better fuse information across layers. Our deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes.
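The aggregation idea is easy to see in code. Below is a minimal, hypothetical PyTorch sketch of an aggregation node and the iterative merge it enables; the module names, 1x1 kernel, and channel handling are illustrative assumptions, not the paper's exact DLA implementation (the official code also uses residual connections and hierarchical aggregation trees).

```python
import torch
import torch.nn as nn

class AggregationNode(nn.Module):
    # Hypothetical sketch: fuses several feature maps by concatenating
    # channels, then learning a conv-BN-ReLU projection. The paper's
    # nodes are similar in spirit but differ in detail.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, *features):
        # All inputs are assumed to share spatial size; DLA upsamples
        # deeper stages before merging, which this sketch omits.
        return self.relu(self.bn(self.conv(torch.cat(features, dim=1))))

def iterative_deep_aggregation(stages, nodes):
    # Merge stage outputs shallowest-first, so that aggregation itself
    # is deep rather than a single one-step skip connection.
    fused = stages[0]
    for node, deeper in zip(nodes, stages[1:]):
        fused = node(fused, deeper)
    return fused
```

For example, with three 64-channel stages of equal spatial size, nodes would be two AggregationNode(128, 64) modules: each merge concatenates the running aggregate with the next stage (128 channels) and projects back to 64.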

Keywords

Fusion, Computer science, Merging, Layers, Aggregation, Artificial intelligence, Inference, Deep learning, Network architecture, Information retrieval, Computer networks, Engineering

Publication Info

Year: 2018
Type: article
Citations: 1501
Access: Closed

Citation Metrics

OpenAlex: 1501
Influential: 189
CrossRef: 1143

Cite This

Fisher Yu, Dequan Wang, Evan Shelhamer et al. (2018). Deep Layer Aggregation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvpr.2018.00255

Identifiers

DOI: 10.1109/cvpr.2018.00255
arXiv: 1707.06484
