Abstract

Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems. However, these deep models are perceived as "black box" methods considering the lack of understanding of their internal functioning. There has been a significant recent interest in developing explainable deep learning models, and this paper is an effort in this direction. Building on a recently proposed method called Grad-CAM, we propose a generalized method called Grad-CAM++ that can provide better visual explanations of CNN model predictions, in terms of better object localization as well as explaining occurrences of multiple object instances in a single image, when compared to state-of-the-art. We provide a mathematical derivation for the proposed method, which uses a weighted combination of the positive partial derivatives of the last convolutional layer feature maps with respect to a specific class score as weights to generate a visual explanation for the corresponding class label. Our extensive experiments and evaluations, both subjective and objective, on standard datasets showed that Grad-CAM++ provides promising human-interpretable visual explanations for a given CNN architecture across multiple tasks including classification, image caption generation and 3D action recognition; as well as in new settings such as knowledge distillation.
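The weighting scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the feature maps of the last convolutional layer and the partial derivatives of the class score with respect to them are already available as arrays of shape (K, H, W). The pixel-wise coefficients `alphas` are a hypothetical input here: the abstract does not give their formula, so the sketch falls back to uniform coefficients when none are supplied. The final ReLU on the combined map is carried over from Grad-CAM and is likewise an assumption. The function name is illustrative only.

```python
import numpy as np


def weighted_positive_gradient_map(feature_maps, gradients, alphas=None):
    """Combine feature maps using weights built from positive gradients.

    feature_maps, gradients, alphas: arrays of shape (K, H, W).
    Returns (saliency_map of shape (H, W), channel_weights of shape (K,)).
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if feature_maps.ndim != 3 or feature_maps.shape != gradients.shape:
        raise ValueError("feature_maps and gradients must share shape (K, H, W)")

    k, h, w = feature_maps.shape
    if alphas is None:
        # Assumption: uniform pixel-wise coefficients when none are given.
        alphas = np.full((k, h, w), 1.0 / (h * w))
    else:
        alphas = np.asarray(alphas, dtype=float)
        if alphas.shape != feature_maps.shape:
            raise ValueError("alphas must have shape (K, H, W)")

    # Keep only the positive partial derivatives of the class score.
    positive_gradients = np.maximum(gradients, 0.0)

    # One weight per feature map: weighted sum of its positive gradients.
    channel_weights = (alphas * positive_gradients).sum(axis=(1, 2))

    # Weighted combination of the feature maps, then ReLU (as in Grad-CAM).
    combined = np.tensordot(channel_weights, feature_maps, axes=1)
    saliency_map = np.maximum(combined, 0.0)
    return saliency_map, channel_weights
```

In practice the gradients would come from a framework's automatic differentiation, and the resulting map would be upsampled to the input image size before being overlaid on it.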

Keywords

Convolutional neural network, Artificial intelligence, Computer science, Feature (linguistics), Class (philosophy), Image (mathematics), Object (grammar), Deep learning, Pattern recognition (psychology), Feature extraction, Contextual image classification, Machine learning

Publication Info

Year: 2018
Type: preprint
Citations: 2688
Access: Closed

Citation Metrics

OpenAlex: 2688
Influential: 287
CrossRef: 2158

Cite This

Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader et al. (2018). Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/wacv.2018.00097

Identifiers

DOI: 10.1109/wacv.2018.00097
arXiv: 1710.11063

Data Quality

Data completeness: 84%