Abstract

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
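
The abstract's core loop (crop a small glimpse, update a recurrent state, sample the next location, and train the non-differentiable location choice with reinforcement learning) can be sketched compactly. The sketch below is an illustrative reconstruction, not the authors' code: the PyTorch module names, layer sizes, and the use of F.grid_sample for single-scale glimpse extraction (the paper uses multi-resolution crops) are assumptions, and it assumes single-channel images.

```python
# Hypothetical sketch of the recurrent attention loop described in the
# abstract. All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentAttention(nn.Module):
    def __init__(self, glimpse_size=8, hidden_size=256, num_classes=10):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.hidden_size = hidden_size
        # Glimpse network: jointly encodes the cropped patch and its location.
        self.glimpse_fc = nn.Linear(glimpse_size * glimpse_size, 128)
        self.loc_fc = nn.Linear(2, 128)
        self.combine = nn.Linear(256, hidden_size)
        # Core recurrent network that aggregates information across glimpses.
        self.rnn = nn.GRUCell(hidden_size, hidden_size)
        # Location network: mean of a Gaussian over the next glimpse location.
        self.loc_out = nn.Linear(hidden_size, 2)
        # Classifier reads out the final hidden state.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def extract_glimpse(self, images, loc):
        # Crop a glimpse_size x glimpse_size patch centred at loc, where loc
        # holds (x, y) in [-1, 1] image coordinates.
        B, _, H, W = images.shape
        gs = self.glimpse_size
        ys = torch.linspace(-1.0, 1.0, gs, device=images.device) * gs / H
        xs = torch.linspace(-1.0, 1.0, gs, device=images.device) * gs / W
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy], dim=-1).unsqueeze(0)  # (1, gs, gs, 2)
        grid = grid + loc.view(B, 1, 1, 2)                 # shift to loc
        return F.grid_sample(images, grid, align_corners=False)

    def forward(self, images, num_glimpses=6, loc_std=0.1):
        B = images.size(0)
        h = images.new_zeros(B, self.hidden_size)
        loc = images.new_zeros(B, 2)  # first glimpse at the image centre
        log_probs = []
        for _ in range(num_glimpses):
            patch = self.extract_glimpse(images, loc).flatten(1)
            g = F.relu(self.combine(torch.cat(
                [F.relu(self.glimpse_fc(patch)), F.relu(self.loc_fc(loc))],
                dim=1)))
            h = self.rnn(g, h)
            # Sampling the next location is the non-differentiable step;
            # keep its log-probability for a REINFORCE-style update.
            mean = torch.tanh(self.loc_out(h))
            dist = torch.distributions.Normal(mean, loc_std)
            sample = dist.sample()
            log_probs.append(dist.log_prob(sample).sum(-1))
            loc = sample.clamp(-1.0, 1.0)
        return self.classifier(h), torch.stack(log_probs)
```

A training step would then combine a supervised loss on the logits with a REINFORCE term that scales the summed location log-probabilities by a task reward (e.g. 1 for a correct final classification), roughly `loss = F.cross_entropy(logits, y) - (reward * log_probs.sum(0)).mean()`. Because each step touches only a fixed-size patch, the per-glimpse cost is independent of the input image size, which is the abstract's central computational claim.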

Keywords

Visual attention, Psychology, Cognitive psychology, Computer science, Perception, Neuroscience

Publication Info

Year: 2014
Type: Preprint
Citations: 998 (OpenAlex)
Access: Closed

Cite This

Volodymyr Mnih, Nicolas Heess, Alex Graves et al. (2014). Recurrent Models of Visual Attention. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1406.6247

Identifiers

DOI: 10.48550/arxiv.1406.6247