Abstract

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pinpointed as contributions of our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks: Kinetics, Charades, and AVA. Code has been made available at: https://github.com/facebookresearch/SlowFast.
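
The two-rate sampling described in the abstract can be illustrated with a minimal sketch. It is not taken from the released code: the function names and the parameters `slow_stride`, `alpha` (frame-rate ratio between the pathways), and `beta` (channel ratio of the Fast pathway) are hypothetical, and their default values are illustrative assumptions that the abstract does not state.

```python
def slowfast_frame_indices(num_frames, slow_stride=16, alpha=8):
    """Return (slow_indices, fast_indices) for a clip of num_frames frames.

    Assumption: the Slow pathway keeps one frame every `slow_stride` frames,
    and the Fast pathway samples `alpha` times more densely.
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    slow = list(range(0, num_frames, slow_stride))  # low frame rate
    fast = list(range(0, num_frames, fast_stride))  # high frame rate
    return slow, fast


def fast_channels(slow_channels, beta=0.125):
    """Channel width of the lightweight Fast pathway (assumed ratio beta)."""
    return max(1, int(slow_channels * beta))


slow_idx, fast_idx = slowfast_frame_indices(64)
print(len(slow_idx), len(fast_idx))  # 4 32
print(fast_channels(64))             # 8
```

With these assumed values, a 64-frame clip yields 4 frames for the Slow pathway and 32 for the Fast pathway, while the Fast pathway uses one eighth of the channels, which is one way the "lightweight" trade-off in the abstract could be realized.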

Keywords

Computer science, Frame rate, Action recognition, Code (set theory), Artificial intelligence, Frame (networking), Semantics (computer science), Computer vision, Pattern recognition (psychology), Computer network, Programming language

Related Publications

Non-local Neural Networks

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family...

2018 | 2018 IEEE/CVF Conference on Computer ... | 10740 citations

Publication Info

Year: 2019
Type: article
Citations: 3322
Access: Closed

Citation Metrics

3322 (OpenAlex)

Cite This

Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik et al. (2019). SlowFast Networks for Video Recognition. https://doi.org/10.1109/iccv.2019.00630

Identifiers

DOI: 10.1109/iccv.2019.00630