Abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning (answering image-related questions that require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
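The core operation described above, a feature-wise affine transformation whose scale and shift are predicted from conditioning information, can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the toy conditioning network (plain linear maps from a made-up "question embedding") and all shapes are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each channel.

    features: (batch, channels, height, width) CNN feature maps
    gamma, beta: (batch, channels) parameters predicted from conditioning input
    """
    # Broadcast per-channel gamma/beta across the spatial dimensions.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

# Hypothetical conditioning network: two linear maps from a question embedding.
cond = rng.standard_normal((2, 8))            # e.g. an 8-dim question embedding
W_g = rng.standard_normal((8, 4)); b_g = np.zeros(4)
W_b = rng.standard_normal((8, 4)); b_b = np.zeros(4)
gamma = cond @ W_g + b_g                      # (batch, channels)
beta = cond @ W_b + b_b

x = rng.standard_normal((2, 4, 5, 5))         # feature maps to modulate
y = film(x, gamma, beta)
print(y.shape)  # (2, 4, 5, 5) -- same shape as the input features
```

Because the modulation is purely feature-wise (one gamma and one beta per channel), it adds very little computation or parameters, which is what makes it practical to insert throughout a network.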

Keywords

Affine transformation, Computer science, Benchmark (surveying), Artificial intelligence, Feature (linguistics), Computation, Transformation (genetics), Artificial neural network, Simple (philosophy), Layer (electronics), Visual reasoning, Image (mathematics), Task (project management), Process (computing), Pattern recognition (psychology), Machine learning, Algorithm, Mathematics

Publication Info

Year: 2018
Type: article
Volume: 32
Issue: 1
Citations: 1301
Access: Closed

Cite This

Ethan Perez, Florian Strub, Harm de Vries et al. (2018). FiLM: Visual Reasoning with a General Conditioning Layer. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11671

Identifiers

DOI: 10.1609/aaai.v32i1.11671