VideoBERT: A Joint Model for Video and Language Representation Learning
Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube. Whereas most existing approaches lear...