Abstract
ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-grams, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper discusses the validity of the evaluation method used in the Document Understanding Conference (DUC) and evaluates five different ROUGE metrics included in the ROUGE summarization evaluation package — ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU — using data provided by DUC. A comprehensive study of the effects of using single or multiple references and of various sample sizes on the stability of the results is also presented.
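The core idea the abstract describes — counting overlapping units between a candidate summary and a human reference — can be sketched for the n-gram case (ROUGE-N). This is a minimal illustration of recall over clipped n-gram counts, not the official ROUGE package, which additionally handles stemming, stopword removal, and multi-reference aggregation:

```python
from collections import Counter

def rouge_n_recall(candidate_tokens, reference_tokens, n=1):
    """Recall-oriented n-gram overlap: the fraction of the reference's
    n-grams that also appear in the candidate, with counts clipped so a
    repeated candidate n-gram cannot be credited more times than it
    occurs in the reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate_tokens)
    ref = ngrams(reference_tokens)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

# Example: 5 of the reference's 6 unigrams appear in the candidate.
score = rouge_n_recall("the cat sat on the mat".split(),
                       "the cat is on the mat".split(), n=1)
```

With a single reference this is a plain recall ratio; the paper's experiments examine how such scores behave when averaged over multiple references and sample sizes.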
Publication Info
- Year: 2004
- Type: article
- Citations: 109
- Access: Closed