Abstract

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units, such as n-grams, word sequences, and word pairs, between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper discusses the validity of the evaluation method used in the Document Understanding Conference (DUC) and evaluates five different ROUGE metrics included in the ROUGE summarization evaluation package: ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU, using data provided by DUC. A comprehensive study of the effects of using single or multiple references, and of various sample sizes, on the stability of the results is also presented.
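The core computation behind ROUGE-N is n-gram recall: the fraction of reference n-grams that also appear in the candidate summary, with matches clipped to the candidate's counts. The sketch below is a minimal illustration of that idea, not code from the ROUGE package; the function names, whitespace tokenization, and the simple pooling of n-grams across multiple references are assumptions made for illustration (the actual package provides its own tokenization and multi-reference handling).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, references, n=2):
    """Illustrative ROUGE-N recall: matched reference n-grams / total reference n-grams.

    `candidate` is the system summary, `references` a list of human summaries;
    all are plain strings tokenized here by whitespace (an assumption).
    """
    cand_counts = ngrams(candidate.split(), n)
    matched, total = 0, 0
    for ref in references:
        ref_counts = ngrams(ref.split(), n)
        # Clipped overlap: each reference n-gram counts as matched at most
        # as many times as it occurs in the candidate.
        matched += sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0

# Example: a candidate bigram recall of 3/5 = 0.6 against one reference.
print(rouge_n_recall("the cat sat on the mat", ["the cat was on the mat"], n=2))
```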

Keywords

ROUGE, Automatic summarization, Natural language processing, Artificial intelligence, Computer science

Publication Info

Year: 2004
Type: Article
Citations: 109
Access: Closed

Citation Metrics

109 citations (source: OpenAlex)

Cite This

Chin-Yew Lin (2004). Looking for a Few Good Metrics: ROUGE and its Evaluation.