Abstract

Following the machine translation community's recent adoption of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
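To make the idea of unigram co-occurrence between a summary pair concrete, here is a minimal sketch of one possible recall-style score. It is an illustration only, not the authors' exact scoring procedure: the function name `unigram_cooccurrence_recall`, the lowercase whitespace tokenization, and the use of clipped counts are all assumptions made for this example.

```python
from collections import Counter


def unigram_cooccurrence_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams that also occur in the candidate.

    Counts are clipped: a word is credited at most as many times as it
    appears in the reference summary.
    """
    # Assumption: lowercase whitespace tokenization, no stemming,
    # no stopword removal.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    total = sum(ref_counts.values())
    if total == 0:
        # An empty reference gives nothing to match against.
        return 0.0

    overlap = sum(
        min(count, cand_counts[word]) for word, count in ref_counts.items()
    )
    return overlap / total


# Hypothetical summary pair, invented for illustration.
score = unigram_cooccurrence_recall(
    "the cat sat on the mat",
    "the cat lay on the mat",
)
print(f"unigram recall: {score:.3f}")
```

In this toy pair, five of the six reference unigrams are matched by the candidate. A study like the one described would compare such scores against human judgments across many summaries, which this sketch does not attempt.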

Keywords

NIST, BLEU, Computer science, n-gram, Machine translation, Artificial intelligence, Natural language processing, Evaluation of machine translation, Gram, Process (computing), Evaluation methods, Machine learning, Language model, Reliability engineering, Programming language

Publication Info

Year: 2003
Type: article
Volume: 1
Pages: 71-78
Citations: 1573
Access: Closed

Citation Metrics

1573
OpenAlex

Cite This

Chin-Yew Lin, Eduard Hovy (2003). Automatic evaluation of summaries using N-gram co-occurrence statistics. Vol. 1, pp. 71-78. https://doi.org/10.3115/1073445.1073465

Identifiers

DOI
10.3115/1073445.1073465