Abstract
Following the machine translation community's recent adoption of automatic evaluation via the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations across various statistical metrics, whereas direct application of the BLEU evaluation procedure does not always give good results.
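The unigram co-occurrence idea the abstract describes can be illustrated with a short sketch. This is a minimal, assumption-laden version: it assumes whitespace tokenization, lowercasing, and a recall-oriented score with clipped counts; the exact formulation studied in the paper may differ.

```python
from collections import Counter

def unigram_cooccurrence(candidate: str, reference: str) -> float:
    """Recall-oriented unigram co-occurrence score between a candidate
    summary and a reference summary (illustrative sketch only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference unigram is matched at most as many
    # times as it appears in the candidate.
    overlap = sum(min(cand[w], c) for w, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

score = unigram_cooccurrence("the cat sat on the mat",
                             "the cat lay on the mat")
# 5 of the 6 reference unigram tokens co-occur, so score = 5/6
```

A score of 1.0 means every reference unigram occurrence is covered by the candidate; scores between summary pairs can then be correlated with human judgments.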
Publication Info
- Year: 2003
- Type: article
- Volume: 1
- Pages: 71-78
- Citations: 1573
- Access: Closed
Identifiers
- DOI: 10.3115/1073445.1073465