Abstract

We describe the WIKIQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WIKIQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WIKIQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WIKIQA dataset.

Keywords

Open domain, Question answering, Computer science, Domain (mathematical analysis), Information retrieval, Artificial intelligence, Mathematics

Related Publications

Learning question classifiers

In order to respond correctly to a free form factual question given a large collection of texts, one needs to understand the question to a level that allows determining some of ...

2002, 1260 citations

Publication Info

Year: 2015
Type: article
Citations: 839
Access: Closed

Citation Metrics

OpenAlex: 839
Influential: 145
CrossRef: 336

Cite This

Yi Yang, Wen-tau Yih, Christopher Meek (2015). WikiQA: A Challenge Dataset for Open-Domain Question Answering. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. https://doi.org/10.18653/v1/d15-1237

Identifiers

DOI
10.18653/v1/d15-1237
