English Machine Reading Comprehension Datasets: A Survey

Dzendzik, Daria; Vogel, Carl; Foster, Jennifer

dc.contributor.author	Dzendzik, Daria
dc.contributor.author	Vogel, Carl
dc.contributor.author	Foster, Jennifer
dc.date.accessioned	2021-12-09T14:22:57Z
dc.date.available	2021-12-09T14:22:57Z
dc.date.issued	2021
dc.date.submitted	2021	en
dc.identifier.citation	Daria Dzendzik, Carl Vogel, Jennifer Foster, 'English Machine Reading Comprehension Datasets: A Survey', Association for Computational Linguistics, 2021	en
dc.identifier.other	Y
dc.description	PUBLISHED	en
dc.description.abstract	This paper surveys 60 English Machine Reading Comprehension datasets, with a view to providing a convenient resource for other researchers interested in this problem. We categorize the datasets according to their question and answer form and compare them across various dimensions including size, vocabulary, data source, method of creation, human performance level, and first question word. Our analysis reveals that Wikipedia is by far the most common data source and that there is a relative lack of why, when, and where questions across datasets.	en
dc.format.extent	8784-8804	en
dc.language.iso	en	en
dc.publisher	Association for Computational Linguistics	en
dc.rights	Y	en
dc.subject	Machine reading comprehension	en
dc.subject	English language	en
dc.subject	Data sources	en
dc.title	English Machine Reading Comprehension Datasets: A Survey	en
dc.title.alternative	Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing	en
dc.type	Conference Paper	en
dc.type.supercollection	scholarly_publications	en
dc.type.supercollection	refereed_publications	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/vogel
dc.identifier.rssinternalid	235461
dc.rights.ecaccessrights	openAccess
dc.subject.TCDTheme	Creative Technologies	en
dc.subject.TCDTheme	Digital Engagement	en
dc.subject.TCDTheme	Digital Humanities	en
dc.subject.TCDTag	Computational Linguistics	en
dc.subject.TCDTag	Question Answering	en
dc.subject.TCDTag	text analytics	en
dc.identifier.orcid_id	0000-0001-8928-8546
dc.contributor.sponsor	Science Foundation Ireland (SFI)	en
dc.contributor.sponsorGrantNumber	13/RC/2106	en
dc.identifier.uri	https://aclanthology.org/2021.emnlp-main.693.pdf
dc.identifier.uri	http://hdl.handle.net/2262/97682

Files in this item

Name: 2021.emnlp-main.693.pdf
Size: 1.225Mb
Format: PDF

Name: license.txt
Size: 3.424Kb
Format: Text file

This item appears in the following Collection(s)

Computer Science (Scholarly Publications)