Continuous Measurement Scales in Human Evaluation of Machine Translation
Citation:
Yvette Graham, Timothy Baldwin, Alistair Moffat, Justin Zobel. Continuous Measurement Scales in Human Evaluation of Machine Translation. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, 1 July 2013. Association for Computational Linguistics, pages 33-41.
Abstract:
We explore the use of continuous rating scales for human evaluation in the context of machine translation evaluation, comparing two assessor-intrinsic quality-control techniques that do not rely on agreement with expert judgments. Experiments employing Amazon's Mechanical Turk service show that quality-control techniques made possible by the continuous scale yield dramatic improvements in intra-annotator agreement, of up to +0.101 in the kappa coefficient, with inter-annotator agreement increasing by up to +0.144 when additional standardization of scores is applied.
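The agreement gains quoted in the abstract are reported as kappa coefficients. As a point of reference only (the ratings below are hypothetical, not data from the paper, and the paper's exact binning of continuous scores may differ), Cohen's kappa for two sequences of categorical judgments can be sketched as:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under chance, from each rater's label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical repeat judgments by one assessor (the intra-annotator case).
first  = [1, 2, 2, 3, 1, 2, 3, 3]
second = [1, 2, 3, 3, 1, 2, 3, 2]
print(cohens_kappa(first, second))  # ~0.619 for these made-up ratings
```

Higher kappa indicates agreement beyond what the raters' label distributions would produce by chance, which is why the +0.101 and +0.144 improvements are meaningful.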
Author's Homepage:
http://people.tcd.ie/ygraham
Author: Graham, Yvette
Other Titles:
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
Publisher:
Association for Computational Linguistics
Type of material:
Conference Paper
Availability:
Full text available