Continuous Measurement Scales in Human Evaluation of Machine Translation
Citation:
Yvette Graham, Timothy Baldwin, Alistair Moffat, Justin Zobel. Continuous Measurement Scales in Human Evaluation of Machine Translation. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, 1 July 2013. Association for Computational Linguistics, pages 33-41.
Abstract:
We explore the use of continuous rating scales for human evaluation in the context of machine translation evaluation, comparing two assessor-intrinsic quality-control techniques that do not rely on agreement with expert judgments. Experiments employing Amazon's Mechanical Turk service show that quality-control techniques made possible by the continuous scale yield dramatic improvements in intra-annotator agreement, of up to +0.101 in the kappa coefficient, with inter-annotator agreement increasing by up to +0.144 when additional standardization of scores is applied.
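The agreement gains quoted in the abstract are reported as kappa coefficients. As a point of reference only (the ratings below are hypothetical, not data from the paper, and the paper's exact binning of continuous scores may differ), Cohen's kappa for two sequences of categorical judgments can be sketched as:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under chance, from each rater's label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical repeat judgments by one assessor (the intra-annotator case).
first  = [1, 2, 2, 3, 1, 2, 3, 3]
second = [1, 2, 3, 3, 1, 2, 3, 2]
print(cohens_kappa(first, second))  # ~0.619 for these made-up ratings
```

Higher kappa indicates agreement beyond what the raters' label distributions would produce by chance, which is why the +0.101 and +0.144 improvements are meaningful.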
Author's Homepage:
http://people.tcd.ie/ygraham
Author: Graham, Yvette
Other Titles:
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
Publisher:
Association for Computational Linguistics
Type of material:
Conference Paper
Availability:
Full text available