Results of the WMT17 Metrics Shared Task
Citation:
Bojar, Ondrej; Graham, Yvette; Kamran, Amir. Results of the WMT17 Metrics Shared Task. In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers, Copenhagen, Denmark, September 2017. Association for Computational Linguistics, pp. 489-513.
Abstract:
This paper presents the results of the WMT17 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT17 news translation task and the Neural MT training task. We collected scores of 14 metrics from 8 research groups. In addition, we computed scores of 7 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with the WMT17 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in judging the quality of a particular sentence). This year, we build upon two types of manual judgements: direct assessment (DA) and HUME manual semantic judgements.
Author's Homepage:
http://people.tcd.ie/ygraham
Description:
PUBLISHED
Copenhagen, Denmark
Author: Graham, Yvette
Other Titles:
Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers
Second Conference on Machine Translation
Publisher:
Association for Computational Linguistics
Type of material:
Conference Paper
Availability:
Full text available
Keywords:
Machine Learning
DOI:
10.18653/v1/W17-4755