Assessing Human-Parity in Machine Translation on the Segment Level

Graham, Yvette

This item is covered by a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

File Type:

PDF

Item Type:

Conference Paper

Date:

2020

Author:

Graham, Yvette

Access:

openAccess

Citation:

Yvette Graham, Christian Federmann, Maria Eskevich, Barry Haddow, Assessing Human-Parity in Machine Translation on the Segment Level, Findings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP - Findings), Virtual, 16/11/20, Association for Computational Linguistics, 2020, 4199 - 4207

Download Item:

(2020.findings-emnlp.375.pdf) 1.444Mb

Abstract:

Recent machine translation shared tasks have shown top-performing systems to tie or in some cases even outperform human translation. Such conclusions about system and human performance are, however, based on estimates aggregated from scores collected over large test sets of translations, and they leave some questions unanswered. For instance, the fact that a system significantly outperforms the human translator on average does not necessarily mean that it has done so for every translation in the test set. Are there source segments in evaluation test sets that still cause significant challenges for top-performing systems, and can such challenging segments go unnoticed due to the opacity of current human evaluation procedures? To provide insight into these issues, we carefully inspect the outputs of top-performing systems in the most recent WMT-19 news translation shared task for all language pairs in which a system either tied or outperformed human translation. Our analysis provides a new method of identifying the remaining segments for which either machine or human performs poorly. For example, in our close inspection of WMT-19 English to German and German to English, we discover the segments that disjointly proved a challenge for human and machine. For English to Russian, no segments in our sample of translations caused a significant challenge for the human translator, while we again identify the set of segments that caused issues for the top-performing system.

URI:

http://hdl.handle.net/2262/95918

Sponsor:

SFI stipend

Grant Number:

13/RC/2106

Author's Homepage:

http://people.tcd.ie/ygraham

Description:

PUBLISHED
Virtual

Other Titles:

Findings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP - Findings)

Publisher:

Association for Computational Linguistics

Availability:

Full text available

Subject (TCD):

International Integration, Artificial Intelligence, Machine Translation, Natural Language Processing

DOI:

http://dx.doi.org/10.18653/v1/2020.findings-emnlp.375
