Browsing Computer Science (Scholarly Publications) by Subject "Natural Language Processing"
Analysis and Insights from the PARSEME Shared Task dataset
(Language Science Press, 2018) The PARSEME Shared Task on the automatic identification of verbal multiword expressions (VMWEs) was the first collaborative study on the subject to cover a wide and diverse range of languages. One observation that emerged ...
Assessing Human-Parity in Machine Translation on the Segment Level
(Association for Computational Linguistics, 2020) Recent machine translation shared tasks have shown top-performing systems to tie or in some cases even outperform human translation. Such conclusions about system and human performance are, however, based on estimates ...
Automatic Extraction of Data Governance Knowledge from Slack Chat Channels
(2018) This paper describes a data governance knowledge extraction prototype for Slack channels based on an OWL ontology abstracted from the Collibra data governance operating model and the application of statistical techniques ...
C-HTS: A Concept-based Hierarchical Text Segmentation Approach
(2018) Hierarchical Text Segmentation is the task of building a hierarchical structure out of text to reflect its sub-topic hierarchy. Current text segmentation approaches are based upon using lexical and/or syntactic similarity ...
Findings of the 2021 Conference on Machine Translation (WMT21)
(Association for Computational Linguistics, 2021) This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part ...
The Impact of Training Data Bias on Automatic Generation of Video Captions
(2019) A major issue in machine learning is availability of training data. While this historically referred to the availability of a sufficient volume of training data, recently this has shifted to the availability of sufficient ...
Improving Document-level Sentiment Analysis with User and Product Context
(Association for Computational Linguistics, 2020) Past work that improves document-level sentiment analysis by encoding user and product information has been limited to considering only the text of the current review. We investigate incorporating additional review text ...
Improving Unsupervised Question Answering via Summarization-Informed Question Generation
(Association for Computational Linguistics, 2021) Question Generation (QG) is the task of generating a plausible question for a given <passage, answer> pair. Template-based QG uses linguistically-informed heuristics to transform declarative sentences into interrogatives, ...
PyThaiNLP: Thai Natural Language Processing in Python
(Empirical Methods in Natural Language Processing, 2023) We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first ...
Semantic reranking of CRF label sequences for verbal multiword expression identification
(Language Science Press, 2018) Verbal multiword Expressions (VMWE) identification can be addressed successfully as a sequence labelling problem via conditional random fields (CRFs) by returning the one label sequence with maximal probability. This work ...
A Semi-Automatic Indexing System for Cell Images
(IEEE, 2008) A method is described that can be used for annotating and indexing an arbitrary set of images with texts collateral to the images. The collateral texts comprise digitised texts, e.g. journal papers and newspapers in which ...
Statistical Power and Translationese in Machine Translation Evaluation
(Association for Computational Linguistics, 2020) The term translationese has been used to describe features of translated text, and in this paper, we provide detailed analysis of potential adverse effects of translationese on machine translation evaluation. Our analysis ...
Stylochronometry: Timeline Prediction in Stylometric Analysis
(Springer, 2015) We examine stylochronometry, the question of measuring change in linguistic style over time within an authorial canon and in relation to change in language in general use over a contemporaneous period. We take the works ...
The Third Multilingual Surface Realisation Shared Task (SR’20): Overview and Evaluation Results
(2020) This paper presents results from the Third Shared Task on Multilingual Surface Realisation (SR’20) which was organised as part of the COLING’20 Workshop on Multilingual Surface Realisation. As in SR’18 and SR’19, the shared ...
Towards efficient string processing of annotated events
(2017) This paper explores the use of strings as models to effectively represent event data such as might be found in a document annotated with ISO-TimeML. We describe the translation of such data to strings, as well as a number ...