dc.contributor.author | Maldonado Guerra, Alfredo | |
dc.contributor.author | Moreau, Erwan | |
dc.contributor.author | Vogel, Carl | |
dc.contributor.author | Alsulaimani, Ashjan | |
dc.contributor.author | Han, Lifeng | |
dc.contributor.author | Chowdhury, Koel Dutta | |
dc.contributor.editor | S. Markantonatou, C. Ramisch, A. Savary, V. Vincze | en |
dc.coverage.temporal | 978-3-96110-123-8 | en |
dc.date.accessioned | 2019-12-19T15:13:17Z | |
dc.date.available | 2019-12-19T15:13:17Z | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018 | en |
dc.identifier.citation | Moreau, E., Alsulaimani, A., Maldonado, A.G., Han, L., Vogel, C. & Chowdhury, K.D., Semantic reranking of CRF label sequences for verbal multiword expression identification, S. Markantonatou, C. Ramisch, A. Savary, V. Vincze, Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop, Language Science Press, 2018, 177 - 207 | en |
dc.identifier.issn | 978-3-96110-124-5 | |
dc.identifier.other | Y | |
dc.description | PUBLISHED | en |
dc.description.abstract | Verbal multiword expression (VMWE) identification can be addressed successfully as a sequence labelling problem via conditional random fields (CRFs) by returning the one label sequence with maximal probability. This work describes a system that reranks the top 10 most likely CRF candidate VMWE sequences using a decision tree regression model. The reranker aims to operationalise the intuition that a non-compositional MWE can have a different distributional behaviour from that of its constituent words. This is why it uses semantic features based on comparing the context vector of a candidate expression against those of its constituent words. However, not all VMWEs are non-compositional, and analysis shows that non-semantic features also play an important role in the behaviour of the reranker. In fact, the analysis shows that the combination of the sequential approach of the CRF component with the context-based approach of the reranker is the main factor of improvement: our reranker achieves a 12% macro-average F1-score improvement over the basic CRF method, as measured using data from the PARSEME shared task on VMWE identification. | en |
dc.format.extent | 177 | en |
dc.format.extent | 207 | en |
dc.language.iso | en | en |
dc.publisher | Language Science Press | en |
dc.relation.ispartof | IsPartOf | en |
dc.relation.uri | http://langsci-press.org/catalog/book/204 | en |
dc.rights | Y | en |
dc.subject | Verbal multiword expressions | en |
dc.subject | Conditional random fields | en |
dc.subject | Natural language processing | en |
dc.title | Semantic reranking of CRF label sequences for verbal multiword expression identification | en |
dc.title.alternative | Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop | en |
dc.type | Book Chapter | en |
dc.type.supercollection | scholarly_publications | en |
dc.type.supercollection | refereed_publications | en |
dc.identifier.peoplefinderurl | http://people.tcd.ie/maldona | |
dc.identifier.peoplefinderurl | http://people.tcd.ie/vogel | |
dc.identifier.peoplefinderurl | http://people.tcd.ie/moreaue | |
dc.identifier.rssinternalid | 193091 | |
dc.identifier.doi | http://dx.doi.org/10.5281/zenodo.1469559 | |
dc.rights.ecaccessrights | openAccess | |
dc.relation.doi | 10.5281/zenodo.1469527 | en |
dc.subject.TCDTheme | Digital Engagement | en |
dc.subject.TCDTheme | Digital Humanities | en |
dc.subject.TCDTag | ARTIFICIAL INTELLIGENCE | en |
dc.subject.TCDTag | Computational Linguistics | en |
dc.subject.TCDTag | DATA ANALYSIS | en |
dc.subject.TCDTag | MACHINE LEARNING | en |
dc.subject.TCDTag | Natural Language Processing | en |
dc.subject.TCDTag | multi-word expressions | en |
dc.subject.TCDTag | text analytics | en |
dc.identifier.rssuri | http://langsci-press.org/catalog/view/204/1647/1302-1 | |
dc.identifier.orcid_id | 0000-0001-8426-5249 | |
dc.status.accessible | N | en |
dc.contributor.sponsor | Science Foundation Ireland (SFI) | en |
dc.contributor.sponsorGrantNumber | 13/RC/2106 | en |
dc.identifier.uri | http://hdl.handle.net/2262/91208 | |