Semantic reranking of CRF label sequences for verbal multiword expression identification

Maldonado Guerra, Alfredo; Moreau, Erwan; Vogel, Carl; Alsulaimani, Ashjan; Han, Lifeng; Chowdhury, Koel Dutta

dc.contributor.author	Maldonado Guerra, Alfredo
dc.contributor.author	Moreau, Erwan
dc.contributor.author	Vogel, Carl
dc.contributor.author	Alsulaimani, Ashjan
dc.contributor.author	Han, Lifeng
dc.contributor.author	Chowdhury, Koel Dutta
dc.contributor.editor	S. Markantonatou, C. Ramisch, A. Savary, V. Vincze	en
dc.coverage.temporal	978-3-96110-123-8	en
dc.date.accessioned	2019-12-19T15:13:17Z
dc.date.available	2019-12-19T15:13:17Z
dc.date.issued	2018
dc.date.submitted	2018	en
dc.identifier.citation	Moreau, E., Alsulaimani, A., Maldonado, A.G., Han, L., Vogel, C. & Chowdhury, K.D., Semantic reranking of CRF label sequences for verbal multiword expression identification, S. Markantonatou, C. Ramisch, A. Savary, V. Vincze, Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop, Language Science Press, 2018, 177 - 207	en
dc.identifier.issn	978-3-96110-124-5
dc.identifier.other	Y
dc.description	PUBLISHED	en
dc.description.abstract	Verbal multiword expression (VMWE) identification can be addressed successfully as a sequence labelling problem with conditional random fields (CRFs), which return the single label sequence with maximal probability. This work describes a system that reranks the 10 most likely candidate VMWE label sequences produced by the CRF, using a decision tree regression model. The reranker aims to operationalise the intuition that a non-compositional MWE can exhibit distributional behaviour different from that of its constituent words. For this reason, it uses semantic features based on comparing the context vector of a candidate expression against those of its constituent words. However, not all VMWEs are non-compositional, and analysis shows that non-semantic features also play an important role in the behaviour of the reranker. In fact, the analysis shows that the combination of the sequential approach of the CRF component with the context-based approach of the reranker is the main source of improvement: our reranker achieves a 12% improvement in macro-averaged F1-score over the basic CRF method, as measured on data from the PARSEME shared task on VMWE identification.	en
dc.format.extent	177	en
dc.format.extent	207	en
dc.language.iso	en	en
dc.publisher	Language Science Press	en
dc.relation.ispartof	IsPartOf	en
dc.relation.uri	http://langsci-press.org/catalog/book/204	en
dc.rights	Y	en
dc.subject	Verbal multiword expressions	en
dc.subject	Conditional random fields	en
dc.subject	Natural language processing	en
dc.title	Semantic reranking of CRF label sequences for verbal multiword expression identification	en
dc.title.alternative	Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop	en
dc.type	Book Chapter	en
dc.type.supercollection	scholarly_publications	en
dc.type.supercollection	refereed_publications	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/maldona
dc.identifier.peoplefinderurl	http://people.tcd.ie/vogel
dc.identifier.peoplefinderurl	http://people.tcd.ie/moreaue
dc.identifier.rssinternalid	193091
dc.identifier.doi	http://dx.doi.org/10.5281/zenodo.1469559
dc.rights.ecaccessrights	openAccess
dc.relation.doi	10.5281/zenodo.1469527	en
dc.subject.TCDTheme	Digital Engagement	en
dc.subject.TCDTheme	Digital Humanities	en
dc.subject.TCDTag	ARTIFICIAL INTELLIGENCE	en
dc.subject.TCDTag	Computational Linguistics	en
dc.subject.TCDTag	DATA ANALYSIS	en
dc.subject.TCDTag	MACHINE LEARNING	en
dc.subject.TCDTag	Natural Language Processing	en
dc.subject.TCDTag	multi-word expressions	en
dc.subject.TCDTag	text analytics	en
dc.identifier.rssuri	http://langsci-press.org/catalog/view/204/1647/1302-1
dc.identifier.orcid_id	0000-0001-8426-5249
dc.status.accessible	N	en
dc.contributor.sponsor	Science Foundation Ireland (SFI)	en
dc.contributor.sponsorGrantNumber	13/RC/2106	en
dc.identifier.uri	http://hdl.handle.net/2262/91208
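The reranking step summarised in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the chapter: context vectors are plain co-occurrence counts in a window of 2 tokens on each side, candidate expressions are contiguous B/I spans (real VMWEs can be discontinuous), the semantic feature is the mean cosine similarity between an expression's context vector and those of its words, and a hand-written scoring function (hypothetical) stands in for the trained decision tree regression model. All function names are illustrative, not the authors' code.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

def context_vector(corpus: List[List[str]], target: Sequence[str], window: int = 2) -> Counter:
    """Count words within `window` tokens of each occurrence of `target` (a word or contiguous phrase)."""
    n = len(target)
    vec: Counter = Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) == tuple(target):
                vec.update(sent[max(0, i - window):i] + sent[i + n:i + n + window])
    return vec

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_feature(corpus: List[List[str]], expression: Sequence[str]) -> float:
    """Hypothetical feature: mean similarity of the expression's contexts to each word's contexts.
    A low value suggests the expression behaves unlike its parts (non-compositional)."""
    expr_vec = context_vector(corpus, expression)
    sims = [cosine(expr_vec, context_vector(corpus, [w])) for w in expression]
    return sum(sims) / len(sims)

def extract_expressions(tokens: List[str], labels: List[str]) -> List[Tuple[str, ...]]:
    """Collect contiguous B I* spans (assumption: no discontinuous expressions)."""
    exprs, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B":
            if current:
                exprs.append(tuple(current))
            current = [tok]
        elif lab == "I" and current:
            current.append(tok)
        else:
            if current:
                exprs.append(tuple(current))
            current = []
    if current:
        exprs.append(tuple(current))
    return exprs

def rerank(tokens: List[str], candidates: List[Tuple[List[str], float]], corpus: List[List[str]],
           scorer: Callable[[Dict[str, float]], float], k: int = 10) -> List[List[str]]:
    """Score the top-k (labels, CRF probability) candidates and return label sequences, best first."""
    scored = []
    for labels, prob in candidates[:k]:
        exprs = extract_expressions(tokens, labels)
        min_sem = min((semantic_feature(corpus, e) for e in exprs), default=1.0)
        feats = {"crf_prob": prob, "n_expr": float(len(exprs)), "min_sem": min_sem}
        scored.append((scorer(feats), labels))
    return [labels for _, labels in sorted(scored, key=lambda p: p[0], reverse=True)]

# Usage on a toy corpus: a stand-in scorer that rewards non-compositional-looking expressions.
corpus = [["the", "meeting", "will", "take", "place", "tomorrow"],
          ["please", "take", "the", "cake"],
          ["a", "nice", "place", "to", "live"]]
tokens = corpus[0]
candidates = [(["O"] * 6, 0.6), (["O", "O", "O", "B", "I", "O"], 0.3)]
toy_scorer = lambda f: f["crf_prob"] + (1.0 - f["min_sem"])
best = rerank(tokens, candidates, corpus, toy_scorer)[0]
```

With CRF probability alone, the all-`O` sequence would win; the toy scorer promotes the sequence tagging "take place", because that phrase's contexts overlap only partly with those of "take" and "place". In the chapter, the weighting of such features is learned by the regression model rather than fixed by hand.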

Files in this item

Name:: 204-3-1302-1-10-20181025.pdf
Size:: 657.6Kb
Format:: PDF

Name:: license.txt
Size:: 3.499Kb
Format:: Text file

This item appears in the following Collection(s)

Computer Science (Scholarly Publications)
RSS Feeds
