Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods

Hill, Nathan

dc.contributor.author	Hill, Nathan	en
dc.date.accessioned	2023-05-22T07:24:03Z
dc.date.available	2023-05-22T07:24:03Z
dc.date.issued	2021	en
dc.date.submitted	2021	en
dc.identifier.citation	Meelen, Marieke; Roux, Élie; Hill, Nathan W., Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods, ACM Transactions on Asian and Low-Resource Language Information Processing, 20, 1, 2021, 1-11	en
dc.identifier.issn	2375-4699	en
dc.identifier.other	Y	en
dc.description	PUBLISHED	en
dc.description.abstract	This paper presents the new and improved version of the Annotated Corpus of Classical Tibetan (ACTib). These segmented and POS-tagged versions of all available texts in the Buddhist Digital Resource Center (BDRC) were annotated automatically using a memory-based tagger (see Meelen and Hill 2017). While this method had certain clear advantages - large amounts of data could quickly be split into meaningful words and grammatical markers, provided with highly detailed morpho-syntactic labels - the accuracy of these initial results can be improved in various ways. In this paper, we present a thorough error analysis and focus on correcting and improving these results using a combination of optimised memory-based, neural networks and rule-based methods.	en
dc.format.extent	1-11	en
dc.language.iso	en	en
dc.relation.ispartofseries	ACM Transactions on Asian and Low-Resource Language Information Processing	en
dc.relation.ispartofseries	20	en
dc.relation.ispartofseries	1	en
dc.rights	Y	en
dc.title	Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods	en
dc.type	Journal Article	en
dc.type.supercollection	scholarly_publications	en
dc.type.supercollection	refereed_publications	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/hillna	en
dc.identifier.rssinternalid	225646	en
dc.identifier.doi	http://dx.doi.org/10.1145/3409488	en
dc.rights.ecaccessrights	openAccess
dc.identifier.orcid_id	0000-0001-6423-017X	en
dc.identifier.uri	http://hdl.handle.net/2262/102695

Files in this item

Name: ACTibII.pdf
Size: 121.0Kb
Format: PDF
Description: Accepted for publication (author's ...

Name: license.txt
Size: 3.217Kb
Format: Text file

This item appears in the following Collection(s)

Centre for Language and Communication Studies (Scholarly Publications)
CLCS (Scholarly Publications)