dc.contributor.advisor: Ui Dhonnchadha, Elaine
dc.contributor.author: Ó MEACHAIR, MÍCHEÁL JOHN
dc.date.accessioned: 2020-05-05T08:32:18Z
dc.date.available: 2020-05-05T08:32:18Z
dc.date.issued: 2020 [en]
dc.date.submitted: 2020
dc.identifier.citation: Ó MEACHAIR, MÍCHEÁL JOHN, The Creation and Complexity Analysis of a Corpus of Educational Materials in Irish (EduGA), n/a, Trinity College Dublin. School of Linguistic Speech & Comm Sci, 2020 [en]
dc.identifier.other: Y [en]
dc.description: APPROVED [en]
dc.description.abstract: This research presents the construction of a 7.5-million-word corpus of educational materials for teaching Irish and for teaching other subjects through the medium of Irish. This corpus is called EduGA. The corpus was compiled with a view to developing three complexity metrics for Irish, each of which is based on objective analyses that have been conducted on EduGA. The first analysis focuses on 7 lexico-grammatical language features that have been prescribed for multiple Irish courses. The statistical significance of each lexico-grammatical feature was then analysed at the level for which it was first prescribed as well as at other levels. It was concluded that only some language features occur at a statistically significant frequency at the level for which they were first prescribed, and that fewer than half of the language features analysed could be reliably used to analyse lexical complexity. The second analysis tested the applicability of length-of-word and length-of-sentence analyses to Irish complexity studies. These metrics were chosen because they are commonly used in other languages. Length of word was not found to be a suitable metric for Irish because of the language's morphology. Length of sentence was found to be a suitable metric for analysing syntactic complexity in Irish texts. In the final analysis, a term-based metric was developed in order to analyse semantic complexity. This term-based metric combines two dimensions in order to contextualise the term usage. The first dimension draws on the topicality of terms in documents, and the second dimension draws on the frequency of terms in a corpus of general Irish. A by-product of the present research was the development of resources, namely, a corpus of Irish-language educational materials. Wordlists for each subject-specific sub-corpus were included in the appendix to this thesis in order to provide for further research in this area. [en]
dc.language.iso: en [en]
dc.publisher: Trinity College Dublin. School of Linguistic Speech & Comm Sci. C.L.C.S. [en]
dc.rights: Y [en]
dc.subject: corpus linguistics [en]
dc.subject: education [en]
dc.subject: language learning [en]
dc.subject: corpora [en]
dc.subject: natural language processing [en]
dc.subject: gaeilge [en]
dc.title: The Creation and Complexity Analysis of a Corpus of Educational Materials in Irish (EduGA) [en]
dc.title.alternative: n/a [en]
dc.type: Thesis [en]
dc.type.supercollection: thesis_dissertations [en]
dc.type.supercollection: refereed_publications [en]
dc.type.qualificationlevel: Doctoral [en]
dc.identifier.peoplefinderurl: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:OMEACHAM [en]
dc.identifier.rssinternalid: 215340 [en]
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsor: An Chomhairle um Oideachas Gaeltachta agus Gaelscolaíochta [en]
dc.identifier.uri: http://hdl.handle.net/2262/92421

