Show simple item record

dc.contributor.advisor: VOGEL, CARL
dc.contributor.author: Alsulaimani, Ashjan
dc.date.accessioned: 2024-01-02T07:05:31Z
dc.date.available: 2024-01-02T07:05:31Z
dc.date.issued: 2023
dc.date.submitted: 2024
dc.identifier.citation: Alsulaimani, Ashjan, Diachronic Word Sense Induction, Trinity College Dublin, School of Computer Science & Statistics, Computer Science, 2024
dc.identifier.other: Y
dc.description: APPROVED
dc.description.abstract: Learning from natural language is one of the great challenges of Natural Language Processing (NLP) and Machine Learning (ML). Word meanings evolve over time, and one of the challenges is how to model such dynamic behaviour. The task of Diachronic Word Sense Induction (DWSI) aims at learning the meaning of words across time, i.e. representing the dynamic evolution of a word sense. The word meaning is inferred in an unsupervised manner from time-stamped examples. Many other NLP tasks can be affected by word sense change, such as machine translation, question answering, information retrieval and text classification. This thesis addresses the problem of modelling the meaning changes of ambiguous target words from unlabelled time-stamped text. The modelling techniques used for DWSI rely on Bayesian hierarchical mixture models and are closely related to Topic Modelling techniques. The random variables of DWSI models are the time Y, the sense S, and the context words W surrounding the target word. The sense is the latent variable in this dynamic story, in which a target word acquires new senses and/or loses old senses over time. The existing DWSI models assume that sense changes can depend on time, represented either by the multinomial probability distribution over words given sense, P(W|S), or the multinomial probability distribution over words given sense and time, P(W|S,Y). It is also assumed that the sense proportions change over time, represented by the multinomial probability distribution over senses given time, P(S|Y). The main disadvantage of the existing models is that they are parametric, in the sense that the number of senses is a hyperparameter which has to be known a priori. This is not ideal given the nature of the DWSI task, which is meant to infer senses (unobserved variables) from unlabelled data in their optimal representations. For example, one of the parametric methods relies on Dirichlet priors, while the other relies on priors defined as intrinsic Gaussian Markov Random Fields, which add artificial constraints on the estimation of P(W|S,Y) as well as P(S|Y).
This thesis also addresses the issue of DWSI evaluation. This is a very challenging problem, since a reliable quantitative evaluation requires a large amount of sense-annotated and time-stamped data. I propose the first quantitative evaluation framework for DWSI, allowing systematic and objective comparisons between models. I introduce a wide range of evaluation measures and a novel method for collecting a gold-standard dataset from large sequential collections of documents from scientific publications in a new domain, the biomedical domain. This evaluation framework also allows comparisons with any future Bayesian models related to DWSI. The new evaluation measures are validated on the task, with detailed comparisons between the state-of-the-art (SOA) models showing their respective strengths and weaknesses on various aspects of the task. In particular, the results demonstrate that the complexity of the time dimension, combined with the parametric constraints, does not lead to an accurate estimation of the evolution of senses across time. I then propose four DWSI models with different properties, based on Topic Modelling techniques. I advance the SOA by redefining the task with a novel dynamic approach that first models P(W|S) using non-parametric priors (hence the time-dependent representations are calculated after estimating P(W|S)), and I achieve a new SOA with the model I designed. First, I investigate the issue of the number of senses and propose models based on hierarchical Dirichlet process priors. This assumes that an infinite number of senses is possible in theory, allowing a word to be assigned to a new sense during the inference process.
It also assumes that the corpus is subdivided into a set of groups and that the senses are shared among multiple related groups of documents. Such a model has two advantages: first, it finds the optimal number of senses that fits the data during the inference process; second, it allows high-quality merging when the desired number of senses is known. This thesis assumes that these properties contribute to a more accurate representation of the meaning of words, in turn leading to better clustering of the senses across time. Indeed, P(S|Y) is calculated after estimating the models in the two modes, non-parametric and parametric. The results demonstrate that such properties offer a dramatic performance gain compared to the bespoke DWSI models. Another issue common to parametric models is choosing the number of features (words) a priori. This comes with the disadvantages of handling imbalance in word frequency as well as the sparsity in the data introduced by "one-hot" representations of words. Thus, word embeddings are another direction of potential improvement, as they provide a distributed representation in which words with similar meanings are close in a lower-dimensional vector space. Second, I investigate parametric dynamic models which compute a multinomial probability distribution P(W|S,Y) as the exponentiated inner product between word embeddings and per-time sense embeddings. Indeed, the results demonstrate an improvement when the model is provided with prefitted embeddings rather than with embeddings trained simultaneously. Lastly, I conclude that the models based on hierarchical Dirichlet processes show drastically better results/clusters even when compared with a model based on high-quality, domain-specific pretrained embeddings.
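The hierarchical Dirichlet process priors mentioned in the abstract avoid fixing the number of senses a priori. A minimal sketch of the stick-breaking construction that underlies such Dirichlet-process priors is given below; all values are toy illustrations, not the thesis's actual models or parameters.

```python
import random

def stick_breaking(alpha, eps=1e-4, rng=None):
    """Sample sense-proportion weights via stick-breaking: repeatedly
    break off a Beta(1, alpha) fraction of the remaining stick until
    almost no mass is left, so the number of senses with appreciable
    weight is generated rather than fixed in advance."""
    rng = rng or random.Random(0)
    weights, remaining = [], 1.0
    while remaining > eps:
        fraction = rng.betavariate(1.0, alpha)  # fraction broken off the stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights

# toy draw of sense proportions; the count of weights is not preset
weights = stick_breaking(alpha=2.0)
```

A larger `alpha` tends to spread mass over more senses; a hierarchical Dirichlet process additionally shares the sense atoms across groups of documents, which is what permits the sense sharing among related document groups described above.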
dc.language.iso: en
dc.publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
dc.rights: Y
dc.title: Diachronic Word Sense Induction
dc.type: Thesis
dc.type.supercollection: thesis_dissertations
dc.type.supercollection: refereed_publications
dc.type.qualificationlevel: Doctoral
dc.identifier.peoplefinderurl: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:ALSULAIA
dc.identifier.rssinternalid: 260915
dc.rights.ecaccessrights: openAccess
dc.identifier.uri: http://hdl.handle.net/2262/104325
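The abstract also describes parametric dynamic models that compute P(W|S,Y) as the exponentiated inner product between word embeddings and per-time sense embeddings. The sketch below illustrates that softmax form with toy, hand-picked 2-d vectors; the vocabulary, vectors, and function name are hypothetical illustrations, not the thesis's trained parameters.

```python
import math

def p_w_given_sense_time(word_embeddings, sense_embedding):
    """Softmax over the vocabulary of inner products between each word
    embedding and a per-time sense embedding, i.e. an
    exponentiated-inner-product form of P(W | S, Y)."""
    words = list(word_embeddings)
    scores = [sum(a * b for a, b in zip(word_embeddings[w], sense_embedding))
              for w in words]
    m = max(scores)                      # subtract max to stabilise exp()
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {w: e / z for w, e in zip(words, exps)}

# toy context-word embeddings for an ambiguous target such as "cell"
word_embeddings = {
    "battery": [0.9, -0.5],   # hypothetical vectors
    "membrane": [1.0, 0.3],
    "prison": [-0.8, 1.1],
}
# hypothetical embedding of the biological sense at one time slice
dist = p_w_given_sense_time(word_embeddings, [1.0, 0.2])
```

Context words whose embeddings align with the sense's per-time embedding receive higher probability, which is how the per-time sense embedding shifts P(W|S,Y) as the sense drifts over time.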

