Linear transformations of semantic spaces for word-sense discrimination and collocation compositionality grading
Citation:
Alfredo Maldonado Guerra, 'Linear transformations of semantic spaces for word-sense discrimination and collocation compositionality grading', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2015, pp 184
Abstract:
Latent Semantic Analysis (LSA) and Word Space are two semantic models derived from the vector space model of distributional semantics that have been used successfully in word-sense disambiguation and discrimination. LSA can represent word types and word tokens in context by means of a single matrix factorised by Singular Value Decomposition (SVD). Word Space is able to represent types via word vectors and tokens through two separate kinds of context vectors: direct vectors that count first-order word co-occurrence and indirect vectors that capture second-order co-occurrence. Word Space objects are optionally reduced by SVD. Although the two models are regarded as related, little has been said about the specific relationship between Word Space and LSA or the benefits of one model over the other, especially with regard to their capability of representing word tokens. This thesis aims to address this gap both theoretically and empirically.
Within the theoretical focus, the definitions of Word Space and LSA as presented in the literature are studied. A formalisation of these two semantic models is presented and their theoretical properties and relationships are discussed. A fundamental insight from this theoretical analysis is that indirect (second-order) vectors can be computed from direct (first-order) vectors through a linear transformation involving a matrix of word vectors (a word matrix), an operation that can itself be seen as a method of dimensionality reduction alternative to SVD. Another finding is that, in their unreduced form, LSA vectors and the Word Space direct (first-order) context vectors define approximately the same objects and their difference can be exactly calculated. The SVD spaces produced by LSA and the Word Space word vectors are found to be similar as well; their difference, which can also be precisely calculated, ultimately stems from the original difference between unreduced LSA vectors and Word Space direct vectors. It is further observed that the indirect “second-order” method of token representation from Word Space is also available to LSA, in a version of the representation that has remained largely unexplored. Given the analysis of the SVD spaces produced by both models, it is hypothesised that, when exploited in comparable ways, Word Space and LSA should perform similarly in actual word-sense disambiguation and discrimination experiments.
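The linear transformation described above can be illustrated with a minimal Python sketch. It assumes, as is common in Word Space formulations, that an indirect (second-order) vector is the sum of the word vectors of the words occurring in a context; the abstract does not state the exact weighting or normalisation used in the thesis. The vocabulary, the matrix `W` and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary: word -> row index in the word matrix.
vocab = {"bank": 0, "river": 1, "money": 2, "loan": 3}

# Hypothetical word matrix W: one row per vocabulary word, one column
# per retained co-occurrence dimension (2 columns here, fewer than the
# 4 vocabulary words).
W = np.array([
    [1.0, 1.0],   # bank
    [0.0, 2.0],   # river
    [3.0, 0.0],   # money
    [2.0, 0.0],   # loan
])

def direct_vector(context_tokens, vocab):
    """First-order vector: counts of vocabulary words in the context."""
    v = np.zeros(len(vocab))
    for tok in context_tokens:
        if tok in vocab:
            v[vocab[tok]] += 1.0
    return v

def indirect_vector(direct, word_matrix):
    """Second-order vector obtained as a linear map of the direct vector.

    Multiplying the count vector by the word matrix sums the word
    vectors of the context words, weighted by their counts.
    """
    return direct @ word_matrix

context = ["river", "bank", "river"]
direct = direct_vector(context, vocab)
indirect = indirect_vector(direct, W)

# The same result, computed explicitly as a sum of word vectors.
explicit_sum = sum(W[vocab[tok]] for tok in context)
```

Because `W` has fewer columns than there are vocabulary words, the product maps a 4-dimensional direct vector to a 2-dimensional indirect vector, which is the sense in which the operation acts as a form of dimensionality reduction.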
In the empirical focus, performance comparisons between different configurations of LSA and Word Space are conducted in actual word-sense disambiguation and discrimination experiments. It is found that some indirect configurations of LSA and Word Space do indeed perform similarly, but other LSA and Word Space indirect configurations, as well as their direct representations, differ more in performance. So, whilst the two models define approximately the same spaces, their differences are large enough to impact performance. Word Space’s simpler, unreduced direct (first-order) context vectors are found to offer the best overall trade-off between accuracy and computational expense. Another empirical exercise involves comparisons of geometric properties of Word Space’s two token vector representations aimed at testing their similarity and predicting their performance in means-based word-sense disambiguation and discrimination experiments. It is found that they are not geometrically similar and that sense vectors computed from direct vectors are more spread out than those computed from indirect vectors. Word-sense disambiguation and discrimination experiments performed on these vectors largely reflect the geometric comparisons: the more spread-out direct vectors perform better than indirect vectors in supervised disambiguation experiments, although in unsupervised discrimination experiments no clear winner emerges. The role of the Word Space word matrix as a dimensionality reduction operator is also explored. Instead of simply truncating the word matrix, a method called word matrix consolidation is proposed, in which dimensions representing statistically associated word pairs are summed and merged. The method achieves modest but promising results comparable to SVD. Finally, the word vectors from Word Space are tested empirically in a task designed to grade (measure) the compositionality (or degree of “literalness”) of multi-word expressions (MWEs). Cosine similarity measures are taken between a word vector representing the full MWE and word vectors representing each of its individual member words, in order to measure the deviation in co-occurrence distribution between the MWE and its individual members. It is found that this deviation in co-occurrence distributions does correlate with human compositionality judgements of MWEs.
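The cosine-based compositionality measure can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not say how the per-member similarities are combined, so averaging them is an assumption made here, and the co-occurrence vectors and function names are hypothetical toy values rather than data from the thesis.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def compositionality_score(mwe_vec, member_vecs):
    """Mean cosine between the MWE vector and each member word vector.

    Averaging is an assumption of this sketch; higher values indicate
    that the MWE co-occurs with similar words to its members, which is
    read here as a more literal (compositional) expression.
    """
    return float(np.mean([cosine(mwe_vec, m) for m in member_vecs]))

# Hypothetical co-occurrence vectors over three context dimensions.
members = [np.array([3.0, 1.0, 0.0]), np.array([5.0, 2.0, 1.0])]
literal_mwe = np.array([4.0, 1.0, 0.0])    # distribution close to its members
idiomatic_mwe = np.array([0.0, 1.0, 5.0])  # distribution far from its members

literal_score = compositionality_score(literal_mwe, members)
idiomatic_score = compositionality_score(idiomatic_mwe, members)
```

With these toy vectors the expression whose distribution resembles that of its members receives the higher score, which is the kind of signal that would then be compared against human compositionality judgements.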
Author: Maldonado Guerra, Alfredo.
Advisor:
Emms, Martin
Publisher:
Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Note:
TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
Type of material:
thesis
Availability:
Full text available