Identifying and interpreting context on the Web : an application-driven approach

Kelleher, Daniel

dc.contributor.advisor	Luz, Saturnino
dc.contributor.author	Kelleher, Daniel
dc.date.accessioned	2019-04-30T09:28:42Z
dc.date.available	2019-04-30T09:28:42Z
dc.date.issued	2008
dc.identifier.citation	Daniel Kelleher, 'Identifying and interpreting context on the Web : an application-driven approach', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2008, pp 217
dc.identifier.other	THESIS 8660
dc.description.abstract	This work describes the use of contextual information in Web document processing. Contextual information is defined as the contents of a 'context set' of a document of interest on the Web. The context set of a document is determined by the hyperlink structure of the Web around the document of interest. This thesis suggests that, as in other text media, contextual information on the Web is a vital component of the information content of a document, and should be taken into account when interpreting or processing that information. Most existing hypertext document processing applications either ignore hyperlinks or use them restrictively, identifying only a particular form of contextual information. For example, many information retrieval applications use hyperlinks as indicators of document prestige or authority, conferred by the referring document on the referred document. Others use them to locate (or help to locate) similar content in order to augment a document index or provide additional information about a document's relevance. A related method is the clustering of Web documents, which uses the Web hyperlink structure to identify clusters of related documents, either to generate an aggregate that can be used in document indexing or to simplify a visual representation of the graph structure in order to aid browsing. These approaches typically apply the information provided by hyperlinks to a single application, such as information retrieval or Web browsing. In contrast, this thesis proposes a flexible method for including hypertext contextual information in a range of document processing applications, built on an adaptation of a term-weighting measure derived from the frequency of a term in a set of documents. The resulting non-linear measure can be incorporated into Web applications using a probabilistic model trained on pre-annotated data suitable for the domain of the application.
In this work, the measure is implemented and evaluated on a number of Web document content processing applications. Specifically, three applications are presented: an adaptation of an existing automatic keyphrase extraction application, a Web document retrieval ranking algorithm, and a document collection homogeneity measure with a related homogeneous corpus generation application.
dc.format	1 volume
dc.language.iso	en
dc.publisher	Trinity College (Dublin, Ireland). School of Computer Science & Statistics
dc.relation.isversionof	http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb13569088
dc.subject	Computer Science, Ph.D.
dc.subject	Ph.D. Trinity College Dublin
dc.title	Identifying and interpreting context on the Web : an application-driven approach
dc.type	thesis
dc.type.supercollection	thesis_dissertations
dc.type.supercollection	refereed_publications
dc.type.qualificationlevel	Doctoral
dc.type.qualificationname	Doctor of Philosophy (Ph.D.)
dc.rights.ecaccessrights	openAccess
dc.format.extentpagination	pp 217
dc.description.note	TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
dc.identifier.uri	http://hdl.handle.net/2262/86395
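
The abstract describes the context set only at a high level: the documents surrounding a document of interest in the hyperlink graph, scored with a measure adapted from a term's frequency across a set of documents. The Python sketch below illustrates that general idea under stated assumptions. The context set is taken to be the immediate in-link and out-link neighbourhood, and the score is a plain normalised document frequency over that set. The function names `context_set` and `context_frequency`, the tokenizer, and the toy link graph are hypothetical; the thesis's own non-linear measure and its probabilistic model are not reproduced here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return the set of lower-case alphanumeric word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_set(doc_id, links):
    """Return a hypothetical radius-1 context set for doc_id.

    ASSUMPTION: the context set is every document doc_id links to (out-links)
    plus every document linking to doc_id (in-links), excluding doc_id itself.
    `links` maps each document id to the set of ids it links to.
    """
    out_links = set(links.get(doc_id, ()))
    in_links = {src for src, targets in links.items() if doc_id in targets}
    return (out_links | in_links) - {doc_id}


def context_frequency(doc_id, docs, links):
    """Map each term of doc_id to the fraction of its context set containing that term.

    ASSUMPTION: a plain normalised document frequency, standing in for the
    thesis's (non-linear) context measure.
    """
    ctx = context_set(doc_id, links)
    if not ctx:
        return {}
    counts = defaultdict(int)
    for neighbour in ctx:
        for term in tokenize(docs.get(neighbour, "")):
            counts[term] += 1
    return {term: counts[term] / len(ctx) for term in tokenize(docs[doc_id])}


# Toy example: "a" links to "b" and "c"; "d" links to "a".
docs = {
    "a": "Trinity College thesis on web context",
    "b": "web context and hyperlinks",
    "c": "college sports news",
    "d": "hyperlinks between web pages",
}
links = {"a": {"b", "c"}, "d": {"a"}}

scores = context_frequency("a", docs, links)
print(round(scores["web"], 3))   # 0.667: "web" occurs in 2 of the 3 context documents
print(scores["thesis"])          # 0.0: "thesis" occurs only in "a" itself
```

Terms that recur across a document's neighbours (here `web`) score higher than terms confined to the document itself. In the approach the abstract outlines, a score of this kind would be one input to a probabilistic model trained on annotated data, rather than a ranking on its own.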

Files in this item

Name: Kelleher TCD THESIS 8660 Ident ...
Size: 117.6Mb
Format: PDF

Name: license.txt
Size: 3.499Kb
Format: Text file
