
dc.contributor.advisor: Luz, Saturnino
dc.contributor.author: Kelleher, Daniel
dc.date.accessioned: 2019-04-30T09:28:42Z
dc.date.available: 2019-04-30T09:28:42Z
dc.date.issued: 2008
dc.identifier.citation: Daniel Kelleher, 'Identifying and interpreting context on the Web : an application-driven approach', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2008, pp 217
dc.identifier.other: THESIS 8660
dc.description.abstract: This work describes the use of contextual information in Web document processing. Contextual information is defined as the contents of a 'context set' of a document of interest on the Web. The context set of a document is determined by the hyperlink structure of the Web around the document of interest. This thesis suggests that, as in other text media, contextual information on the Web is a vital component of the information content of a document, and should be taken into account when interpreting or processing that information. Most existing hypertext document processing applications either ignore hyperlinks or use them in a restrictive manner, to identify a particular form of contextual information. For example, many information retrieval applications use hyperlinks as indicators of document prestige or authority, conferred by the referring document on the referred document. Others use them to locate (or help to locate) similar content in order to augment a document index or provide additional information about a document's relevance. A related method is the clustering of Web documents, using the Web hyperlink structure to identify clusters of related documents, either to generate an aggregate that can be used in document indexing, or to simplify a visual representation of the graph structure in order to aid browsing. These approaches typically apply the information provided by hyperlinks to a particular application, such as information retrieval or Web browsing. In contrast, this thesis proposes a flexible method for the inclusion of hypertext contextual information in a number of document processing applications, based on an adaptation of a term weighting measure derived from the frequency of a term in a set of documents. The resulting non-linear measure can be incorporated into Web applications using a probabilistic model trained on pre-annotated data suitable for the domain of the application.
In this work, the measure is implemented and evaluated on a number of Web document content processing applications. Specifically, three applications are presented: an adaptation of an existing automatic keyphrase extraction application, a Web document retrieval ranking algorithm, and a document collection homogeneity measure with a related homogeneous corpus generation application.
dc.format: 1 volume
dc.language.iso: en
dc.publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
dc.relation.isversionof: http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb13569088
dc.subject: Computer Science, Ph.D.
dc.subject: Ph.D. Trinity College Dublin
dc.title: Identifying and interpreting context on the Web : an application-driven approach
dc.type: thesis
dc.type.supercollection: thesis_dissertations
dc.type.supercollection: refereed_publications
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: Doctor of Philosophy (Ph.D.)
dc.rights.ecaccessrights: openAccess
dc.format.extentpagination: pp 217
dc.description.note: TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
dc.identifier.uri: http://hdl.handle.net/2262/86395

