Efficient Prediction-Based Validation for Document Clustering
File Type: PDF
Item Type: Technical Report
Date: 2006-05-02
Citation: Greene, Derek; Cunningham, Padraig. 'Efficient Prediction-Based Validation for Document Clustering'. - Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2006-22, 2006, pp17
Abstract:
Recently, stability-based techniques have emerged as a very
promising solution to the problem of cluster validation. An inherent
drawback of these approaches is the computational cost of generating
and assessing multiple clusterings of the data. In this paper we present
an efficient prediction-based validation approach suitable for application
to large, high-dimensional datasets such as text corpora. We use kernel
clustering to isolate the validation procedure from the original data.
Furthermore, we employ a prototype reduction strategy that allows us to
work on a reduced kernel matrix, leading to significant computational
savings. To ensure that this condensed representation accurately reflects
the cluster structures in the data, we propose a density-biased selection
strategy. This novel validation process is evaluated on a large number
of real and artificial datasets, where it is shown to consistently produce
good estimates for the optimal number of clusters.
Author: Greene, Derek; Cunningham, Padraig
Publisher: Trinity College Dublin, Department of Computer Science
Type of material: Technical Report
Series/Report no: Computer Science Technical Report TCD-CS-2006-22
Availability: Full text available
Keywords: Computer Science