Efficient Ensemble Methods for Document Clustering

Greene, Derek; Cunningham, Padraig

dc.contributor.author	Greene, Derek
dc.contributor.author	Cunningham, Padraig
dc.date.accessioned	2006-10-26T16:04:55Z
dc.date.available	2006-10-26T16:04:55Z
dc.date.issued	2006-08-18
dc.identifier.citation	Efficient Ensemble Methods for Document Clustering, Derek Greene and Padraig Cunningham, 18 August 2006, TCD-CS-2006-48	en
dc.description.abstract	Recent ensemble clustering techniques have been shown to be effective in improving the accuracy and stability of standard clustering algorithms. However, an inherent drawback of these techniques is the computational cost of generating and combining multiple clusterings of the data. In this paper, we present an efficient kernel-based ensemble clustering method suitable for application to large, high-dimensional datasets such as text corpora. To decrease the time required to generate the ensemble members, we employ a prototype reduction scheme that makes use of a density-biased selection strategy to construct a smaller kernel matrix that represents a good proxy for the original data. Evaluations performed on text data demonstrate that this process leads to a significant decrease in running time, while maintaining high clustering accuracy.	en
dc.format.extent	198433 bytes
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	Department of Computer Science, Trinity College Dublin	en
dc.relation.ispartofseries	Computer Science Department, Technical Report	en
dc.relation.ispartofseries	TCD-CS-2006-48	en
dc.subject	Document clustering	en
dc.title	Efficient Ensemble Methods for Document Clustering	en
dc.type	Technical Report	en
dc.identifier.rssuri	https://www.cs.tcd.ie/publications/tech-reports/reports.06/TCD-CS-2006-48.pdf
dc.identifier.uri	http://hdl.handle.net/2262/2418

Files in this item

Name: TCD-CS-2006-48.pdf
Size: 193.7 KB
Format: PDF

Name: license.txt
Size: 3.331 KB
Format: Text file

This item appears in the following Collection(s)

Computer Science (Scholarly Publications)