Comparison of Different Clustering Ensembles by Solution Selection Strategy

3rd International Conference on Computer Science and Engineering (UBMK), Sarajevo, Bosnia and Herzegovina, 20-23 September 2018, pp. 67-72

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
City of Publication: Sarajevo
Country of Publication: Bosnia and Herzegovina
Pages: pp. 67-72
Keywords: bagging, clustering, ensemble learning, pruning, random subspace, voting
Affiliated with Dokuz Eylül University: Yes

Abstract

Clustering ensembles are an effective way to improve the quality of clustering results. However, designing an ensemble is difficult because many factors that influence its performance must be considered: the types of clustering algorithms, the algorithms' parameters (e.g., initialization method, initial seed values), the ensemble size, and the use of different samples and/or features of the dataset. In this study, eight different clustering ensembles are designed using several clustering algorithms (k-means, expectation maximization, hierarchical, canopy, and farthest first) and compared with each other in terms of accuracy to assess the impact of these factors. Traditionally, all clustering results produced by the ensemble components are used to create the final consensus clustering. However, some clustering solutions are worse than others and degrade the overall performance. To address this problem, this paper proposes an accuracy-based solution selection strategy. In the experiments, the clustering ensembles, combined with the proposed solution selection strategy, were applied to 14 well-known datasets to determine the optimal ensemble design. The experimental results show that the clustering ensemble strategies significantly outperform single clustering models by better discovering the latent patterns in the data.
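The abstract does not specify how solutions are scored, pruned, or combined. The Python sketch below illustrates one way an accuracy-based solution selection step could feed a voting consensus (voting is listed among the keywords). It rests on assumptions that are not taken from the paper: accuracy is measured against ground-truth labels by mapping each cluster to its majority class, the pruning rule keeps a hypothetical number `keep` of the most accurate solutions, and all function names are illustrative. It is not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Sequence

Solution = Sequence[int]  # one cluster id per data point


def majority_mapping(solution: Solution, reference: Sequence[int]) -> Dict[int, int]:
    """Map each cluster id in `solution` to the most frequent reference label among its points."""
    counts: Dict[int, Counter] = {}
    for cluster, ref in zip(solution, reference):
        counts.setdefault(cluster, Counter())[ref] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


def clustering_accuracy(solution: Solution, labels: Sequence[int]) -> float:
    """Assumed measure: share of points whose cluster's majority class matches their class."""
    mapping = majority_mapping(solution, labels)
    hits = sum(mapping[cluster] == y for cluster, y in zip(solution, labels))
    return hits / len(labels)


def select_solutions(solutions: List[Solution], labels: Sequence[int], keep: int) -> List[Solution]:
    """Hypothetical pruning rule: keep the `keep` most accurate solutions (stable on ties)."""
    ranked = sorted(solutions, key=lambda s: clustering_accuracy(s, labels), reverse=True)
    return ranked[:keep]


def vote(solutions: List[Solution]) -> List[int]:
    """Align every solution to the first one's cluster ids, then take a per-point majority vote."""
    reference = solutions[0]
    aligned = []
    for s in solutions:
        mapping = majority_mapping(s, reference)
        aligned.append([mapping[cluster] for cluster in s])
    # zip(*aligned) yields, for each point, the labels assigned by all aligned solutions
    return [Counter(point_votes).most_common(1)[0][0] for point_votes in zip(*aligned)]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    pool = [
        [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1, 1],
        [0, 1, 0, 1, 0, 1],
    ]
    selected = select_solutions(pool, labels, keep=3)  # drops the least accurate solution
    print(vote(selected))  # prints [0, 0, 0, 1, 1, 1]
```

Aligning each member to the top-ranked solution is a simple greedy relabeling. It is not one-to-one (two clusters may map to the same reference cluster), so a matching-based alignment would be a natural substitute when all solutions have the same number of clusters.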