Comparison of Different Clustering Ensembles by Solution Selection Strategy


Tuysuzoglu G., Birant D.

3rd International Conference on Computer Science and Engineering, UBMK 2018, Sarajevo, Bosnia and Herzegovina, 20 - 23 September 2018, pp. 67-72

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number:
  • DOI: 10.1109/ubmk.2018.8566542
  • City of Publication: Sarajevo
  • Country of Publication: Bosnia and Herzegovina
  • Pages: pp. 67-72
  • Keywords: bagging, clustering, ensemble learning, pruning, random subspace, voting
  • Affiliated with Dokuz Eylül University: Yes

Abstract

© 2018 IEEE. Clustering ensembles are an effective way of improving the quality of clustering results. However, designing an ensemble is difficult because many factors influence its performance: the types of clustering algorithms, the parameters of those algorithms (e.g., initialization method, initial seed values), the ensemble size, and the use of different samples and/or features of the dataset. In this study, eight different clustering ensembles are designed using several clustering algorithms (k-means, expectation maximization, hierarchical, canopy, and farthest first) and compared with each other in terms of accuracy to assess the impact of these factors. Traditionally, the clustering results produced by all ensemble components are used to create the final consensus clustering result. Unfortunately, some clustering solutions are not as good as others and decrease the overall performance. To solve this problem, this paper proposes an accuracy-based solution selection strategy. In the experimental studies, clustering ensembles built with the proposed solution selection strategy were applied to 14 well-known datasets to determine the optimal ensemble design. According to the experimental results, clustering ensemble strategies significantly outperform single clustering models by better discovering the latent patterns in data.
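
The abstract does not spell out how solutions are selected or combined, so the following is only a minimal sketch of the general idea of accuracy-based selection followed by voting, not the authors' implementation. It assumes that ground-truth labels are available for scoring, that a solution is kept when its accuracy is at least the ensemble mean, and that cluster ids are matched by trying every permutation (feasible only for a small number of clusters). The function names `align`, `accuracy`, and `select_and_vote`, and the toy label vectors, are hypothetical.

```python
from collections import Counter
from itertools import permutations


def align(labels, reference, k):
    """Relabel `labels` so they agree with `reference` as much as possible.

    Tries every permutation of the k cluster ids, so it only suits small k.
    """
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(perm[lab] == ref for lab, ref in zip(labels, reference)),
    )
    return [best[lab] for lab in labels]


def accuracy(labels, truth, k):
    """Fraction of points labelled correctly after the best relabelling."""
    aligned = align(labels, truth, k)
    return sum(a == t for a, t in zip(aligned, truth)) / len(truth)


def select_and_vote(solutions, truth, k):
    """Keep solutions with at-least-mean accuracy, then majority-vote them."""
    scores = [accuracy(s, truth, k) for s in solutions]
    threshold = sum(scores) / len(scores)  # assumed cut-off: mean accuracy
    kept = [i for i, score in enumerate(scores) if score >= threshold]
    kept.sort(key=lambda i: -scores[i])  # most accurate solution first
    reference = solutions[kept[0]]  # its cluster ids define the label space
    aligned = [align(solutions[i], reference, k) for i in kept]
    # One vote per kept solution for every data point.
    consensus = [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]
    return consensus, sorted(kept)


# Toy data: six points, two true clusters, four hypothetical base clusterings.
truth = [0, 0, 0, 1, 1, 1]
solutions = [
    [1, 1, 0, 0, 0, 0],  # 5/6 correct (cluster ids swapped)
    [0, 0, 0, 0, 1, 1],  # 5/6 correct
    [0, 1, 0, 1, 1, 1],  # 5/6 correct
    [0, 0, 1, 1, 0, 0],  # 3/6 correct, below the mean and therefore dropped
]
consensus, kept = select_and_vote(solutions, truth, k=2)
print(kept)                           # [0, 1, 2]
print(consensus)                      # [1, 1, 1, 0, 0, 0]
print(accuracy(consensus, truth, 2))  # 1.0
```

In this toy case each retained solution mislabels a different point, so the vote over the three retained solutions recovers the true partition even though no single member does. For a larger number of clusters, the exhaustive permutation search would need to be replaced by an assignment solver.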