K-centroid link: a novel hierarchical clustering linkage method


DOĞAN A., BİRANT D.

APPLIED INTELLIGENCE, vol.52, no.5, pp.5537-5560, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 52 Issue: 5
  • Publication Date: 2022
  • DOI: 10.1007/s10489-021-02624-8
  • Journal Name: APPLIED INTELLIGENCE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, PASCAL, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, Educational research abstracts (ERA), INSPEC, Library, Information Science & Technology Abstracts (LISTA), zbMATH
  • Pages: pp.5537-5560
  • Keywords: Machine learning, Clustering, Hierarchical clustering, Linkage method, DETECTING COMMUNITIES, INTIMATE DEGREE, ALGORITHM, NETWORKS, NEIGHBORHOOD, GRAPH
  • Dokuz Eylül University Affiliated: Yes

Abstract

In hierarchical clustering, the most important factor is the choice of linkage method, which determines how the distances between clusters are calculated. This choice strongly affects not only the clustering quality but also the efficiency of the algorithm. However, the traditional linkage methods do not consider the effect of the objects around cluster centers. Motivated by this, we propose a novel linkage method, named k-centroid link, to provide a better solution than the traditional linkage methods. In the proposed k-centroid link method, the dissimilarity between two clusters is mainly defined as the average distance between all pairs of k data objects in each cluster, which are the k closest ones to the centroid of each cluster. In the experimental studies, the proposed method was tested on 24 publicly available benchmark datasets. The results demonstrate that hierarchical clustering with the k-centroid link method can achieve better clustering quality than conventional linkage methods such as single link, complete link, average link, mean link, centroid link, and the Ward method.
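
The dissimilarity described in the abstract can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' implementation: it assumes Euclidean distance, takes the pairs across the two clusters (one object from each), and falls back to all objects when a cluster has fewer than k members. The function names (`centroid`, `k_nearest_to_centroid`, `k_centroid_link`) are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def k_nearest_to_centroid(points, k):
    """Return the k objects closest to the cluster centroid.

    Assumption: if the cluster has fewer than k objects, all are returned.
    """
    c = centroid(points)
    return sorted(points, key=lambda p: math.dist(p, c))[:k]


def k_centroid_link(cluster_a, cluster_b, k):
    """Average Euclidean distance over all cross-cluster pairs formed from
    the k objects nearest to each cluster's centroid (assumed reading)."""
    reps_a = k_nearest_to_centroid(cluster_a, k)
    reps_b = k_nearest_to_centroid(cluster_b, k)
    total = sum(math.dist(p, q) for p, q in product(reps_a, reps_b))
    return total / (len(reps_a) * len(reps_b))


# Illustrative data: the centroids are (1, 1) and (5, 1), and both are
# themselves members of their clusters.
a = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]
b = [(4, 1), (5, 1), (6, 1)]
print(k_centroid_link(a, b, k=1))  # 4.0
```

Under these assumptions, k = 1 compares only the single most central object of each cluster, while a k at least as large as both cluster sizes uses every cross-cluster pair, as average link does.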