Efficient implementation and parallelization of fuzzy density based clustering


ATILGAN C., TEZEL B. T., NASİBOĞLU E.

INFORMATION SCIENCES, vol.575, pp.454-467, 2021 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 575
  • Publication Date: 2021
  • DOI: 10.1016/j.ins.2021.06.044
  • Journal Name: INFORMATION SCIENCES
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Computer & Applied Sciences, INSPEC, Library, Information Science & Technology Abstracts (LISTA), Metadex, MLA - Modern Language Association Database, zbMATH, Civil Engineering Abstracts
  • Page Numbers: pp.454-467
  • Keywords: Fuzzy neighborhood, Fuzzy clustering, Density based clustering, FN-DBSCAN, High performance computing, DBSCAN ALGORITHM, MR-DBSCAN
  • Dokuz Eylül University Affiliated: Yes

Abstract

Clustering is a commonly used tool for data management and analysis. One of the prominent groups of clustering methods consists of the density-based clustering algorithms. The use of fuzzy neighborhood functions in density-based clustering algorithms is known to significantly improve robustness, so that choosing neighborhood parameters is rather easy for the user. On the other hand, because of the overhead of the fuzzy calculations, these algorithms demand more computing resources. This study discusses how FN-DBSCAN, a fuzzy density-based clustering algorithm, can be implemented efficiently. A rather specific FN-DBSCAN algorithm that adopts techniques used to improve classical density-based clustering algorithms is introduced. Also, a parallel version of the algorithm is proposed, and the implementation details of both are discussed. The proposed algorithms are tested in a set of comparative experiments, along with a straightforward FN-DBSCAN implementation and a curious but unsafe modification of the parallel algorithm. The results of the experiments, which are conducted in a modest parallel computing environment of 32 processing units, show a wide variety of relative speed-ups, ranging from 2 to 850 times. © 2021 Elsevier Inc. All rights reserved.
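To make the idea of a fuzzy neighborhood concrete, the following is a minimal sketch of how a fuzzy core-point test might look. It is an illustration only, not the implementation described in the paper: the linear membership function, the names `linear_membership`, `fuzzy_cardinality`, and `fuzzy_core_points`, and the parameters `eps` and `min_card` are all assumptions chosen for this example. The all-pairs loop is a naive quadratic approach and reflects none of the efficiency techniques or the parallelization the abstract refers to.

```python
import math


def linear_membership(dist, eps):
    """Assumed membership function: 1 at distance 0, falling linearly to 0 at eps."""
    return max(0.0, 1.0 - dist / eps)


def fuzzy_cardinality(points, i, eps):
    """Sum of the memberships of every point (including points[i] itself)
    in the fuzzy neighborhood of points[i]."""
    return sum(linear_membership(math.dist(points[i], q), eps) for q in points)


def fuzzy_core_points(points, eps, min_card):
    """Indices of points whose fuzzy cardinality reaches the (hypothetical)
    threshold min_card. Naive O(n^2) scan over all pairs."""
    return [
        i for i in range(len(points))
        if fuzzy_cardinality(points, i, eps) >= min_card
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    # The three nearby points support each other; the distant point does not.
    print(fuzzy_core_points(data, eps=1.0, min_card=2.0))
```

In a crisp neighborhood each point within `eps` would count as exactly 1; here closer points contribute more than distant ones, which is one way to read the abstract's remark that fuzzy neighborhoods make the parameter choice less sensitive while costing extra computation per pair.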