Joint Tomek Links (JTL): An Innovative Approach to Noise Reduction for Enhanced Classification Performance

Tüysüzoğlu, Göksu; Doğan, Yunus; Kiyak, Elife; Ersahin, Mustafa; Ghasemkhani, Bita; Birant, Kökten; Birant, Derya

doi:10.1109/access.2025.3580290

IEEE Access, vol. 13, pp. 123059-123082, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 13
Publication Date: 2025
DOI: 10.1109/access.2025.3580290
Journal Name: IEEE Access
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: pp. 123059-123082
Keywords: Noise, Accuracy, Noise measurement, Machine learning, Random forests, Classification algorithms, Nearest neighbor methods, Data mining, Support vector machines, Training, Artificial intelligence, classification, noise reduction, Tomek links
Affiliated with Dokuz Eylül University: Yes

Abstract

Noisy data is a prevalent issue in data mining and significantly degrades the performance of classification algorithms. Mathematical methods are crucial in tackling this obstacle, particularly in optimizing noise detection and data preprocessing. This study proposes a novel approach, Joint Tomek Links (JTL), to identify and eliminate noisy instances by detecting pairs of nearest neighbors from different classes. It first finds the Tomek links and then refines a probabilistic method to determine which instance from a pair will be removed. In our approach, a random tree classifier serves as the base model. We conducted experiments on 40 benchmark datasets spanning various domains, achieving an average classification accuracy of 83.26% for JTL. The results demonstrate that JTL attains an average improvement of 5.33% in accuracy compared to the original classification with a random tree. Furthermore, JTL surpasses existing techniques, delivering a noteworthy accuracy gain of 12.30% on the same datasets. These findings underscore the effectiveness of JTL in enhancing data quality and boosting classification performance in data mining tasks.
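
To illustrate the first step named in the abstract, the sketch below finds Tomek links in the classical sense: pairs of instances that are each other's nearest neighbor and carry different class labels. It is a minimal illustration only, not the authors' implementation. The function names, the Euclidean distance, and the toy data are assumptions made for this example, and the probabilistic rule that JTL uses to decide which member of a pair to remove is not described in the abstract, so it is not reproduced here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def find_tomek_links(points, labels):
    """Return (i, j) index pairs with i < j that form Tomek links.

    Hypothetical helper: a pair is a Tomek link when the two instances
    are mutual nearest neighbors and belong to different classes.
    """
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


# Toy data (assumed for illustration): two clean clusters and one
# cross-class pair sitting close together between them.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.55, 0.5), (0.6, 0.5)]
labels = ["a", "a", "b", "b", "a", "b"]

links = find_tomek_links(points, labels)
print(links)  # [(4, 5)]
```

Only the cross-class pair at indices 4 and 5 is reported; the two same-class pairs are mutual nearest neighbors but are not links. Choosing which instance of each reported pair to discard is the part that JTL addresses.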