Hiding Sensitive Itemsets Using Sibling Itemset Constraints

YILDIZ, BARIŞ; KUT, RECEP; YILMAZ, REYAT

doi:10.3390/sym14071453

YILDIZ B., KUT R. A., YILMAZ R.

SYMMETRY-BASEL, vol.14, no.7, 2022 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 14, Issue: 7
Publication Date: 2022
DOI: 10.3390/sym14071453
Journal Name: SYMMETRY-BASEL
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Communication Abstracts, INSPEC, Metadex, zbMATH, Directory of Open Access Journals, Civil Engineering Abstracts
Keywords: frequent itemset mining, privacy-preserving data mining, sensitive itemset hiding
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Dokuz Eylül University: Yes

Abstract

Advances in data collection and processing have made data mining a popular tool among organizations in recent decades. Sharing information between companies could make this tool more beneficial for each party, but it carries a risk of disclosing sensitive knowledge. Shared data should therefore be modified so that sensitive relationships are hidden. Since frequent itemset discovery is one of the most effective data mining tools that firms use, privacy-preserving techniques are necessary for frequent itemset mining to continue. Algorithmically, there are two types of approaches: heuristic and exact. This paper presents an exact itemset hiding approach that uses constraints to obtain a better solution in terms of side effects and minimum distortion of the database. This distortion creates an asymmetric relation between the original and the sanitized database. To lessen the side effects of itemset hiding, we introduce the sibling itemset concept, which is used for generating constraints. Additionally, our approach does not require frequent itemset mining to be executed before the hiding process, which gives it an advantage in total running time. We evaluate our algorithm on several benchmark datasets. Our results show that our hiding approach is effective and that eliminating the prior mining of itemsets is time efficient.
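
The terms used in the abstract (hiding a sensitive itemset, side effects, distortion) can be made concrete with a small sketch. It is not the paper's exact algorithm and does not model sibling itemset constraints, which the abstract does not define. It assumes the common convention that an itemset is hidden once its support falls below a minimum support threshold, and that sanitization only removes items from transactions. The function names (`support`, `frequent_itemsets`, `hiding_report`) and the toy database are hypothetical, and the brute-force mining here is used only to measure side effects on a tiny example.

```python
from itertools import combinations

def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in db if s <= t)

def frequent_itemsets(db, min_support):
    """Brute-force enumeration of frequent itemsets (toy-sized data only)."""
    items = sorted(set().union(*db))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if support(db, combo) >= min_support:
                found.add(frozenset(combo))
    return found

def hiding_report(original, sanitized, sensitive, min_support):
    """Return (hidden, lost, distortion) for a sanitized copy of `original`.

    hidden:     True if no sensitive itemset is still frequent
    lost:       non-sensitive frequent itemsets that became infrequent
    distortion: number of items removed, assuming the two databases list
                the same transactions in the same order
    """
    sens = {frozenset(s) for s in sensitive}
    before = frequent_itemsets(original, min_support)
    after = frequent_itemsets(sanitized, min_support)
    hidden = all(s not in after for s in sens)
    # Supersets of a sensitive itemset must disappear too, so they are
    # not counted as side effects.
    unavoidable = {f for f in before if any(s <= f for s in sens)}
    lost = before - after - unavoidable
    distortion = sum(len(o - s) for o, s in zip(original, sanitized))
    return hidden, lost, distortion

# Hypothetical toy database; the sensitive itemset is {a, b}.
original = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
sensitive = [{"a", "b"}]

# Two candidate sanitizations, each removing exactly one item.
option_a = [{"a", "b", "c"}, {"a", "b", "c"}, {"a"}, {"a", "c"}, {"b", "c"}]
option_b = [{"b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]

report_a = hiding_report(original, option_a, sensitive, min_support=3)
report_b = hiding_report(original, option_b, sensitive, min_support=3)
```

Both candidates hide {a, b} with a distortion of one item, but `option_b` also makes the non-sensitive itemset {a, c} infrequent, while `option_a` loses nothing. Choosing between such candidates is the kind of decision that a constraint-based exact approach is meant to make.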