Feature Selection for Malware Detection on the Android Platform Based on Differences of IDF Values

Peynirci G., Eminağaoğlu M., Karabulut K.

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, vol.35, no.4, pp.946-962, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 35 Issue: 4
  • Publication Date: 2020
  • Doi Number: 10.1007/s11390-020-9323-x
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Page Numbers: pp.946-962
  • Keywords: malware detection, Android, feature selection, inverse document frequency, static analysis, CODE
  • Dokuz Eylül University Affiliated: Yes


Android is the mobile operating system most frequently targeted by malware in the smartphone ecosystem, with a market share significantly higher than its competitors and a much larger total number of applications. Detecting malware before it is published on official or unofficial application markets is critically important because inadequate security practices are widespread among typical end users. In this paper, a novel feature selection method is proposed along with an Android malware detection approach. The feature selection method uses permissions, API calls, and strings as features, all of which are statically extractable from Android executables (APK files), and it can be used in a machine learning process with different algorithms to detect malware on the Android platform. A novel document frequency-based approach, namely Delta IDF, was designed and implemented for feature selection. Delta IDF was tested on three universal benchmark datasets that contain Android malware samples, and highly promising results were obtained with several binary classification algorithms.
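The abstract does not state the Delta IDF formula, so the following is only a rough sketch of the general idea suggested by the title: score each statically extracted feature by the difference between its IDF value in the malware class and its IDF value in the benign class, then keep the highest-scoring features. The smoothed IDF definition, the use of the absolute difference, the function names (`idf`, `delta_idf_scores`, `select_top_k`), and the toy samples are all assumptions made for illustration and are not taken from the paper.

```python
import math


def idf(feature, documents):
    """Smoothed IDF of a feature over a list of per-app feature sets.

    Assumption: add-one smoothing, so a feature absent from a class
    still gets a finite value.
    """
    df = sum(1 for doc in documents if feature in doc)
    return math.log((1 + len(documents)) / (1 + df))


def delta_idf_scores(malware_docs, benign_docs):
    """Score every feature by |IDF in malware - IDF in benign| (assumed form)."""
    vocabulary = set().union(*malware_docs, *benign_docs)
    return {
        f: abs(idf(f, malware_docs) - idf(f, benign_docs))
        for f in vocabulary
    }


def select_top_k(scores, k):
    """Return the k features with the largest score (ties broken by name)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy data: each app is the set of features extracted from its APK.
malware_docs = [
    {"SEND_SMS", "INTERNET"},
    {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
]
benign_docs = [
    {"INTERNET"},
    {"INTERNET", "CAMERA"},
]

scores = delta_idf_scores(malware_docs, benign_docs)
print(select_top_k(scores, 1))  # ['SEND_SMS']
```

In this toy example, `INTERNET` appears in every app of both classes and therefore scores zero, while `SEND_SMS` appears only in the malware samples and ranks first. The selected features could then be used to build the input vectors for any binary classifier.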