Characterization of Cancer Types by Applying Machine Learning Methods on Blood RNA-Sequencing Data

3rd International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2019, Ankara, Türkiye, 11-13 October 2019 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI Number: 10.1109/ismsit.2019.8932905
City of Publication: Ankara
Country of Publication: Türkiye
Keywords: Cancer detection, Naïve Bayes, Random Forest, RNA-sequencing Data, Support Vector Machine
Affiliated with Dokuz Eylül University: Yes

Abstract

© 2019 IEEE. RNA-sequencing data is used to measure the mRNA levels of genes in tissue or blood samples. Critical changes in the transcriptome can be observed more accurately with RNA-sequencing data, which in turn helps explain the different behaviors of a disease. In this study, different feature selection methods and machine learning algorithms are compared for the accurate classification of cancer types using RNA-sequencing data from blood samples. In the analysis, seven cancer types were compared with each other and with healthy samples. Correlation coefficient and information gain analyses were applied as feature selection methods. The selected genes were provided as input to Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) classifiers. All machine learning methods were evaluated with 10-fold cross-validation. In the experiments, the machine learning models achieved accuracy above 85% in discriminating hepatobiliary, lung, and pancreatic cancer types. In terms of accuracy, RF and SVM outperformed NB in many cases.
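
The two feature selection criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names and expression values are made up, the helper names (`pearson`, `information_gain`, `rank_genes`) are hypothetical, and the median split used to discretize expression before computing information gain is an assumption, since the abstract does not state how continuous values were handled.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature after a median split (assumed discretization)."""
    cut = sorted(values)[len(values) // 2]
    low = [l for v, l in zip(values, labels) if v < cut]
    high = [l for v, l in zip(values, labels) if v >= cut]
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(len(p) / len(labels) * entropy(p) for p in (low, high) if p)
    return entropy(labels) - remainder


def rank_genes(expression, labels, score):
    """Return gene names sorted from highest to lowest score."""
    return sorted(expression, key=lambda g: score(expression[g], labels), reverse=True)


# Toy, made-up data: 6 samples, label 1 = cancer, 0 = healthy.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # separates the classes well
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # roughly constant across classes
    "gene_c": [0.5, 2.5, 1.0, 2.0, 0.7, 2.4],  # noisy
}

by_corr = rank_genes(expression, labels, lambda v, y: abs(pearson(v, y)))
by_gain = rank_genes(expression, labels, information_gain)
print("Ranked by |correlation|:", by_corr)
print("Ranked by information gain:", by_gain)
```

In a pipeline like the one the abstract describes, the top-ranked genes from either list would then be passed to a classifier such as SVM, NB, or RF and scored with 10-fold cross-validation; that step is omitted here to keep the sketch dependency-free.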