Disease Prediction From Human Microbiome by Utilizing Machine Learning


Temel B. E., Kocapınar B., Işık Z.

9th International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 26-28 October 2024, pp. 1-5

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Pages: pp. 1-5
  • Affiliated with Dokuz Eylül University: Yes

Abstract

With the amount of sequenced microbiome data increasing daily, gut microbiome-based diagnosis is receiving growing attention. Numerous statistical and machine learning methods have been employed to improve the understanding of pathogenic microbes. This work used abundance data on the human gut microbiome to develop machine learning models for the accurate diagnosis of six diseases: obesity, ulcerative colitis, autoimmune diseases, Clostridium infections, irritable bowel syndrome, and colorectal cancer. In the data processing stage, the most informative bacterial composition for each disease was determined using feature selection techniques (RFECV and XGBoost). In the next stage, three machine learning models (Random Forest, XGBoost, and k-Nearest Neighbor) were trained to develop a disease-specific diagnosis model. The best-performing model was designated for each disease after applying 10-fold cross-validation. Random Forest performed most effectively for colorectal cancer, while XGBoost performed best for the remaining diseases. The average AUC values of the diagnostic models range from 0.95 to 0.99, which is comparable or superior to previous studies for certain diseases. To demonstrate the accuracy of the models' predictions, the literature was thoroughly investigated for the diagnostic capability of the bacteria that have the greatest influence on disease classification.
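
The workflow summarized in the abstract (feature selection, several candidate classifiers, 10-fold cross-validated AUC, then picking the best model per disease) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic stand-ins for an abundance table, the function name `compare_models` and all hyperparameters are hypothetical, and only scikit-learn is used. XGBoost is left out to keep the example dependency-free; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be added to the `candidates` dictionary. Placing RFECV inside the pipeline, so that features are re-selected within each training fold, is a design choice of this sketch; the abstract does not say how selection and cross-validation were combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


def compare_models(X, y, n_splits=10, seed=0):
    """Return the mean cross-validated ROC AUC for each candidate model.

    Hypothetical helper: names and settings are illustrative assumptions.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in candidates.items():
        # Recursive feature elimination with its own inner 3-fold CV,
        # ranking features by Random Forest importances.
        selector = RFECV(
            RandomForestClassifier(n_estimators=25, random_state=seed),
            step=5,
            min_features_to_select=5,
            cv=3,
            scoring="roc_auc",
        )
        pipe = Pipeline([("select", selector), ("model", model)])
        fold_auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
        scores[name] = float(fold_auc.mean())
    return scores


# Synthetic stand-in for one disease: 200 samples, 20 "taxa", binary label.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=6, random_state=0
)
scores = compare_models(X, y)
best = max(scores, key=scores.get)  # model with the highest mean AUC
print(scores, best)
```

In a per-disease setting, `compare_models` would be called once per disease-versus-control dataset, and the model stored in `best` would be kept as that disease's diagnostic model.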