Development and Temporal Validation of Explainable Machine Learning Models for Predicting Vitamin B12 Deficiency Using Routine Laboratory Analytes

Demirci, Ferhat; Yıldırım, Oktay; Demirci, Aylin; Akan, Pınar

doi:10.3390/diagnostics16040563

DIAGNOSTICS, vol. 16, no. 4, pp. 1-20, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16, Issue: 4
Publication Date: 2026
DOI: 10.3390/diagnostics16040563
Journal Name: DIAGNOSTICS
Journal Indexed In: Scopus, Science Citation Index Expanded (SCI-EXPANDED), EMBASE, Directory of Open Access Journals
Pages: 1-20
Dokuz Eylül University Affiliated: Yes

Abstract

Background/Objectives: Vitamin B12 deficiency is a prevalent yet frequently underdiagnosed condition, largely because serum total B12 has limited diagnostic accuracy and confirmatory biomarkers such as holotranscobalamin and methylmalonic acid are not widely available. This study aimed to develop and validate explainable machine learning (ML) models that predict vitamin B12 deficiency from routinely available laboratory tests alone, thereby supporting early detection within standard diagnostic workflows.
Methods: This retrospective study included 51,630 adult patients who underwent concurrent vitamin B12 testing and routine laboratory evaluation between 2015 and 2025. An independent temporal validation cohort of 34,744 patients was used to assess generalizability. Eight supervised ML algorithms were developed within a four-stage experimental framework: default (untuned) modeling, probability-threshold optimization, hyperparameter tuning, and feature engineering. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), the area under the precision-recall curve (AUC-PR), sensitivity, specificity, F1 score, accuracy, the Matthews correlation coefficient (MCC), and likelihood ratios. Model explainability and clinical utility were assessed using SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), and decision curve analysis.
Results: Among all algorithms, CatBoost demonstrated the most balanced and clinically relevant performance. In the threshold-optimized configuration, it achieved a sensitivity of 0.92, specificity of 0.67, F1 score of 0.82, AUC-ROC of 0.88, and AUC-PR of 0.86 on the test set. Temporal validation confirmed robust generalizability, with higher discrimination than on the test set (AUC-ROC 0.90; AUC-PR 0.91) and stable calibration. Explainability analyses identified hematologic indices (mean corpuscular volume [MCV], hemoglobin [HGB], hematocrit [HCT], red cell distribution width [RDW]), iron-related markers, inflammatory measurands, and age as the most influential contributors, consistent with known pathophysiology.
Conclusions: This study presents a large-scale, explainable, and temporally validated ML framework for predicting vitamin B12 deficiency using routine laboratory data alone. The model demonstrates strong diagnostic performance, biological plausibility, and potential for seamless integration into laboratory and clinical decision-support systems, enabling cost-effective and early identification of patients at risk.
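The threshold-dependent metrics listed in the Methods (sensitivity, specificity, F1 score, accuracy, MCC, and likelihood ratios) all derive from a single 2×2 confusion matrix at a chosen probability cut-off. The sketch below shows one way to compute them in plain Python, assuming binary labels (1 = deficient, 0 = not deficient) and model-predicted probabilities. The names `Confusion`, `confusion_at`, and `threshold_metrics` and the toy data are hypothetical illustrations, not the study's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_at(y_true, y_prob, threshold):
    """Count outcomes when a probability >= threshold is called 'deficient' (1)."""
    c = Confusion(0, 0, 0, 0)
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            c.tp += 1
        elif predicted == 1:
            c.fp += 1
        elif y == 0:
            c.tn += 1
        else:
            c.fn += 1
    return c


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def threshold_metrics(c):
    """Threshold-dependent metrics computed from one confusion matrix."""
    sens = _ratio(c.tp, c.tp + c.fn)  # recall / true-positive rate
    spec = _ratio(c.tn, c.tn + c.fp)  # true-negative rate
    n = c.tp + c.fp + c.tn + c.fn
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),  # harmonic mean of precision and recall
        "accuracy": _ratio(c.tp + c.tn, n),
        "mcc": _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
        "lr_positive": _ratio(sens, 1 - spec),  # LR+ = sens / (1 - spec)
        "lr_negative": _ratio(1 - sens, spec),  # LR- = (1 - sens) / spec
    }


# Hypothetical toy labels and probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
c = confusion_at(y_true, y_prob, threshold=0.5)
print(c)  # Confusion(tp=3, fp=1, tn=3, fn=1)
print(threshold_metrics(c))
```

Computing F1 as 2TP / (2TP + FP + FN) avoids a separate precision term while giving the same value as the harmonic mean of precision and recall.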
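AUC-ROC and AUC-PR summarize discrimination without fixing a threshold. As a minimal sketch, AUC-ROC can be computed as the probability that a randomly chosen deficient patient receives a higher score than a randomly chosen non-deficient one (the Mann-Whitney formulation, with ties counted as one half), and AUC-PR as step-wise average precision, the same definition used by scikit-learn's `average_precision_score`. The abstract does not state which AUC-PR estimator was used (trapezoidal interpolation gives slightly different values), so that choice is an assumption here; the function names and data are hypothetical.

```python
def auc_roc(y_true, y_prob):
    """AUC-ROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    # O(n_pos * n_neg) pairwise loop: clear for a sketch, too slow for cohorts of ~50,000.
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Step-wise AUC-PR: sum of precision times recall increment over distinct thresholds."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for y in y_true if y == 1)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Tied scores share one threshold, so consume them together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


# Hypothetical toy labels and probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
print(auc_roc(y_true, y_prob))  # 0.9375: 15 of 16 positive-negative pairs ordered correctly
print(average_precision(y_true, y_prob))
```

For real cohorts, a rank-based AUC (sorting once, O(n log n)) gives the same value as the pairwise loop.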
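The abstract reports a probability-threshold optimization stage but does not state the selection criterion. Two common choices are sketched below under that caveat: maximizing Youden's J (sensitivity + specificity - 1), and taking the highest threshold that still meets a minimum sensitivity, which favors rule-out use. The helpers `sens_spec_at`, `youden_threshold`, and `min_sensitivity_threshold` are hypothetical, and whether either matches the study's procedure is not stated.

```python
def sens_spec_at(y_true, y_prob, t):
    """Sensitivity and specificity when probability >= t is called positive."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
    return tp / n_pos, tn / n_neg


def youden_threshold(y_true, y_prob):
    """Observed score maximizing Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        sens, spec = sens_spec_at(y_true, y_prob, t)
        j = sens + spec - 1
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


def min_sensitivity_threshold(y_true, y_prob, target):
    """Highest threshold whose sensitivity still reaches `target`.

    Specificity never decreases as the threshold rises, so this choice
    maximizes specificity subject to the sensitivity floor.
    """
    feasible = [t for t in sorted(set(y_prob))
                if sens_spec_at(y_true, y_prob, t)[0] >= target]
    return max(feasible) if feasible else None


# Hypothetical toy labels and probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
print(youden_threshold(y_true, y_prob))
print(min_sensitivity_threshold(y_true, y_prob, target=0.75))
```

In practice the threshold would be chosen on training or validation data and then applied unchanged to the test and temporal validation cohorts, so that the reported sensitivity and specificity are not optimistically biased.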
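Decision curve analysis compares a model's net benefit with the default strategies of intervening in all patients or in none, across a range of threshold probabilities p_t. The standard net benefit formula is NB = TP/n - (FP/n) * p_t / (1 - p_t), where the odds p_t / (1 - p_t) weight each false positive by the harm-to-benefit ratio implied by choosing p_t. The sketch below is a minimal illustration with hypothetical function names and toy data; it is not the study's implementation.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of acting on patients whose predicted probability is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= pt)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= pt)
    return tp / n - (fp / n) * (pt / (1 - pt))


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: act on everyone, so TP = all positives and FP = all negatives."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (pt / (1 - pt))


# Treat-none always has net benefit 0; a clinically useful model should exceed
# both treat-none and treat-all over the relevant range of pt.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
for pt in (0.1, 0.2, 0.3, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"pt={pt:.1f}  model={model_nb:.4f}  treat_all={all_nb:.4f}  treat_none=0.0000")
```

Plotting these three curves against p_t gives the usual decision curve; the range of p_t worth inspecting depends on how costly unnecessary follow-up testing is relative to a missed deficiency.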