Evaluation of automated machine learning platforms against an expert-designed deep learning model for optic disc abnormality detection


DURMAZ ENGİN C., Gokkan M. O., ÖZİZMİRLİLER D., Besenk U., SELVER M. A., Soylev Bajin M., et al.

AJO International, vol. 3, no. 2, 2026 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 3 Issue: 2
  • Publication Date: 2026
  • DOI: 10.1016/j.ajoint.2026.100249
  • Journal Name: AJO International
  • Journal Indexes: Scopus
  • Keywords: AutoML, Cross-branch attention, Deep learning, Optic disc abnormalities
  • Dokuz Eylül University Affiliated: Yes

Abstract

Purpose: To evaluate the classification performance of three no-code AutoML platforms (Google Vertex AI, Amazon SageMaker, and Microsoft Azure) against a custom deep learning (DL) model, FAN-X (Cross Branch Feature Fusion Attention Network), for detecting optic disc abnormalities in fundus photographs.

Design: Retrospective, cross-sectional diagnostic accuracy study.

Subjects: A curated set of fundus photographs with normal optic discs and a spectrum of optic disc abnormalities. Image labels were established by expert ophthalmologists and served as the reference standard.

Methods: A balanced dataset of 700 right-eye fundus photographs was categorized into six classes: normal (n = 200), optic atrophy, glaucomatous cupping, papilledema, optic disc drusen (ODD), and tilted disc (n = 100 each). AutoML models were developed using each platform's standard pipeline, while the FAN-X model incorporated a cross-branch attention architecture combining ResNet101 and EfficientNet-B0 to enrich feature representations in latent space within a custom ensemble learning framework. Performance was evaluated using precision, recall, average precision (AP), and F1 score. Grad-CAM visualizations were generated to interpret model focus areas.

Results: Microsoft Azure achieved the highest precision (95%) and recall (95%) among the evaluated platforms. Amazon SageMaker, Google Vertex AI, and the FAN-X model achieved precision values of 91.3%, 88.0%, and 90.3%, respectively. All models performed well in classifying glaucomatous cupping and optic atrophy. The most challenging category was tilted disc, where recall ranged from 60% to 85% across models. Misclassification between papilledema and drusen was common, with Vertex AI mislabeling 16% of papilledema cases as drusen and vice versa. Grad-CAM visualizations showed model attention to clinically relevant regions, including disc margins, central cupping, and peripapillary structures.
Conclusion: AutoML platforms demonstrated classification performance comparable to the FAN-X DL model in identifying optic disc abnormalities. Nonetheless, anatomically variable and overlapping presentations remain challenging. These findings support the utility of AutoML tools in ophthalmic diagnostics while highlighting areas for further refinement.
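The Methods describe FAN-X as fusing features from two backbones (ResNet101 and EfficientNet-B0) via cross-branch attention in latent space. The paper's exact architecture is not reproduced here; the following is a minimal NumPy sketch of the general idea, using randomly initialized projections in place of learned weights and a single gated cross-attention step between pooled branch embeddings. All names (`cross_branch_fusion`, `d_attn`, etc.) are illustrative, not the authors' implementation.

```python
import numpy as np

def cross_branch_fusion(feat_a, feat_b, d_attn=64, seed=0):
    """Illustrative cross-branch attention fusion of two backbone embeddings.

    feat_a: (batch, d_a) pooled features from branch A (e.g. ResNet101).
    feat_b: (batch, d_b) pooled features from branch B (e.g. EfficientNet-B0).
    Returns a (batch, d_a + d_attn) fused representation.
    """
    rng = np.random.default_rng(seed)
    d_a, d_b = feat_a.shape[1], feat_b.shape[1]
    # Randomly initialized projections stand in for learned weights.
    Wq = rng.standard_normal((d_a, d_attn)) / np.sqrt(d_a)
    Wk = rng.standard_normal((d_b, d_attn)) / np.sqrt(d_b)
    Wv = rng.standard_normal((d_b, d_attn)) / np.sqrt(d_b)
    q = feat_a @ Wq  # queries from branch A
    k = feat_b @ Wk  # keys from branch B
    v = feat_b @ Wv  # values from branch B
    # One token per branch, so the attention score is a scalar per sample;
    # a sigmoid gate decides how much of branch B's signal to pass through.
    score = (q * k).sum(axis=1, keepdims=True) / np.sqrt(d_attn)
    gate = 1.0 / (1.0 + np.exp(-score))
    attended = gate * v
    # Concatenate branch-A features with the attended branch-B features.
    return np.concatenate([feat_a, attended], axis=1)

# Toy usage: two samples, 8-dim and 6-dim branch embeddings.
fused = cross_branch_fusion(np.ones((2, 8)), np.ones((2, 6)), d_attn=4)
```

In a trained model the projections would be learned end-to-end and the attention would typically operate over spatial feature maps rather than pooled vectors; this sketch only shows the cross-branch gating pattern.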