Evaluation of the Diagnostic Accuracy of ChatGPT 4o According to the 2017 Classification of Periodontal Diseases

Muhterem A., Dikilitaş A., Akkoca F., Arayıcı M. E., Akcalı A.

EuroPerio11 Vienna 2025, Vienna, Austria, 14 - 17 May 2025, vol.1, p.1

Publication Type: Conference Paper / Abstract
Volume: 1
City of Publication: Vienna
Country of Publication: Austria
Pages: p.1
Affiliated with Dokuz Eylül University: Yes

Abstract

Background & Aim: To evaluate the ability of ChatGPT 4o to stage and grade periodontitis according to the 2017 classification of periodontal diseases.
Methods: Sagittal sections of the tooth with the greatest bone loss were obtained from cone-beam computed tomography (CBCT) images in the university hospital database and analyzed by ChatGPT 4o. Reference points on the cementoenamel junction, tooth apex, and bone level were indicated to the AI. The parameters considered for stage and grade assessment, including the number of teeth lost due to periodontal disease, furcation involvement, vertical and horizontal bone destruction patterns, and masticatory dysfunction, were also provided to ChatGPT 4o. Agreement between measurements was analyzed using the intraclass correlation coefficient (ICC) and kappa tests.
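The abstract does not state how bone loss percentage was derived from the three reference points. As a minimal sketch, assuming bone loss is expressed as the distance from the cementoenamel junction to the bone level divided by the distance from the cementoenamel junction to the apex (with no correction for the physiological distance between the junction and a healthy crest), the percentage and the bone loss-to-age ratio could be computed as below. The `Landmarks` class and all function names are hypothetical. The grade thresholds are the bone loss-to-age criteria of the 2017 classification (Grade A below 0.25, Grade B from 0.25 to 1.0, Grade C above 1.0); grade modifiers such as smoking and diabetes are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Landmarks:
    """Hypothetical 2D landmark coordinates (mm) on one sagittal CBCT section."""
    cej: tuple[float, float]         # cementoenamel junction
    bone_level: tuple[float, float]  # current alveolar bone level
    apex: tuple[float, float]        # root apex


def bone_loss_percent(lm: Landmarks) -> float:
    """Bone loss as a percentage of root length (assumed definition)."""
    root_length = math.dist(lm.cej, lm.apex)
    if root_length == 0:
        raise ValueError("CEJ and apex coincide; root length is zero")
    return 100.0 * math.dist(lm.cej, lm.bone_level) / root_length


def bone_loss_age_ratio(percent: float, age_years: float) -> float:
    """Percentage bone loss divided by patient age in years."""
    if age_years <= 0:
        raise ValueError("age must be positive")
    return percent / age_years


def grade_from_ratio(ratio: float) -> str:
    """Map the bone loss-to-age ratio to a 2017-classification grade."""
    if ratio < 0.25:
        return "A"
    if ratio <= 1.0:
        return "B"
    return "C"


# Illustrative values only, not data from the study.
example = Landmarks(cej=(0.0, 0.0), bone_level=(0.0, 6.0), apex=(0.0, 12.0))
percent = bone_loss_percent(example)         # 50.0
ratio = bone_loss_age_ratio(percent, 54.0)   # about 0.93
grade = grade_from_ratio(ratio)              # "B"
```

Because the ratio divides by age, two patients with the same radiographic bone loss can receive different grades, which is why the study reports agreement on the percentage and on the ratio separately.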
Results: Thirty-seven participants with a mean age of 54.2 ± 9.6 years were included in the study. The researchers measured bone loss as 48.8 ± 17.9%, whereas ChatGPT reported 50.8 ± 5.3%. The bone loss-to-age ratio was 0.92 ± 0.38 according to the researchers and 0.96 ± 0.22 according to ChatGPT. For bone loss percentage, the correlation between the researchers and ChatGPT was not significant (r = 0.173, p = 0.306) and the level of agreement was low (ICC = 0.175, 95% CI: -0.619 to 0.577, p = 0.286). For the bone loss-to-age ratio, however, a significant moderate correlation (r = 0.477, p = 0.003) and agreement (ICC = 0.586, 95% CI: 0.194 to 0.787, p = 0.005) were observed. Agreement between the researchers and ChatGPT 4o was fair for grading (κ = 0.385, p = 0.018) and moderate for staging (κ = 0.579, p < 0.001).
Conclusions: ChatGPT 4o demonstrated moderate success in radiographic analyses and may serve as a guide in the radiographic evaluations required to establish the initial staging and grading of patients with periodontitis.
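The kappa values in the Results measure how often two raters assign the same category beyond what chance alone would produce. The sketch below computes unweighted Cohen's kappa in plain Python; the abstract does not say whether a weighted or unweighted kappa was used, so the unweighted form is an assumption. The descriptive labels follow the commonly used Landis and Koch bands, which match the abstract's wording ("fair" for 0.385, "moderate" for 0.579), although the abstract does not name the scale it used. The function names and the stage lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Descriptive label for a kappa value (Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Made-up stage assignments for six hypothetical patients.
researchers = ["III", "III", "II", "IV", "III", "II"]
chatgpt = ["III", "II", "II", "IV", "III", "III"]
kappa = cohens_kappa(researchers, chatgpt)
label = landis_koch_label(kappa)
```

In this made-up example the raters agree on four of six cases, yet kappa is well below that raw agreement rate because some of the matches would be expected by chance.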