Assessing the Success of ChatGPT-4o in Oral Radiology Education and Practice: A Pioneering Research



Akkoca F., Özdede M., İlhan G., Koyuncu E., Ellidokuz H.

Cumhuriyet Dental Journal, vol. 28, no. 2, pp. 210-215, 2025 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 28 Issue: 2
  • Publication Date: 2025
  • DOI Number: 10.7126/cumudj.1623854
  • Journal Name: Cumhuriyet Dental Journal
  • Journal Indexes: Scopus, Directory of Open Access Journals
  • Page Numbers: pp. 210-215
  • Keywords: Artificial Intelligence, ChatGPT, Dental Education, Oral Radiology
  • Affiliated with Dokuz Eylül University: Yes

Abstract

Objectives: This study aims to assess the comprehension and interpretation performance of ChatGPT-4o (Chat Generative Pre-trained Transformer 4 Omni) in the context of oral radiology education and practice. Materials and Methods: Using a set of 99 questions derived from the book "White and Pharoah's Oral Radiology: Principles and Interpretation, 8th Edition," ChatGPT-4o was asked to answer the questions thrice daily at varying times over 10 days, generating a total of 60 responses for each question. Two oral radiologists independently answered the same questions, verified their answers against the textbook, and compared them with ChatGPT-4o's responses. Results: ChatGPT-4o's overall correct answer rate was 59.4%. Time-based analysis revealed performance differences across times of day: in the noon and evening sessions, the success rate on the first and seventh days was significantly higher (p = 0.003 and p = 0.002, respectively), whereas morning performance on those days was significantly lower (p < 0.05), indicating that the time and day of the query may influence response accuracy. In contrast, no significant relationship was found between question difficulty and the model's accuracy (p > 0.05). Conclusions: At present, ChatGPT falls short of the requirements of oral radiology education and clinical practice. Nevertheless, its utility is expected to improve as the platform is refined with increased data input and further advances in artificial intelligence.
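
To make the described workflow concrete, the sketch below illustrates, in Python, one way such a repeated-query audit could be scored and checked for a time-of-day effect. It is only an illustration: the `score_session` helper, the answer key, the session responses, and the per-period tallies are hypothetical, and the chi-square test of independence (`scipy.stats.chi2_contingency`) is one plausible analysis choice; the abstract does not state which statistical test the authors used.

```python
# A minimal, self-contained sketch (not the authors' pipeline) of how a
# repeated-query accuracy audit like the one described above could be scored
# and tested for a time-of-day effect. All values below are hypothetical.
from scipy.stats import chi2_contingency

def score_session(model_answers: dict[str, str], answer_key: dict[str, str]) -> int:
    """Count how many model answers match the expert-verified answer key."""
    return sum(model_answers.get(q, "").strip().upper() == a.upper()
               for q, a in answer_key.items())

# Hypothetical expert key and one hypothetical ChatGPT-4o query session.
key = {"Q1": "B", "Q2": "D", "Q3": "A"}
session = {"Q1": "b", "Q2": "A", "Q3": "A"}
print(score_session(session, key), "of", len(key), "correct")  # -> 2 of 3 correct

# Hypothetical correct/incorrect tallies aggregated per querying period.
counts = {
    "morning": {"correct": 55, "incorrect": 44},
    "noon":    {"correct": 62, "incorrect": 37},
    "evening": {"correct": 60, "incorrect": 39},
}

# Build a 3x2 contingency table (period x correctness) and run a chi-square
# test of independence: is query time associated with answer accuracy?
table = [[c["correct"], c["incorrect"]] for c in counts.values()]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
```

In practice, the same contingency-table approach extends naturally to the day-by-day comparisons mentioned in the abstract by adding a row per day-period combination or by testing each day separately.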