How Can Artificial Intelligence Tell About Intensive Care Unit: Assessment of Readability, Reliability, and Quality of ChatGPT and BARD's Responses

HANCI, VOLKAN; SHERMATOV, NURGAZY; İBİŞOĞLU, EMEL; KARA, FEVZİ; GEYLANİ, BATUHAN; ERDEMİR, İSMAİL; Ergun, Bisar; Hanci, Ferid; Gul, Sanser

doi:10.22034/ircmj.2024.202101

IRANIAN RED CRESCENT MEDICAL JOURNAL, vol.26, no.1, 2024 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 26 Issue: 1
Publication Date: 2024
DOI: 10.22034/ircmj.2024.202101
Journal Name: IRANIAN RED CRESCENT MEDICAL JOURNAL
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), CAB Abstracts, Veterinary Science Database
Keywords: Artificial Intelligence, Bard, ChatGPT, Intensive Care Unit, Online Medical Information, Readability
Dokuz Eylül University Affiliated: Yes

Abstract

Background and Objectives: Artificial Intelligence (AI) chatbots provide easy access to information. However, this technology raises concerns about technological maturity, lack of empathy, accuracy, quality, reliability, and readability. In this study, we aimed to assess the quality, readability, and reliability of the answers given by the AI chatbots ChatGPT and Bard to questions about the intensive care unit (ICU). Methods: In this observational, cross-sectional study, the answers of ChatGPT and Bard to the 100 most frequently asked questions about intensive care were analyzed separately for readability, quality, reliability, and adequacy. Results: Bard's responses were more readable than ChatGPT's on all readability scores evaluated (P < 0.001). The readability of both ChatGPT's and Bard's responses differed significantly from the sixth-grade reading level (P < 0.001). ChatGPT and Bard had similar JAMA, modified DISCERN, and GQS scores (P = 0.504, P = 0.123, and P = 0.086, respectively). Conclusion: The current capabilities of ChatGPT and Bard are inadequate in terms of the quality and readability of ICU-related text content. The responses of both chatbots were written above the sixth-grade reading level and were difficult to read. The readability of both chatbots' responses needs to be brought to an appropriate level.