Turk Osteoporoz Dergisi, cilt.31, sa.1, ss.12-18, 2025 (Scopus)
Objective: The aim of this study was to comprehensively evaluate the quality and readability of artificial intelligence (AI)-generated texts about spondyloarthropathy (SpA).
Materials and Methods: The most frequently searched keywords related to the SpA group were identified through Google Trends. These keywords were sequentially entered into AI chatbots (ChatGPT, Bard, Copilot). The Ensuring Quality Information for Patients (EQIP) tool was used to assess the clarity of information and quality of writing. The Flesch-Kincaid readability tests (reading ease and grade level) and the Gunning Fog index (GFI) were used to assess the readability of the texts.
Results: The mean EQIP score of the texts was 66.44. The mean Flesch-Kincaid reading-ease score was 38.06, the mean Flesch-Kincaid grade level was 11.38, and the mean GFI score was 13.91. These findings indicate that the AI chatbots’ responses on SpA were generally of “good quality with minor problems”, but that the texts were complex enough to require approximately 11 years of education to understand. When the quality and readability of the texts generated by the AI chatbots were compared, the EQIP scores of the texts generated by Copilot were higher than those generated by ChatGPT and Bard (p<0.001 and p=0.004, respectively). Furthermore, ChatGPT-generated texts were found to require a higher level of education than those generated by Copilot and Bard (p=0.002 and p=0.004, respectively).
Conclusion: This study reveals that AI chatbots’ texts about SpA have certain shortcomings in terms of quality and readability. It emphasizes that, although online resources and AI tools play an important role in delivering health information, their quality and readability must be verified. Such control can facilitate patients’ access to accurate, reliable, and comprehensible information.
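For readers unfamiliar with the readability measures reported above, the following minimal Python sketch illustrates the standard Flesch reading-ease, Flesch-Kincaid grade-level, and Gunning Fog formulas. The vowel-group syllable counter is a simplifying assumption for illustration only and is not the tooling used in the study.

import re

def count_syllables(word):
    # Rough heuristic: count vowel groups; real tools use pronunciation dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability_metrics(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    wps = n_words / sentences   # average words per sentence
    spw = syllables / n_words   # average syllables per word

    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    gunning_fog = 0.4 * (wps + 100 * complex_words / n_words)
    return reading_ease, grade_level, gunning_fog

On these scales, a reading-ease score near 38 falls in the "difficult" (college-level) band, which is consistent with the grade-level and GFI values reported in the abstract.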