Automated linguistic analysis in youth at clinical high risk for psychosis

Kizilay, Elif; Arslan, Berat; Verim, Burcu; Demirlek, Cemal; Demir, Muhammed; Cesim, Ezgi; Eyuboglu, Merve; Uzman Ozbek, Simge; Sut, Ekin; Yalincetin, Berna; Bora, İbrahim

doi:10.1016/j.schres.2024.09.009

Schizophrenia Research, vol. 274, pp. 121-128, 2024 (SCI-Expanded, SSCI, Scopus)

Publication Type: Article / Full Article
Volume: 274
Publication Date: 2024
DOI: 10.1016/j.schres.2024.09.009
Journal: Schizophrenia Research
Indexed in: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, Academic Search Premier, PASCAL, BIOSIS, CINAHL, Educational research abstracts (ERA), MEDLINE, Psycinfo
Pages: pp. 121-128
Keywords: At-risk, Psychosis, Natural language processing, Semantic similarity, Machine learning
Dokuz Eylül University Affiliation: Yes

Abstract

Identifying individuals at clinical high risk for psychosis (CHR-P) is crucial for preventing psychosis and improving the prognosis of schizophrenia. Individuals at CHR-P may exhibit mild forms of formal thought disorder (FTD), which makes it possible to identify them with natural language processing (NLP) methods. In this study, speech samples from 62 CHR-P individuals and 45 healthy controls (HCs) were elicited using Thematic Apperception Test images. The evaluation covered three classes of NLP measures: semantic similarity, generic, and part-of-speech (POS) features. The CHR-P group showed higher sentence-level semantic similarity and lower mean image-to-text similarity. In the generic analysis, they showed reduced verbosity and produced shorter sentences with shorter words. The POS analysis revealed less frequent use of adverbs, conjunctions, and first-person singular pronouns, along with more frequent use of adjectives, in the CHR-P group compared with HCs. In addition, we developed a machine-learning model based on 30 NLP-derived features to distinguish between the CHR-P and HC groups. The model achieved an accuracy of 79.6% and an AUC-ROC of 0.86. Overall, these findings suggest that automated language analysis of speech could provide valuable information for characterizing FTD during the clinical high-risk phase and could potentially serve as an objective tool for early intervention in psychosis.
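The abstract does not state how the similarity measures were computed. A common formulation, assumed here rather than taken from the paper, embeds each sentence as a vector and compares vectors by cosine similarity. Under that assumption, sentence-level similarity is the mean cosine similarity between consecutive sentences, and image-to-text similarity is the mean cosine similarity between an image embedding and each sentence embedding in a shared space (such as the one produced by CLIP-style models). The embedding model is omitted, the function names are hypothetical, and the two-dimensional vectors in the usage example are toy values, not data from the study.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def sentence_level_similarity(sentence_embeddings: Sequence[Vector]) -> float:
    """Assumed definition: mean cosine similarity of each consecutive sentence pair."""
    if len(sentence_embeddings) < 2:
        raise ValueError("need at least two sentences")
    pairs = zip(sentence_embeddings, sentence_embeddings[1:])
    sims = [cosine_similarity(a, b) for a, b in pairs]
    return sum(sims) / len(sims)


def mean_image_to_text_similarity(
    image_embedding: Vector, sentence_embeddings: Sequence[Vector]
) -> float:
    """Assumed definition: mean cosine similarity between the image and each sentence.

    Both embeddings must live in the same vector space for this to be meaningful.
    """
    if not sentence_embeddings:
        raise ValueError("need at least one sentence")
    sims = [cosine_similarity(image_embedding, s) for s in sentence_embeddings]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for the output of a real encoder.
    sentences = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    image = [1.0, 0.0]
    print(sentence_level_similarity(sentences))
    print(mean_image_to_text_similarity(image, sentences))
```

Cosine similarity ignores vector length, so the comparison reflects only the direction of the embeddings, not how long or detailed a sentence is.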
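The generic and POS features named in the abstract can be sketched as simple counts over a transcript. This is a minimal illustration, not the study's pipeline, and it rests on several assumptions: sentences are split on `.`, `!` and `?`; a word is a run of Unicode word characters; POS tags come from an upstream tagger that uses Universal Dependencies UPOS labels (`ADV`, `CCONJ`, `SCONJ`, `ADJ`, `PRON`); and `FIRST_PERSON_SINGULAR` is a hypothetical English lexicon standing in for whatever language-specific list a real analysis would need. The function names are also hypothetical.

```python
import re
from collections import Counter

# Hypothetical English lexicon, for illustration only; a real analysis needs a
# list appropriate to the transcript language.
FIRST_PERSON_SINGULAR = frozenset({"i", "me", "my", "mine", "myself"})


def generic_features(transcript: str) -> dict[str, float]:
    """Verbosity, sentence-length and word-length features for one transcript."""
    # Assumption: sentences end with '.', '!' or '?'; trailing text without
    # punctuation still counts as a sentence.
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    # Assumption: a word is a run of Unicode word characters.
    words = re.findall(r"\w+", transcript)
    n_words, n_sentences = len(words), len(sentences)
    return {
        "word_count": float(n_words),  # verbosity
        "sentence_count": float(n_sentences),
        "mean_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "mean_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
    }


def pos_features(tagged_tokens: list[tuple[str, str]]) -> dict[str, float]:
    """Share of tokens per selected UPOS category, plus first-person singular pronouns."""
    total = len(tagged_tokens)
    if total == 0:
        return {"ADV": 0.0, "CONJ": 0.0, "ADJ": 0.0, "1SG_PRON": 0.0}
    counts = Counter(tag for _, tag in tagged_tokens)
    first_person = sum(
        1
        for word, tag in tagged_tokens
        if tag == "PRON" and word.lower() in FIRST_PERSON_SINGULAR
    )
    return {
        "ADV": counts["ADV"] / total,
        # UPOS splits conjunctions into coordinating and subordinating ones.
        "CONJ": (counts["CCONJ"] + counts["SCONJ"]) / total,
        "ADJ": counts["ADJ"] / total,
        "1SG_PRON": first_person / total,
    }


if __name__ == "__main__":
    print(generic_features("A man sits by the window. He looks tired!"))
    tagged = [("I", "PRON"), ("think", "VERB"), ("she", "PRON"), ("is", "AUX"),
              ("very", "ADV"), ("sad", "ADJ"), ("and", "CCONJ"), ("tired", "ADJ")]
    print(pos_features(tagged))
```

Expressing POS counts as proportions of all tokens keeps them comparable across speakers who talk for different lengths, which matters when one group is less verbose.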
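Accuracy and AUC-ROC, the two metrics reported for the classifier, can both be computed from true labels and model scores. The sketch below assumes a binary coding (1 = CHR-P, 0 = HC) and a 0.5 decision threshold for accuracy. It computes AUC in its pairwise (Mann-Whitney) form, as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The classifier itself and its 30 features are not reproduced, the function names are hypothetical, and the labels and scores in the usage example are invented.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as 0.5.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases where the thresholded prediction matches the true label."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal in length")
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


if __name__ == "__main__":
    # Invented toy values (1 = CHR-P, 0 = HC); not data from the study.
    y = [1, 1, 1, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
    print(round(roc_auc(y, p), 3))   # 0.889
    print(round(accuracy(y, p), 3))  # 0.667
```

Unlike accuracy, AUC does not depend on a threshold: it summarizes how well the scores rank cases across all thresholds. At sample sizes like this study's (62 × 45 = 2,790 positive-negative pairs) the pairwise loop is cheap; scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity for binary labels.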