Automated linguistic analysis in speech samples of Turkish-speaking patients with schizophrenia-spectrum disorders

Arslan, Berat; Kizilay, Elif; Verim, Burcu; Demirlek, Cemal; Dokuyan, Yagmur; Turan, Yaren; Kucukakdag, Aybuke; Demir, Muhammed; Cesim, Ezgi; BORA, İBRAHİM

doi:10.1016/j.schres.2024.03.014

Schizophrenia Research, vol. 267, pp. 65-71, 2024 (SCI-Expanded, SSCI, Scopus)

Publication Type: Article / Full Article
Volume: 267
Publication Date: 2024
DOI: 10.1016/j.schres.2024.03.014
Journal Name: Schizophrenia Research
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, Academic Search Premier, PASCAL, BIOSIS, CINAHL, Educational Research Abstracts (ERA), MEDLINE, PsycINFO
Pages: pp. 65-71
Keywords: Schizophrenia spectrum disorders, Natural language processing, Semantic similarity, Diagnostic classification, Biomarker
Affiliated with Dokuz Eylül University: Yes

Abstract

Modern natural language processing (NLP) methods provide ways to objectively quantify language disturbances for potential use in diagnostic classification. We performed computerized language analysis of speech samples from 82 Turkish-speaking subjects: 44 patients with schizophrenia spectrum disorders (SSD) and 38 healthy controls (HC). Exploratory analysis of the speech samples involved 16 sentence-level semantic similarity features derived with SBERT (Sentence-BERT, a sentence-embedding variant of Bidirectional Encoder Representations from Transformers), as well as 8 generic and 8 part-of-speech (POS) features. A random forest classifier using the SBERT-derived semantic similarity features achieved a mean accuracy of 85.6% in distinguishing SSD from HC. When semantic similarity features were combined with generic and POS features, the classifier's mean accuracy reached 86.8%. Sentence-level semantic similarity scores were higher in SSD than in HC. Generic and POS analyses showed that, compared to HC, patients with SSD used more verbs, proper nouns, and pronouns, used fewer conjunctions and determiners, and produced sentences with lower average and maximum length. Quantitative language features were correlated with the expressive deficit domain of the BNSS (Brief Negative Symptom Scale) as well as with the duration of illness. These findings from Turkish-language interviews add to the growing evidence base for NLP-derived assessments in non-English-speaking patients.
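
As a rough illustration of what a sentence-level semantic similarity feature can look like, the sketch below computes the cosine similarity between each sentence and the one that follows it, then summarizes those values per speech sample. This is an assumption-laden sketch and not the authors' pipeline: the abstract does not specify how the 16 features were defined, the function names `cosine` and `adjacent_similarity_features` are hypothetical, and the toy two-dimensional vectors stand in for real SBERT sentence embeddings.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adjacent_similarity_features(embeddings):
    """Summarize the similarity between each sentence and the next one.

    `embeddings` is a list of sentence vectors in spoken order. It is a
    hypothetical input: in practice the vectors would come from an
    SBERT-style sentence encoder.
    """
    sims = [cosine(a, b) for a, b in zip(embeddings, embeddings[1:])]
    if not sims:
        raise ValueError("need at least two sentences")
    return {"mean": mean(sims), "min": min(sims), "max": max(sims)}


# Toy 2-dimensional vectors standing in for real sentence embeddings:
# the first two sentences are identical in meaning, the third is unrelated.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = adjacent_similarity_features(toy)
```

A dictionary like `features`, computed once per participant, is the kind of per-sample feature vector that could then be passed to a classifier such as a random forest.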