Sensitivity to Emotional Exploitation in Reasoning Models: Stereotypical Analysis

Çeldir O. M., Dalkılıç G.

2025 7th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (ICHORA), Ankara, Türkiye, 23-24 May 2025, pp. 1-8 (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI: 10.1109/ichora65333.2025.11017311
City of Publication: Ankara
Country of Publication: Türkiye
Pages: 1-8
Keywords: Artificial Intelligence Bias, Large Language Models, LLM Security, Prompt Injection
Dokuz Eylül University Affiliation: Yes

Abstract

In this study, a survey based on three moral dilemma scenarios was conducted on synthetic participants with different stereotypes, comparing GPT-4o and o1-mini. The tests show that the reasoning model tends to give utilitarian answers to questions measuring moral dilemmas. In the reasoning model, the share of fairness answers fell from 78.83% with no intervention to 5.99% under an emotional distortion attack, whereas in the standard model the same rate fell only from 99.84% to 95.91%. In addition, a stereotypical analysis of the answers showed that the human-like reasoning of reasoning models carries serious bias, especially across groups separated by gender and economic status. In reasoning models, synthetic participants with female personas gave utilitarian answers 52.78% of the time on average, compared with 61.96% for participants with male personas.
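The size of these shifts can be read directly off the reported percentages. The short Python sketch below uses only the figures stated in the abstract (not the underlying survey data) to compute, for each model, the absolute drop in fairness answers in percentage points and the relative drop as a fraction of the no-intervention baseline, plus the male/female gap in utilitarian answers for the reasoning model. The names `REPORTED` and `change` are illustrative and do not come from the paper.

```python
# Fairness-answer rates reported in the abstract, in percent:
# (no intervention, emotional distortion attack).
REPORTED = {
    "o1-mini (reasoning)": (78.83, 5.99),
    "GPT-4o (standard)": (99.84, 95.91),
}


def change(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction of baseline)."""
    drop_pp = before - after
    return drop_pp, drop_pp / before


for name, (before, after) in REPORTED.items():
    pp, rel = change(before, after)
    print(f"{name}: -{pp:.2f} pp ({rel:.1%} of baseline)")

# Average utilitarian-answer rates by persona gender in the reasoning model.
female, male = 52.78, 61.96
print(f"male - female utilitarian gap: {male - female:.2f} pp")
```

Expressed this way, the emotional distortion attack removes roughly nine tenths of the reasoning model's fairness answers but only a few percent of the standard model's, which is the contrast the abstract draws.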