According to research published on arxiv.org, large language models (LLMs) exhibit human-like motivated reasoning when assigned specific personas, which can undermine rational decision-making in much the same way that cognitive biases do in humans.
The study, accepted at ACL Findings 2026, tested eight LLMs across political and socio-demographic personas using two reasoning tasks drawn from human-subject studies. The researchers report that persona-assigned LLMs showed up to 9% lower veracity discernment than models without personas when evaluating misinformation headlines. Models with political personas were up to 90% more likely to correctly evaluate scientific evidence on gun control when the ground truth aligned with their assigned political identity.
The researchers note that “prompt-based debiasing methods are largely ineffective at mitigating these effects,” raising concerns about amplifying identity-congruent reasoning in both LLMs and humans.
Separately, research posted on arxiv.org examines novel security vulnerabilities in Large Reasoning Models (LRMs). The study introduces a “Psychology-based Reasoning-targeted Jailbreak Attack (PRJA) Framework” that achieved an 83.6% average attack success rate against commercial LRMs including DeepSeek R1, Qwen2.5-Max, and OpenAI o4-mini. According to the researchers, the attack injects harmful content into the reasoning steps while leaving the final answer unchanged, which poses challenges for AI safety in high-stakes domains such as healthcare and education.