Study Finds LLM 'Thinking Mode' Changes Ethical Frameworks More Than Judgments

According to a new paper on arxiv.org, enabling “thinking mode” in large language models changes how they justify moral decisions more often than it changes the decisions themselves. The research evaluated five frontier reasoning-trained models (Claude Sonnet 4.6, GPT 5.5, Gemini 3 Flash, DeepSeek V3.1, and Qwen3.5 397B) across 100 moral-judgment scenarios.

The study found that aggregate binary-verdict agreement remained “high and statistically indistinguishable” between instant and thinking modes, with Krippendorff’s alpha of 0.78 versus 0.79. Disagreement was instead concentrated in 21 “model-disputed scenarios,” where instant-mode agreement was near chance (alpha = 0.08).
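For readers unfamiliar with the statistic, Krippendorff’s alpha compares observed disagreement among raters with the disagreement expected by chance: 1 means perfect agreement and values near 0 mean agreement no better than chance. The following is a minimal sketch for nominal labels such as binary verdicts. It assumes every model returns a verdict for every scenario; the function name and toy data are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels with no missing ratings.

    `units` is a list of scenarios; each scenario is a list holding one
    label per rater (here, one verdict per model). Hypothetical helper.
    """
    # Coincidence counts: each ordered pair of ratings from two different
    # raters within a unit contributes 1 / (m - 1), where m is the number
    # of ratings in that unit.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit with a single rating carries no pair information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label, and the overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1 - observed / expected


# Toy data: two raters, four scenarios, agreeing on exactly half of them.
toy = [[1, 1], [0, 0], [1, 0], [0, 1]]
print(round(krippendorff_alpha_nominal(toy), 3))  # 0.125
```

In the toy data the two raters agree on half of the scenarios, which is about what chance would produce, so alpha lands close to 0, much like the 0.08 reported for the disputed scenarios.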

According to the paper, thinking mode directionally narrowed cross-model disagreement in these contested cases, raising mean pairwise agreement from 5.4 to 6.7 out of 10. Reasoning also reduced demographic-judgment inconsistency in three of the five models and did not increase it in any.
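One plausible reading of the “out of 10” scale, offered here as an assumption rather than the paper’s stated definition, is that five models form 10 distinct pairs, and the score counts how many of those pairs return the same verdict on a scenario, averaged over scenarios. A short sketch under that assumption, with hypothetical function names and made-up verdicts:

```python
from itertools import combinations


def agreeing_pairs(verdicts):
    """Count model pairs that gave the same verdict on one scenario."""
    return sum(a == b for a, b in combinations(verdicts, 2))


def mean_pairwise_agreement(scenarios):
    """Average the agreeing-pair count over all scenarios."""
    return sum(agreeing_pairs(v) for v in scenarios) / len(scenarios)


# Made-up verdicts from five models on two scenarios (not data from the paper).
scenarios = [
    [1, 1, 1, 1, 0],  # 4-1 split: 6 of 10 pairs agree
    [1, 1, 1, 0, 0],  # 3-2 split: 4 of 10 pairs agree
]
print(mean_pairwise_agreement(scenarios))  # 5.0
```

Under this reading, a unanimous scenario scores 10, a 4-1 split scores 6, and a 3-2 split scores 4, while five independent coin flips would average 5. A mean of 5.4 would therefore sit close to chance, in line with the near-zero alpha reported for the disputed scenarios.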

Most notably, the research found that “across all five model families, reasoning changes self-labeled ethical frameworks more often than binary verdicts,” suggesting that thinking mode primarily affects how models explain their decisions rather than the decisions themselves. The paper was published on May 7, 2026, according to arxiv.org.