Research Finds LLM 'Thinking Mode' Has Minimal Impact on Moral Judgments Despite Reducing Cross-Model Disagreement

According to a new paper posted to arXiv (arXiv:2605.04488), enabling reasoning or “thinking mode” in large language models has minimal impact on their moral judgments, despite reducing disagreement between different models on contested scenarios.

The study evaluated five frontier reasoning-trained LLMs—Claude Sonnet 4.6, GPT 5.5, Gemini 3 Flash, DeepSeek V3.1, and Qwen3.5 397B—across 100 moral-judgment scenarios. According to the research, aggregate binary-verdict agreement remained “high and statistically indistinguishable” between instant and thinking modes, with Krippendorff’s alpha scores of 0.78 versus 0.79.
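
For readers unfamiliar with the statistic, Krippendorff's alpha equals 1 when raters agree perfectly and 0 when their agreement is no better than chance. The sketch below computes the nominal-data version for a set of scenarios, each rated by several models. It is an illustration only: the function name `krippendorff_alpha_nominal` and the toy verdicts are assumptions made here, not the paper's code or data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha.

    `units` is a list of scenarios; each scenario is a list of labels,
    one per model. (Hypothetical helper, not from the paper.)
    """
    # Coincidence matrix: every ordered pair of labels given by two
    # different raters to the same unit, weighted by 1 / (m - 1).
    coincidence = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a unit with a single rating carries no agreement information
        for a, b in permutations(ratings, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # undefined when only one label ever appears
    return 1.0 - (n - 1) * observed / expected


# Toy data (invented for illustration): two raters, four scenarios.
toy = [[0, 0], [1, 1], [0, 1], [1, 0]]
alpha_toy = krippendorff_alpha_nominal(toy)
```

On the toy data the raters agree on only half the scenarios, so the resulting alpha sits close to 0, far below the 0.78 and 0.79 reported for the full scenario set.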

However, the paper notes that disagreement was concentrated in 21 “model-disputed scenarios” where instant-mode agreement approached chance levels (alpha = 0.08). In these contested cases, the authors report, reasoning “directionally narrows cross-model disagreement,” increasing mean pairwise agreement from 5.4 to 6.7 out of 10.
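
One plausible reading of the “out of 10” scale, offered here as an assumption rather than something this summary confirms, is that five models form 10 unordered pairs, so the score counts how many pairs return the same verdict on a scenario. A minimal sketch of that counting, with hypothetical function names and invented verdicts:

```python
from itertools import combinations


def agreeing_pairs(verdicts):
    """Count model pairs that gave the same verdict on one scenario."""
    return sum(a == b for a, b in combinations(verdicts, 2))


def mean_agreeing_pairs(scenarios):
    """Average the per-scenario pair count over all scenarios."""
    return sum(agreeing_pairs(v) for v in scenarios) / len(scenarios)


# Invented verdicts for five models (1 = acceptable, 0 = not acceptable).
unanimous = [1, 1, 1, 1, 1]   # all 10 pairs agree
split_3_2 = [1, 1, 1, 0, 0]   # 3 + 1 = 4 pairs agree
mean_score = mean_agreeing_pairs([unanimous, split_3_2])
```

Under this reading, a 3-to-2 split yields 4 agreeing pairs and a 4-to-1 split yields 6, so a mean moving from 5.4 to 6.7 would correspond to contested scenarios shifting toward more lopsided verdicts.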

The study also found that reasoning mode reduced demographic-judgment inconsistency in three of five models tested and did not increase it for any model. Notably, according to the research, “reasoning changes self-labeled ethical frameworks more often than binary verdicts” across all five model families.

The paper was submitted to arXiv on May 6, 2026, by Sai Sourabh Madur and colleagues.