Three new research papers address critical challenges in large language model (LLM) deployment, focusing on hallucination detection and social bias.
Hallucination Detection Advances
Researchers have published two papers exploring methods to detect when LLMs generate factually incorrect or unsupported content. The first (arXiv:2601.06196v2) introduces a “manifold-based sampling” approach for detecting hallucinations in context, building on prior work that examined decoding strategies, retrieval augmentation, and supervised fine-tuning for hallucination mitigation.
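To make the general idea concrete: sampling-based detectors typically draw several answers to the same prompt and treat mutual inconsistency as a warning sign. The sketch below is a generic, minimal version of that sampling-and-compare loop, using TF-IDF cosine similarity as a crude consistency proxy; it is not the paper's manifold-based method, and the example answers are invented.

```python
# Minimal sketch: score a claim by the mutual consistency of several sampled
# answers. Low average pairwise similarity is treated as a (crude) hallucination
# warning sign. This is NOT the paper's manifold-based sampling method; it only
# illustrates the general sampling-and-compare idea.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def inconsistency_score(sampled_answers: list[str]) -> float:
    """Return 1 - mean pairwise cosine similarity of the sampled answers."""
    vectors = TfidfVectorizer().fit_transform(sampled_answers)
    sims = cosine_similarity(vectors)
    # Average the off-diagonal similarities only.
    n = len(sampled_answers)
    mean_sim = (sims.sum() - np.trace(sims)) / (n * (n - 1))
    return 1.0 - float(mean_sim)

# Hypothetical usage: answers sampled from an LLM at non-zero temperature.
samples = [
    "The Eiffel Tower was completed in 1889.",
    "It opened in 1889 for the World's Fair.",
    "The tower was finished around 1875.",  # an inconsistent outlier
]
print(f"inconsistency = {inconsistency_score(samples):.2f}")
```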
A second paper (arXiv:2601.14310v1) presents CORVUS, which tests the robustness of hallucination detectors through “internal signal camouflage.” According to the abstract, single-pass hallucination detectors typically rely on internal telemetry such as uncertainty, hidden-state geometry, and attention patterns, assuming hallucinations leave detectable traces in these signals. The CORVUS research uses white-box, model-level approaches to evaluate this assumption.
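For readers unfamiliar with these signals, the sketch below shows the kind of internal telemetry the abstract refers to: per-token predictive entropy (an uncertainty signal) and a simple hidden-state statistic, extracted from a single forward pass with Hugging Face Transformers. It illustrates the inputs such a detector might consume, not the CORVUS evaluation itself; the model name and the norm-based statistic are only examples.

```python
# Sketch of the "internal telemetry" a single-pass detector might read:
# per-token predictive entropy plus a toy hidden-state statistic from one
# forward pass. Not the CORVUS method; "gpt2" is just a small example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The capital of Australia is Sydney."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Uncertainty signal: entropy of the next-token distribution at each position.
probs = torch.softmax(out.logits, dim=-1)                        # [1, seq, vocab]
entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(-1)   # [1, seq]

# Hidden-state "geometry" signal (toy version): norm of last-layer states.
last_hidden = out.hidden_states[-1]      # [1, seq, hidden_dim]
state_norms = last_hidden.norm(dim=-1)   # [1, seq]

# A detector might threshold or train a probe on features like these.
print("mean token entropy:", entropy.mean().item())
print("mean hidden-state norm:", state_norms.mean().item())
```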
Bias in Gaming AI
Separately, researchers published FAIRGAMER (arXiv:2508.17825v3), a study evaluating social biases in LLM-based video game non-player characters (NPCs). According to the paper, as LLMs increasingly enhance or replace traditional NPCs, they inherit social biases tied to attributes such as race or class, creating fairness risks during gameplay.
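One common way to surface such biases, sketched below purely for illustration, is counterfactual prompting: hold the game scenario fixed, swap only a demographic descriptor, and compare the NPC's replies. The scenario, descriptors, and callable backend here are all assumptions, not FAIRGAMER's benchmark or metrics.

```python
# A minimal, generic sketch of counterfactual probing for NPC bias: keep the
# scenario fixed, vary only the player's descriptor, and compare the replies.
# This is not FAIRGAMER's actual benchmark; the NPC backend is passed in as a
# callable because no specific LLM API is assumed.
from typing import Callable

SCENARIO = (
    "You are a merchant NPC. A {descriptor} adventurer asks to buy a healing "
    "potion. Quote a price in gold and justify it in one sentence."
)
DESCRIPTORS = ["wealthy-looking", "poor-looking"]  # assumed example attributes

def probe_npc_bias(query_npc: Callable[[str], str]) -> dict[str, str]:
    """Query the NPC with each counterfactual prompt and return the replies."""
    return {d: query_npc(SCENARIO.format(descriptor=d)) for d in DESCRIPTORS}

if __name__ == "__main__":
    # Placeholder backend that just echoes the prompt; swap in a real LLM call.
    replies = probe_npc_bias(lambda prompt: f"[NPC would answer: {prompt}]")
    for descriptor, reply in replies.items():
        print(f"{descriptor}: {reply}")
```

In practice, the paired replies would be compared with concrete measures, for example parsing quoted prices or scoring sentiment, rather than by inspection.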