New Research Tackles Knowledge Gaps and Alignment Issues in Large Language Models

Three recent papers posted to arXiv address how large language models express uncertainty, resist sycophancy, and improve as agents.

A new paper on arXiv, “What Models Know, How Well They Know It,” introduces knowledge-weighted fine-tuning to help LLMs express uncertainty. The approach estimates an instance-level knowledge score through multi-sampled inference, then scales the learning signal according to what the model already knows, while encouraging explicit “I don’t know” responses for out-of-scope queries. The 8-page paper reports that models trained this way express uncertainty when they lack knowledge while maintaining accuracy on answerable questions.
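The scoring-and-weighting idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `knowledge_score`, `weighted_mean_loss`, and `sample_fn` are hypothetical, exact-match comparison stands in for whatever answer matching the authors use, and the default weighting (more weight on examples the model knows less well) is an assumption, since the summary does not say in which direction the signal is scaled.

```python
from typing import Callable, List, Sequence


def _normalize(text: str) -> str:
    """Lowercase and strip whitespace so trivially different answers match."""
    return text.strip().lower()


def knowledge_score(
    sample_fn: Callable[[str, int], List[str]],
    question: str,
    gold_answer: str,
    n_samples: int = 8,
) -> float:
    """Fraction of sampled answers that match the gold answer.

    `sample_fn` is a hypothetical callable that draws `n_samples`
    answers from the model for `question`.
    """
    answers = sample_fn(question, n_samples)
    if not answers:
        return 0.0
    hits = sum(_normalize(a) == _normalize(gold_answer) for a in answers)
    return hits / len(answers)


def weighted_mean_loss(
    losses: Sequence[float],
    scores: Sequence[float],
    weight_fn: Callable[[float], float] = lambda s: 1.0 - s,
) -> float:
    """Weighted average of per-example losses.

    ASSUMPTION: the default weight_fn up-weights poorly known examples;
    the source does not specify the direction, so pass another function
    to change it.
    """
    weights = [weight_fn(s) for s in scores]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total


if __name__ == "__main__":
    # Stand-in sampler returning fixed answers instead of calling a model.
    fake_sampler = lambda question, n: ["Paris", "paris ", "Lyon", "Paris"]
    score = knowledge_score(fake_sampler, "Capital of France?", "Paris", 4)
    print(score)
    print(weighted_mean_loss([2.0, 4.0], [0.75, 0.25]))
```

In a real training loop the per-example losses would come from the model's forward pass, and the scores would be computed once before fine-tuning begins.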

Separately, a paper posted to arXiv and submitted to COLM 2026, “Pressure, What Pressure? Sycophancy Disentanglement in Language Models,” addresses sycophancy, the tendency of LLMs to shift positions toward perceived user preferences. According to the abstract, standard alignment methods fail because they conflate two failure modes, pressure capitulation and evidence blindness, into a single signal. The researchers propose a decomposed reward for Group Relative Policy Optimisation (GRPO) with five components: pressure resistance, context fidelity, position consistency, agreement suppression, and factual correctness. Across five base models, the approach reduced answer-priming sycophancy by up to 17 points on SycophancyEval.
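As a rough illustration of how a decomposed reward could feed into GRPO, the Python sketch below sums the five named components into one scalar per response and then standardizes rewards within a group of responses to the same prompt, which is the group-relative step GRPO is named for. The equal default weights, the function names, and the use of the population standard deviation are assumptions made for this example; the summary does not describe how the paper scores or combines the components.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence

# Component names taken from the paper summary; how each one is scored
# is not described there, so the values used here are placeholders.
COMPONENTS = (
    "pressure_resistance",
    "context_fidelity",
    "position_consistency",
    "agreement_suppression",
    "factual_correctness",
)


def total_reward(
    components: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-component rewards into one scalar.

    ASSUMPTION: equal weights by default; the real weighting is unknown.
    """
    if weights is None:
        weights = {name: 1.0 for name in COMPONENTS}
    return sum(weights[name] * components[name] for name in COMPONENTS)


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Standardize rewards within one group of sampled responses.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (population form here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Two hypothetical responses to the same prompt.
    resists = {name: 1.0 for name in COMPONENTS}
    caves = dict(resists, pressure_resistance=0.0, agreement_suppression=0.0)
    rewards = [total_reward(resists), total_reward(caves)]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Keeping the components separate until the final sum makes it possible to log each one on its own, so a drop in one signal is not hidden by a gain in another.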

Additionally, “TRACE: Capability-Targeted Agentic Training,” also posted to arXiv, introduces an end-to-end system for environment-specific agent self-improvement. According to the paper, TRACE improved performance by 14.1 points on τ²-bench and achieved 7 additional perfect scores on ToolSandbox.