Three New arXiv Papers Address LLM Alignment and Multilingual Consistency Challenges
Three new preprints on arXiv explore critical challenges in large language model (LLM) development, focusing on alignment and multilingual performance.
Personalized Alignment via VISA
According to arXiv:2603.04822v1, researchers propose VISA (Value Injection via Shielded Adaptation), a method for aligning LLMs with nuanced human values. The paper notes that existing methods like Reinforcement Learning from Human Feedback (RLHF) “often handle only coarse-grained attributes,” while the proposed approach aims to enable more fine-grained personalized alignment through fine-tuning.
Safety Reversals Across Languages
A separate study (arXiv:2603.04904v1) examines safety interventions in multi-agent LLM systems across 16 languages. The authors ran “1,584 multi-agent simulations” and draw a parallel to observations from perpetrator treatment, where “offenders articulate remorse yet behavioral change does not follow.” The paper reports language-dependent reversals of safety interventions, which suggests that alignment techniques may not transfer consistently across languages.
Multilingual Knowledge Consistency
According to arXiv:2603.04678v1, researchers address inconsistent knowledge in multilingual LLMs. The paper states that models “are likely to be asked similar questions in different languages, and inconsistent responses can undermin[e]” their reliability, and it proposes optimization methods for crosslingual consistency.
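To make the idea of crosslingual consistency concrete, the sketch below scores how often a model's answers to the same question agree across languages. It is an illustration only and not the paper's method or metric: the function name `crosslingual_consistency`, the exact-match comparison, and the sample answers are all assumptions, and the answers are assumed to have already been mapped to a shared canonical form.

```python
from itertools import combinations


def crosslingual_consistency(answers_by_language):
    """Return the fraction of language pairs whose answers match.

    Hypothetical helper: assumes each answer is already normalized to a
    language-independent canonical form (for example, an entity name).
    """
    # Light normalization so trivial case or whitespace differences do not count.
    normalized = [a.strip().lower() for a in answers_by_language.values()]
    pairs = list(combinations(normalized, 2))
    if not pairs:
        # Zero or one language: nothing to disagree with.
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


# Made-up answers to one question asked in three languages.
answers = {"en": "Paris", "fr": "paris", "de": "Lyon"}
score = crosslingual_consistency(answers)
print(f"pairwise agreement: {score:.2f}")
```

A score of 1.0 would mean every language pair agrees. Exact matching is a deliberately crude choice here; a real evaluation would likely need a more careful notion of answer equivalence.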
All three papers remain preprints and have not yet undergone peer review.