Researchers Advance Efficient Training Methods for Multilingual Speech and Language Models

Several recent research papers address efficiency and alignment challenges in training language models across different modalities and languages.

According to arxiv.org, researchers introduced Cross-lingual Speech Language Model (CSLM), an efficient training method for cross-lingual speech language models based on discrete speech tokens. The paper, accepted to Findings of ACL 2026, proposes a novel alignment strategy that achieves cross-modal and cross-lingual alignment through continual pre-training. The researchers state that CSLM “aligns different modalities and languages simultaneously without the need for massive speech data, thus exhibiting good language scalability.”

In a separate study focused on programming education, researchers developed a method for training artificial programming learners using authentic student process data, according to arxiv.org. The approach, accepted to Educational Data Mining 2026, serializes temporal log traces into a conversational format and uses a training pipeline combining supervised fine-tuning with preference optimization. The study trained Qwen models at 4B and 8B scales on real student Python programming submissions.

Additionally, according to arxiv.org, researchers used six multilingual large language models to create targeted “computational lesions” by zeroing small parameter sets. The study compared intact and lesioned models in predicting fMRI responses during 100 minutes of naturalistic story listening in English, Chinese, and French across 112 participants, finding that lesioning a shared core reduced whole-brain encoding correlation by 60.32% relative to intact models.