Compressed Language Model Optimized for European Languages
Researchers have created Bielik-Minitron-7B, a compressed language model optimized for European languages, according to a paper published on arXiv. The model reduces the parameter count from 11.04 billion to 7.35 billion (a 33.4% reduction) while recovering approximately 90% of the baseline model's performance.
According to the paper, the team employed a two-stage compression methodology inspired by the NVIDIA Minitron approach, combining structured hybrid pruning with knowledge distillation. The researchers used the NVIDIA Model Optimizer for structural pruning and the NVIDIA NeMo Framework for logit-based distillation to recover quality after pruning.
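Logit-based distillation trains the smaller, pruned student model to match the full output distribution of the original teacher model, not just its top prediction. As an illustration only (the paper does not publish its training code, and the function names and temperature value here are hypothetical), a minimal sketch of the standard temperature-scaled KL distillation loss looks like this:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over the softened distributions, scaled
    # by T^2 as in standard logit distillation (Hinton et al.).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# The loss is zero when the student reproduces the teacher exactly,
# and positive whenever the two distributions diverge.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))       # → 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.5, 0.5, 0.5]) > 0)   # → True
```

In a real pipeline this loss is computed per token over the vocabulary and backpropagated through the student only; the teacher's weights stay frozen.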
Following distillation, the model underwent what the paper describes as “a rigorous alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and Reinforcement Learning (GRPO).” According to the researchers, the final model delivers up to a 50% inference speedup over the original.
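Of the alignment stages named above, Direct Preference Optimization is the most compact to state. The paper uses a DPO variant (DPO-P) whose details are not reproduced here; as a hedged sketch of the standard DPO objective it builds on, with hypothetical per-sequence log-probabilities as inputs:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Standard DPO: push the policy to prefer the chosen response over
    # the rejected one, measured relative to a frozen reference model.
    # margin > 0 means the policy already prefers "chosen" more
    # strongly than the reference does.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# With no preference signal the loss sits at log(2) ≈ 0.693;
# it falls below that once the policy favors the chosen response.
print(abs(dpo_loss(0.0, 0.0, 0.0, 0.0) - math.log(2)) < 1e-12)  # → True
print(dpo_loss(-1.0, -2.0, -1.5, -1.5) < math.log(2))           # → True
```

In practice the inputs are summed token log-probabilities of whole responses, and `beta` controls how far the policy may drift from the reference.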
The paper states that “this approach demonstrates an efficient pathway to create language models for less-represented languages, preserving the original model quality while reducing inference deployment costs.” The work was submitted to arXiv on March 12, 2026, by Krzysztof Wróbel and colleagues.