Three New Research Papers Address LLM Efficiency Through Knowledge Fusion, Energy Analysis, and Compression

Researchers publish three arXiv papers tackling large language model optimization via skill fusion, energy-based interpretation, and rank-sparsity compression.

Three new papers on arXiv explore different approaches to improving large language model (LLM) efficiency and capabilities.

Knowledge Fusion via Modular SkillPacks

According to arXiv paper 2505.18502v2, researchers are addressing “cross-capability transfer” in LLMs, which has applications in “multi-task integration, model compression, and continual learning.” The paper references previous works FuseLLM and FuseChat as having “demonstrated the potential” in this area, though the abstract excerpt ends before detailing the new approach.

Energy-Based Model Interpretation

The second paper (arXiv 2602.18671v2) presents a novel interpretation of LLM architectures. According to the paper, researchers "reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM)." This approach "decompos[es] the sequence-to-sequence probability chain into multiple interacting EBMs at inference," allowing researchers to track what they term "spilled energy" during the inference process.
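The basic correspondence the paper builds on is standard: a softmax classifier can always be read as an EBM by treating each negated logit as an energy, so the class probabilities become a Boltzmann distribution over those energies. A minimal sketch of that identity (this illustrates the general softmax-EBM equivalence only, not the paper's multi-EBM decomposition or its "spilled energy" accounting):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy final-layer logits f(x)[y] for a 3-class vocabulary.
logits = np.array([2.0, 0.5, -1.0])

# EBM view: define the energy of each class as the negated logit,
# E(x, y) = -f(x)[y]. Lower energy = more probable class.
energies = -logits

# Boltzmann distribution over energies:
#   p(y | x) = exp(-E(x, y)) / sum_y' exp(-E(x, y'))
probs_ebm = np.exp(-energies) / np.exp(-energies).sum()

# This is exactly the classifier's softmax output.
probs_softmax = softmax(logits)
assert np.allclose(probs_ebm, probs_softmax)
```

The normalizing sum in the denominator plays the role of the partition function; the paper's contribution, per the abstract, lies in chaining such EBMs across the sequence at inference time rather than in this single-step identity.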

Compression Through Rank and Sparsity

The third paper (arXiv 2505.03801v2) focuses on LLM compression using “low-rank and sparse composite approximation.” According to the abstract, this approach is described as “a natural idea to compress Large Language Models,” though the researchers acknowledge it “faces two primary challenges that adversely affect the performance of existing methods.”
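The general idea behind a low-rank and sparse composite approximation is to replace a dense weight matrix W with L + S, where L is low-rank (cheap to store as two thin factors) and S is sparse (capturing the few large entries the low-rank part misses). A generic numpy sketch of this decomposition follows; the rank, sparsity budget, and selection rule here are illustrative choices, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # stand-in for an LLM weight matrix

# Low-rank part L: truncated SVD, keeping the top-r singular triplets.
r = 8
U, s, Vt = np.linalg.svd(W, full_matrices=False)
L = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Sparse part S: keep only the largest-magnitude entries of the
# residual W - L (here an arbitrary 5% budget).
residual = W - L
k = int(0.05 * residual.size)
thresh = np.sort(np.abs(residual), axis=None)[-k]
S = np.where(np.abs(residual) >= thresh, residual, 0.0)

# Composite approximation and relative errors.
W_hat = L + S
err_composite = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
err_lowrank = np.linalg.norm(W - L) / np.linalg.norm(W)

# The sparse term strictly improves on the low-rank-only approximation.
assert err_composite < err_lowrank
```

Storage drops because L needs only 2·64·r values and S only its nonzeros; the "two primary challenges" the abstract alludes to are not specified in the excerpt, so how the paper allocates rank versus sparsity remains unstated here.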

All three papers represent updated versions (v2) of previously posted research.