Researchers have published several approaches to improve how large language models (LLMs) handle multiple tasks and maintain consistency in complex reasoning scenarios.
According to a paper posted on arxiv.org, a new method called PEML (Parameter-Efficient Multi-task Learning) combines continuous prompt optimization with low-rank model adaptation to enable efficient fine-tuning across multiple tasks. The authors report “an average accuracy improvement of up to 6.67%, with individual tasks showing peak gains of up to 10.75%” over existing methods, including MTL-LoRA, MultiLoRA, C-Poly, and MoE, on GLUE, SuperGLUE, and commonsense reasoning benchmarks.
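To give a rough sense of why pairing continuous (soft) prompts with low-rank adaptation counts as parameter-efficient, the sketch below counts trainable parameters for a single weight matrix. The dimensions, rank, and prompt length are illustrative assumptions, not PEML's actual configuration, and the article does not say how PEML wires the two components together.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one low-rank adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def soft_prompt_params(num_virtual_tokens: int, d_model: int) -> int:
    """Trainable parameters in a continuous prompt: one learned vector per virtual token."""
    return num_virtual_tokens * d_model


# Illustrative values only (assumptions, not taken from the paper).
d_model = 4096
rank = 8
num_virtual_tokens = 20

full = full_finetune_params(d_model, d_model)
lora = lora_params(d_model, d_model, rank)
prompt = soft_prompt_params(num_virtual_tokens, d_model)
combined = lora + prompt
ratio = combined / full

print(f"full fine-tuning: {full:,}")
print(f"adapter + soft prompt: {combined:,} ({ratio:.2%} of full)")
```

Under these assumed sizes, the adapter and the soft prompt together train fewer than 1% of the parameters that full fine-tuning of the same matrix would update.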
In related work on prompting strategies, a paper on arxiv.org described a reinforcement learning framework that trains lightweight “prompter” models to optimize prompts for frozen LLMs. The authors reported substantial gains, “improving performance from 55% to 90% in logic-intensive reasoning and 74% to 91% in tool-use tasks” on the BIG-Bench Extra Hard and Tau-bench suites.
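The general idea of training a prompter against a frozen model can be illustrated with a toy policy-gradient loop. This is a minimal sketch under heavy assumptions, not the paper's framework: the "prompter" is a softmax over three hand-written candidate prompts, and the frozen LLM is replaced by a hypothetical stub (`frozen_llm_reward`) with made-up success rates.

```python
import math
import random

# Hypothetical candidate prompts and made-up success rates for a stubbed frozen model.
CANDIDATE_PROMPTS = [
    "Answer briefly.",
    "Think step by step, then answer.",
    "Answer with a single word.",
]
SUCCESS_RATE = [0.3, 0.9, 0.5]


def frozen_llm_reward(prompt_index: int, rng: random.Random) -> float:
    """Stand-in for running the frozen LLM and scoring its answer (1.0 = correct)."""
    return 1.0 if rng.random() < SUCCESS_RATE[prompt_index] else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_prompter(steps: int = 5000, lr: float = 0.1, seed: int = 0):
    """REINFORCE with a running-average baseline; only the prompter's logits are updated."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATE_PROMPTS)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        reward = frozen_llm_reward(choice, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(choice) with respect to logit i is 1[i == choice] - probs[i].
        for i in range(len(logits)):
            indicator = 1.0 if i == choice else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
    return softmax(logits)


probs = train_prompter()
best = max(range(len(probs)), key=probs.__getitem__)
print(CANDIDATE_PROMPTS[best], [round(p, 3) for p in probs])
```

The stubbed model's weights never change; only the prompter's preferences do, which is the property the frozen-LLM setup relies on. A real prompter would generate prompt text rather than choose from a fixed list.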
For multi-turn dialogue systems, a paper on arxiv.org introduced Self-Recall Thinking (SRT), which addresses consistency challenges by enabling models to “selectively recall and reason over context during inference.” According to the paper, SRT “improves F1 score by 4.7% and reduces end-to-end latency by 14.7%” compared to prior methods.
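As a loose illustration of what selective recall over dialogue context can look like, the sketch below keeps only the earlier turns that share content words with the current query. It uses plain lexical overlap as a stand-in, which is an assumption made for illustration; the article does not describe SRT's actual recall mechanism, and the function names and stopword list here are hypothetical.

```python
import re

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "my", "is", "a", "to", "it", "i", "you", "can", "what"}


def content_words(text: str) -> set:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def recall_turns(history, query, top_k=2):
    """Return up to top_k earlier turns that overlap with the query, in dialogue order."""
    query_words = content_words(query)
    scored = []
    for index, turn in enumerate(history):
        overlap = len(query_words & content_words(turn))
        if overlap > 0:
            scored.append((overlap, index))
    # Highest overlap first; ties go to the more recent turn.
    scored.sort(key=lambda pair: (-pair[0], -pair[1]))
    kept = sorted(index for _, index in scored[:top_k])
    return [history[i] for i in kept]


history = [
    "User: My order number is 4471.",
    "Assistant: Thanks, I found order 4471.",
    "User: Also, what is the weather like today?",
    "Assistant: I cannot check the weather.",
    "User: Please ship it to my office address.",
]
query = "Can you confirm my order number again?"
recalled = recall_turns(history, query)
for turn in recalled:
    print(turn)
```

Passing only the recalled turns to the model, rather than the full history, is one plausible way a shorter context could lower latency, though the paper's reported numbers come from its own method rather than from anything like this heuristic.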
Additionally, a paper on arxiv.org, accepted to ICML 2026, presented E-mem, a multi-agent framework that achieved “over 54% F1, surpassing the state-of-the-art GAM by 7.75%, while reducing token cost by over 70%” on the LoCoMo benchmark.