Three New ArXiv Papers Advance LLM Training and Recommendation Systems

Researchers propose methods to enhance cross-domain recommendations, optimize LLM systems with user logs, and improve reasoning through multi-stage training.

Three research papers published on arXiv on May 18, 2026, present advances in large language model applications and training methodologies.

Cross-Domain Recommendation Enhancement

According to a paper on arxiv.org, researchers introduced LLM-EDT (Large Language Model Enhanced Cross-domain Sequential Recommendation with Dual-phase Training), which targets two problems in Cross-domain Sequential Recommendation (CDSR) systems. The first is an “imbalance issue,” in which interactions from one domain dominate a user's behavior. The second is a “transition issue,” which affects how well cross-domain preferences are captured. The proposed system includes a “transferable item augmenter” that generates cross-domain behaviors and a “domain-aware profiling module” that builds comprehensive user profiles. The code has been released online, according to the paper.

Learning from User Interaction Logs

A separate arxiv.org paper presents UNO (User log-driveN Optimization), a framework for improving LLM systems using user interaction logs. According to the researchers, the method addresses challenges in learning from “unstructured and noisy” user logs by distilling them into semi-structured rules and preference pairs. The paper states that UNO “significantly outperforms Retrieval Augmented Generation (RAG) and memory-based baselines,” with code open-sourced on GitHub.
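To make the idea of a "preference pair" concrete, here is a minimal sketch of how noisy logs could be reduced to such pairs. It is not UNO's actual pipeline: the log schema (`prompt`, `response`, `feedback`), the `PreferencePair` class, and the `build_preference_pairs` function are all hypothetical names chosen for illustration, and thumbs-up/thumbs-down feedback is an assumed signal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical container: one preferred and one rejected response."""
    prompt: str
    chosen: str
    rejected: str


def build_preference_pairs(logs):
    """Pair liked and disliked responses that share a prompt.

    Assumes each log entry is a dict with 'prompt', 'response', and an
    optional 'feedback' of "up" or "down". Entries without usable
    feedback are treated as noise and skipped.
    """
    by_prompt = defaultdict(lambda: {"up": [], "down": []})
    for entry in logs:
        feedback = entry.get("feedback")
        if feedback not in ("up", "down"):
            continue  # no clear signal, so drop the entry
        by_prompt[entry["prompt"]][feedback].append(entry["response"])

    pairs = []
    for prompt, groups in by_prompt.items():
        for good in groups["up"]:
            for bad in groups["down"]:
                pairs.append(PreferencePair(prompt, good, bad))
    return pairs
```

Pairs in this shape are what preference-based training methods typically consume; how UNO itself extracts and uses them is described in the paper and its released code.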

Advanced Reasoning Training

According to a third arxiv.org paper, researchers developed a four-stage post-training workflow for LLM reasoning. Using a Qwen3-1.7B model, the workflow reached 79.3% on MATH and 25.2% on AIME 2024, compared with 75.9% and 19.8%, respectively, for direct GRPO training, a gain of 3.4 and 5.4 percentage points. The four stages are sparse-reward reinforcement learning, forward-KL warmup, on-policy distillation, and optional additional RL training.
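For readers unfamiliar with the term, "forward KL" in the usual convention means the divergence KL(teacher || student), which penalizes a student model for assigning low probability to tokens the teacher considers likely. The sketch below computes it for a toy three-token distribution. The function names and the numbers are illustrative assumptions, not taken from the paper, and the paper's warmup stage may differ in its details.

```python
import math


def forward_kl(teacher, student):
    """KL(teacher || student) for two discrete distributions.

    Assumes both lists cover the same vocabulary and that the student
    assigns nonzero probability wherever the teacher does.
    """
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)


def reverse_kl(teacher, student):
    """KL(student || teacher), shown only to highlight the asymmetry."""
    return forward_kl(student, teacher)


# Toy next-token distributions (made-up values for illustration).
teacher_probs = [0.7, 0.2, 0.1]
student_probs = [0.4, 0.4, 0.2]

print(f"forward KL: {forward_kl(teacher_probs, student_probs):.4f}")
print(f"reverse KL: {reverse_kl(teacher_probs, student_probs):.4f}")
```

The two values differ because KL divergence is not symmetric, which is why the direction chosen for a training objective matters.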