Three New ArXiv Papers Advance LLM Training and Recommendation Systems

Three research papers published on arXiv on May 18, 2026, present advances in large language model (LLM) applications and training methodologies.

Cross-Domain Recommendation Enhancement

According to arxiv.org, researchers introduced LLM-EDT (Large Language Model Enhanced Cross-domain Sequential Recommendation with Dual-phase Training), which addresses challenges in Cross-domain Sequential Recommendation (CDSR) systems. The paper identifies two key issues: an "imbalance issue," in which interactions from one domain dominate user behavior, and a "transition issue," which makes it harder to capture user preferences as they shift across domains. The proposed system includes a "transferable item augmenter" that generates cross-domain behaviors and a "domain-aware profiling module" that builds comprehensive user profiles. According to the paper, the code has been released publicly.
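To make the "imbalance issue" concrete, the sketch below measures how a single user's interaction history is split across domains and flags histories dominated by one domain. It is an illustration only, not part of LLM-EDT: the `(item_id, domain)` history format, the function names, and the 0.75 threshold are all hypothetical choices.

```python
from collections import Counter


def domain_shares(sequence):
    """Return the fraction of a user's interactions that fall in each domain.

    `sequence` is a hypothetical list of (item_id, domain) tuples in time order.
    """
    if not sequence:
        return {}
    counts = Counter(domain for _, domain in sequence)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def is_imbalanced(sequence, threshold=0.75):
    """Flag a history as imbalanced if one domain's share exceeds `threshold`.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    shares = domain_shares(sequence)
    if not shares:
        return False
    return max(shares.values()) > threshold


history = [("b1", "books"), ("b2", "books"), ("b3", "books"),
           ("b4", "books"), ("m1", "movies")]
print(domain_shares(history))  # {'books': 0.8, 'movies': 0.2}
print(is_imbalanced(history))  # True: books account for 80% of interactions
```

A model trained mostly on histories like this one sees few cross-domain transitions, which is the gap the paper's augmenter is described as targeting.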

Learning from User Interaction Logs

A separate arxiv.org paper presents UNO (User log-driveN Optimization), a framework for improving LLM systems using user interaction logs. According to the researchers, the method addresses challenges in learning from “unstructured and noisy” user logs by distilling them into semi-structured rules and preference pairs. The paper states that UNO “significantly outperforms Retrieval Augmented Generation (RAG) and memory-based baselines,” with code open-sourced on GitHub.
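As a rough illustration of what turning noisy logs into "preference pairs" can mean, the sketch below groups rated responses by prompt and pairs each positively rated response with each negatively rated one. This is not UNO's procedure, and it does not cover the rule-distillation step; the log schema (`prompt`, `response`, `feedback` of +1 or -1) and the function name are hypothetical. The `prompt`/`chosen`/`rejected` layout is simply a common format for preference-tuning data.

```python
from collections import defaultdict
from itertools import product


def logs_to_preference_pairs(logs):
    """Build (prompt, chosen, rejected) records from rated log entries.

    Each hypothetical entry is a dict with 'prompt', 'response', and
    'feedback' (+1 for thumbs-up, -1 for thumbs-down). Entries with any other
    feedback value are treated as noise and skipped.
    """
    by_prompt = defaultdict(lambda: {"pos": [], "neg": []})
    for entry in logs:
        # Light normalization so trivially different prompts are grouped together.
        prompt = entry["prompt"].strip()
        if entry.get("feedback") == 1:
            by_prompt[prompt]["pos"].append(entry["response"])
        elif entry.get("feedback") == -1:
            by_prompt[prompt]["neg"].append(entry["response"])

    pairs = []
    for prompt, groups in by_prompt.items():
        # Every liked response is preferred over every disliked one for the same prompt.
        for chosen, rejected in product(groups["pos"], groups["neg"]):
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


logs = [
    {"prompt": "Summarize the report", "response": "Short summary", "feedback": 1},
    {"prompt": "Summarize the report ", "response": "Rambling text", "feedback": -1},
    {"prompt": "Translate hello", "response": "Hola", "feedback": 1},
    {"prompt": "Translate hello", "response": "Bonjour", "feedback": None},
]
pairs = logs_to_preference_pairs(logs)
print(pairs)  # one pair, for "Summarize the report"
```

Prompts with only positive or only negative feedback yield no pairs, which is one reason raw logs alone can be a thin training signal.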

Advanced Reasoning Training

According to a third arxiv.org paper, researchers developed a four-stage post-training workflow for LLM reasoning that, using a Qwen3-1.7B model, achieved 79.3% on the MATH benchmark and 25.2% on AIME 2024, compared with 75.9% and 19.8%, respectively, for direct GRPO (Group Relative Policy Optimization) training. That is a gain of 3.4 percentage points on MATH and 5.4 points on AIME 2024. The workflow consists of sparse-reward reinforcement learning (RL), a forward-KL warmup, on-policy distillation, and optional additional RL training.
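Two quantities named in the workflow can be sketched with their textbook definitions: GRPO's group-relative advantage, where each sampled answer's reward is normalized by the mean and standard deviation of its group, and the forward KL divergence KL(p_teacher || p_student). This is a minimal sketch, not the paper's training code. The function names, the use of population standard deviation, and the `eps` constant are assumptions. The example shows one property of sparse binary rewards under group normalization: a group in which every sample receives the same reward produces zero advantage for all of them.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r_i - mean(r)) / (std(r) + eps).

    Population std and eps=1e-6 are illustrative choices, not the paper's.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def forward_kl(p_teacher, p_student):
    """KL(p_teacher || p_student) over a shared discrete vocabulary.

    Assumes the student assigns nonzero probability wherever the teacher does.
    Terms where the teacher's probability is zero contribute nothing.
    """
    return sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)


# Sparse binary rewards: 1 if a sampled answer is correct, else 0.
print(grpo_advantages([1, 0, 0, 0]))  # the single correct sample gets a positive advantage
print(grpo_advantages([0, 0, 0, 0]))  # all wrong: every advantage is 0.0
print(forward_kl([0.7, 0.2, 0.1], [0.5, 0.3, 0.2]))  # positive; 0 only if the distributions match
```

Forward KL weights the divergence by the teacher's probabilities, so it penalizes the student most for under-covering tokens the teacher considers likely.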