Two New Benchmarks Advance Retrieval-Augmented Generation Research

Researchers introduce RAG-IGBench for evaluating interleaved image-text generation and M4-RAG for multilingual multimodal RAG systems.

Researchers have introduced two new evaluation frameworks aimed at advancing Retrieval-Augmented Generation (RAG) capabilities in different domains.

RAG-IGBench for Interleaved Generation

According to arXiv paper 2512.05119v1, RAG-IGBench addresses the need for “interleaved image-text generation” in open-domain question answering. The researchers note that “providing user queries with visually enhanced responses can considerably benefit understanding and memory, underscoring the great value of interleaved image-text generation.” The benchmark evaluates RAG systems that generate responses interleaving text with retrieved images, rather than returning text alone.
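
The paper's abstract, as quoted, does not spell out an output schema, but conceptually an interleaved response is an ordered sequence of text and image segments. The Python sketch below is a minimal illustration under that assumption; the TextSegment and ImageSegment types and the render helper are hypothetical stand-ins, not part of RAG-IGBench.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical segment types for illustration; not RAG-IGBench's schema.
@dataclass
class TextSegment:
    text: str

@dataclass
class ImageSegment:
    url: str      # a retrieved image the system cites
    caption: str

Segment = Union[TextSegment, ImageSegment]

def render(response: list[Segment]) -> str:
    """Flatten an interleaved response into markdown for display."""
    parts = []
    for seg in response:
        if isinstance(seg, TextSegment):
            parts.append(seg.text)
        else:
            parts.append(f"![{seg.caption}]({seg.url})")
    return "\n\n".join(parts)

# An answer that weaves retrieved images into the explanatory text.
answer: list[Segment] = [
    TextSegment("Loosen the root ball before repotting the succulent."),
    ImageSegment("https://example.com/rootball.jpg", "Loosening the root ball"),
    TextSegment("Then settle it into fresh, well-draining soil."),
]
print(render(answer))
```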

M4-RAG for Multilingual Multimodal Systems

A separate paper (arXiv 2512.05959v1) introduces M4-RAG, described as “A Massive-Scale Multilingual Multi-Cultural Multimodal RAG” benchmark. According to the abstract, “Vision-language models (VLMs) have achieved strong performance in visual question answering (VQA), yet they remain constrained by static training data.” M4-RAG addresses this limitation by enabling “access to up-to-date” information through retrieval augmentation across multiple languages and cultural contexts.
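
The abstract quoted above implies the general loop such a benchmark exercises: retrieve current, possibly non-English evidence for a question, then condition a VLM on both the image and that evidence. The sketch below illustrates that loop under stated assumptions; the toy corpus and the search_corpus and vlm_answer functions are hypothetical placeholders, not M4-RAG components.

```python
# Toy multilingual corpus; a real system would query a live, up-to-date index.
CORPUS = {
    "en": ["The Eiffel Tower is 330 metres tall after its 2022 antenna upgrade."],
    "fr": ["La tour Eiffel mesure 330 mètres."],
}

def search_corpus(query: str, language: str, top_k: int = 3) -> list[str]:
    """Toy retriever: rank documents by naive keyword overlap with the query."""
    docs = CORPUS.get(language, []) + CORPUS.get("en", [])
    words = query.lower().split()
    docs.sort(key=lambda d: -sum(w in d.lower() for w in words))
    return docs[:top_k]

def vlm_answer(image_path: str, prompt: str) -> str:
    """Placeholder for a real vision-language model call."""
    return f"[VLM answer for {image_path} grounded in retrieved context]"

def answer_vqa(image_path: str, question: str, lang: str) -> str:
    # 1. Retrieve current evidence, searching beyond English-only sources.
    docs = search_corpus(question, lang)
    # 2. Ground the model in that evidence instead of static training data.
    context = "\n".join(f"- {d}" for d in docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    # 3. Condition the VLM on both the image and the retrieved context.
    return vlm_answer(image_path, prompt)

print(answer_vqa("eiffel.jpg", "How tall is the Eiffel Tower?", "fr"))
```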

Both benchmarks aim to push RAG technology beyond text-only applications into more complex multimodal scenarios.