Researchers have published two new papers advancing retrieval-augmented generation (RAG), one applying it to clinical patient-trial matching and the other rethinking how RAG's reranker and generator interact, according to arxiv.org.
In a 31-page paper, one team proposes a lightweight framework that combines RAG with large language models (LLMs) for scalable patient-trial matching. According to arxiv.org, the framework addresses the challenges of “reasoning over long, heterogeneous electronic health records (EHRs) and complex eligibility criteria.” The approach decouples two steps: RAG identifies clinically relevant segments within long EHRs, and an LLM then encodes only those selected segments into representations.
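To illustrate why separating selection from encoding saves computation, here is a minimal Python sketch of the selection step only. The paper's actual retriever, chunking scheme, and LLM encoder are not described in this summary, so the fixed-size word chunking and the lexical-overlap scorer below are illustrative stand-ins, and the function names (`split_segments`, `overlap_score`, `select_segments`) are hypothetical. The idea it shows is that only the top-k segments, not the full record, would be passed on to an LLM encoder.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a toy stand-in for a real retriever's preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_segments(record: str, max_words: int = 50) -> list[str]:
    """Chunk a long record into fixed-size word windows (hypothetical chunking scheme)."""
    words = record.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def overlap_score(segment: str, criterion: str) -> int:
    """Count criterion tokens that also occur in the segment (toy lexical relevance score)."""
    seg_counts = Counter(tokenize(segment))
    crit_counts = Counter(tokenize(criterion))
    return sum(min(seg_counts[tok], n) for tok, n in crit_counts.items())


def select_segments(record: str, criterion: str, k: int = 3, max_words: int = 50) -> list[str]:
    """Keep the k segments most relevant to one eligibility criterion.

    Only these segments would be sent to an LLM encoder (not shown here),
    so encoding cost depends on k rather than on the full record length.
    """
    segments = split_segments(record, max_words)
    # Rank segment indices by score, highest first; sorted() is stable, so ties keep record order.
    ranked = sorted(range(len(segments)),
                    key=lambda i: overlap_score(segments[i], criterion),
                    reverse=True)
    # Return the chosen segments in their original order within the record.
    return [segments[i] for i in sorted(ranked[:k])]
```

For example, with `record = "Patient has type 2 diabetes. HbA1c 8.1 percent. No history of stroke. Lives alone. Enjoys gardening."`, calling `select_segments(record, "type 2 diabetes", k=1, max_words=5)` keeps only `"Patient has type 2 diabetes."` and discards the other three windows. A real system would swap the lexical scorer for a learned retriever; the decoupled structure stays the same.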
The framework was evaluated on several public benchmarks (n2c2, SIGIR, and TREC 2021/2022) and on a real-world multimodal dataset from Mayo Clinic (MCPMD). According to arxiv.org, the results showed that “retrieval-based information selection significantly reduces computational burden while preserving clinically meaningful signals.” The study also found that frozen LLMs provide strong representations for structured clinical data, whereas fine-tuning is essential for unstructured clinical narratives.
In a separate paper, researchers introduced Cooperative Retrieval-Augmented Generation (CoRAG), in which the reranker and generator act as “peer decision-makers rather than being connected through an asymmetric dependency pipeline,” according to arxiv.org. Trained on only around 10,000 PopQA samples, the framework demonstrated “good generalization and improved generation stability.” The authors have released the model on GitHub.