Three New Studies Examine LLM Safety, Memory, and Healthcare Applications

Recent arXiv papers explore conversational AI for medical diagnosis, memory-based security risks in AI agents, and policy compliance in custom chatbots.

Three New Studies Examine LLM Safety, Memory, and Healthcare Applications

Three recent papers published on arXiv address critical challenges in large language model deployment across different domains.

Healthcare Diagnostics: According to arXiv paper 2512.17559v1, researchers are developing explainable conversational AI systems aimed at early medical diagnosis. The paper notes that “healthcare systems around the world are grappling with issues like inefficient diagnostics, rising costs, and limited access to specialists,” which often result in treatment delays and poor health outcomes.

Security Vulnerabilities: A cross-listed paper (arXiv:2512.16962v1) introduces “MemoryGraft,” describing a security concern where LLM agents that use long-term memory and Retrieval-Augmented Generation (RAG) could be compromised. According to the abstract, “while this experience learning capability enhances agentic autonomy, it introduces” new vulnerability vectors through poisoned experience retrieval.

Policy Compliance: Research paper arXiv:2502.01436v3 examines automated policy compliance evaluation for custom GPTs. The study focuses on “user-configured chatbots built on top of large language models” available through platforms like OpenAI’s GPT Store, where “usage policies intended to prevent harmful or inappropriate” behavior must be enforced.

All three papers remain in preprint status on arXiv and have not yet undergone peer review.