Three New arXiv Papers Address LLM Reasoning, Chemistry Problem-Solving, and Safety Concerns

Researchers publish studies on multimodal chemistry evaluation, attribution-based reasoning explanation, and toxicity mitigation in large language models.

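Three recent arXiv preprints examine the capabilities and limitations of large language models: how well they solve multimodal chemistry problems, how their reasoning can be explained, and how toxic behavior in multimodal models can be reduced.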

Chemistry Olympiad Evaluation

The first paper (arXiv:2512.14989v1) evaluates large language models on multimodal Chemistry Olympiad exams. The authors note that “multimodal scientific reasoning remains a significant challenge for large language models (LLMs), particularly in chemistry, where problem-solving relies on symbolic diagrams, molecular structures, and structured visual data.”

Explaining LLM Reasoning

A second paper (arXiv:2512.15663v1) introduces attribution graphs as a method for explaining how LLMs reason. The authors state that “large language models (LLMs) exhibit remarkable capabilities, yet their reasoning remains opaque, raising safety and trust concerns.” The research proposes using attribution methods that “assign credit to input features” to explain LLM decision-making.
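
For readers unfamiliar with the idea of assigning credit to input features, the sketch below illustrates one generic approach, occlusion, on a toy scoring function. It is not the paper's attribution-graph method: the function names `toy_score` and `occlusion_attribution`, the weights, and the zero baseline are all hypothetical choices made for illustration.

```python
def toy_score(features):
    # Hypothetical stand-in for a model output: a fixed weighted sum.
    weights = [0.5, -1.0, 2.0]
    return sum(w * x for w, x in zip(weights, features))


def occlusion_attribution(score_fn, features, baseline=0.0):
    """Credit each feature with the change in score when it is replaced by a baseline."""
    full = score_fn(features)
    credits = []
    for i in range(len(features)):
        occluded = list(features)
        occluded[i] = baseline  # "remove" feature i (assumed baseline value)
        credits.append(full - score_fn(occluded))
    return credits


credits = occlusion_attribution(toy_score, [1.0, 2.0, 3.0])
print(credits)
```

A positive credit means the feature pushed the score up and a negative credit means it pushed the score down. For a real LLM the scoring function would be a model output such as a token probability, and the choice of baseline becomes a substantive modeling decision.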

Safety in Multimodal Models

A third paper (arXiv:2512.15052v1) addresses safety concerns in multimodal large language models through what the authors call “neuron-level detoxification.” According to the abstract, multimodal LLMs “inherit toxic, biased, and NSFW signals from weakly curated pretraining corpora, causing safety risks.” The paper includes a disclaimer that “samples in this paper may be harmful and cause discomfort.”