PermaFrost-Attack: New Pretraining Poisoning Threat Targets Large Language Models

According to a paper published on arXiv, researchers have identified a new attack vector, Stealth Pretraining Seeding (SPS), that exploits the way large language models (LLMs) are trained on web-scale data. The attack, dubbed “PermaFrost-Attack,” lets adversaries plant “logic landmines” in LLMs during the pretraining phase.

The researchers explain that attackers spread small amounts of poisoned content across stealth websites and expose those sites to web crawlers through robots.txt, increasing the likelihood that the content enters future training corpora drawn from sources like Common Crawl. According to the paper, “each individual payload is tiny, diffuse, and superficially benign,” making the attack difficult to detect during dataset construction or filtering.
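
To make the robots.txt step concrete, here is a minimal Python sketch of how a crawler decides whether it may fetch a page. It is an illustration only, not code from the paper: the two robots.txt bodies, the example.com URL, and the helper name crawler_allowed are assumptions. CCBot is the user-agent name Common Crawl's crawler uses; the parsing relies on Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that invites every crawler to index the whole site.
PERMISSIVE = """User-agent: *
Allow: /
"""

# Hypothetical robots.txt that turns every crawler away.
RESTRICTIVE = """User-agent: *
Disallow: /
"""

# Hypothetical page URL used only for this illustration.
PAGE = "https://example.com/notes/page.html"

if __name__ == "__main__":
    print(crawler_allowed(PERMISSIVE, "CCBot", PAGE))
    print(crawler_allowed(RESTRICTIVE, "CCBot", PAGE))
```

A site that serves the permissive file is eligible for crawling by any compliant bot, which is all the seeding step described above appears to require. Whether a given page actually ends up in a training corpus depends on later crawl scheduling and filtering that this sketch does not model.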

The attack creates what the researchers call “dormant logic landmines embedded during pretraining that remain largely invisible under standard evaluation, yet can later be activated by precise alphanumeric triggers such as <00TRIGGER00> to bypass safeguards.” The name PermaFrost draws an analogy to Arctic permafrost, where harmful material can remain frozen and unnoticed for long periods before resurfacing.
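
As a rough illustration of what a trigger of this shape looks like to a data pipeline, the Python sketch below counts how many documents contain each angle-bracketed alphanumeric token resembling <00TRIGGER00>. This is a hypothetical heuristic, not a method from the paper: the regular expression, the function name, and the sample documents are all assumptions, and a real trigger need not match this pattern at all.

```python
import re
from collections import Counter

# Assumed pattern: an angle-bracketed run of 6+ uppercase letters and digits
# that contains at least one digit and at least one letter, e.g. <00TRIGGER00>.
TRIGGER_LIKE = re.compile(
    r"<(?=[A-Z0-9]*[0-9])(?=[A-Z0-9]*[A-Z])[A-Z0-9]{6,}>"
)


def count_trigger_like_tokens(documents):
    """Count, for each trigger-like token, how many documents contain it."""
    counts = Counter()
    for doc in documents:
        # set() ensures each document contributes at most once per token.
        counts.update(set(TRIGGER_LIKE.findall(doc)))
    return counts


if __name__ == "__main__":
    # Hypothetical sample documents for illustration only.
    sample = [
        "intro text <00TRIGGER00> more text <00TRIGGER00>",
        "ordinary markup <SCRIPT> with no digits",
        "unrelated page that also contains <00TRIGGER00>",
    ]
    for token, n_docs in count_trigger_like_tokens(sample).items():
        print(token, n_docs)
```

A token that recurs across many otherwise unrelated documents might merit a closer look, but a check like this catches only triggers that happen to fit the assumed pattern. The paper's point is that payloads are tiny and superficially benign, so simple pattern matching should not be read as an adequate defense.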

The researchers developed a controlled framework with geometric diagnostics, including Thermodynamic Length, Spectral Curvature, and the Infection Traceback Graph. According to the paper, experiments across multiple model families and scales showed that SPS is “broadly effective, inducing persistent unsafe behavior while often evading alignment defenses.”