Researchers Unveil PermaFrost-Attack: New Threat to Large Language Models via Poisoned Pretraining Data

According to a paper published on arXiv, researchers have identified a new attack vector called PermaFrost-Attack that exploits the web-scale pretraining process of large language models. The attack relies on a technique called Stealth Pretraining Seeding (SPS), in which adversaries spread small amounts of poisoned content across stealth websites and expose those sites to web crawlers through robots.txt. This raises the likelihood that the content is absorbed into future training corpora derived from sources such as Common Crawl.
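To illustrate the robots.txt step from a dataset builder's point of view, the following is a minimal sketch of how a site's robots.txt determines whether a given crawler may fetch a page. It uses Python's standard urllib.robotparser module. The helper name crawler_allowed, the two sample robots.txt bodies, and the example.com URL are assumptions made for illustration and do not come from the paper; CCBot is the user agent used by Common Crawl's crawler.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; not taken from the paper.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample policies (assumed for this sketch).
PERMISSIVE = """User-agent: *
Allow: /
"""

RESTRICTIVE = """User-agent: CCBot
Disallow: /
"""

if __name__ == "__main__":
    page = "https://example.com/page.html"
    for name, policy in [("permissive", PERMISSIVE), ("restrictive", RESTRICTIVE)]:
        print(name, crawler_allowed(policy, "CCBot", page))
```

A check like this only tells a corpus builder which pages a crawler was invited to fetch. It says nothing about whether the content of those pages is trustworthy.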

The authors write that “each individual payload is tiny, diffuse, and superficially benign,” which makes the attack difficult to detect during dataset construction or filtering. The paper states that this results in “dormant logic landmines embedded during pretraining that remain largely invisible under standard evaluation, yet can later be activated by precise alphanumeric triggers such as <00TRIGGER00> to bypass safeguards.”
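As a rough illustration of what a filtering pass might look for, here is a naive sketch that flags trigger-like tokens in raw text. The regular expression is an assumption modeled only on the single example trigger quoted above (angle brackets around uppercase letters and digits); the paper does not describe this filter, and because the payloads are reported to be superficially benign, a pattern match of this kind should be read as a weak heuristic, not a defense. The function name find_trigger_like_tokens is hypothetical.

```python
import re

# Assumed shape: <...> enclosing at least six uppercase letters/digits,
# with at least one digit and at least one letter. Modeled on the single
# example "<00TRIGGER00>"; real triggers may look entirely different.
TRIGGER_LIKE = re.compile(
    r"<(?=[A-Z0-9]*\d)(?=[A-Z0-9]*[A-Z])[A-Z0-9]{6,}>"
)


def find_trigger_like_tokens(text: str) -> list[str]:
    """Return all substrings of text that match the assumed trigger shape."""
    return TRIGGER_LIKE.findall(text)


if __name__ == "__main__":
    sample = "Plain prose <div> with <HTML> markup and <00TRIGGER00> inside."
    print(find_trigger_like_tokens(sample))
```

Requiring both a digit and a letter keeps ordinary markup such as <HTML> from being flagged, at the cost of missing any trigger that does not follow this assumed shape.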

The researchers named the attack PermaFrost, drawing an analogy to Arctic permafrost where “harmful material can remain frozen, buried, and unnoticed for long periods, only to resurface when conditions allow.” According to the paper, experiments across multiple model families and scales showed that SPS “is broadly effective, inducing persistent unsafe behavior while often evading alignment defenses.”

The researchers also introduced geometric diagnostic tools, including Thermodynamic Length, Spectral Curvature, and the Infection Traceback Graph, to examine latent model behavior.