AI Agent Memory Poisoning

Most AI security research focuses on what goes into a prompt. Memory poisoning attacks what stays inside the agent. A poisoned memory store turns a helpful assistant into a persistent adversary — and the agent never receives a single malicious instruction.

Prompt injection dominates AI security conversations because it is visible: a user sends a crafted input, the model responds, and the attack is traceable. Memory poisoning operates below that threshold. The contaminant enters through a channel the agent was designed to trust — an email it reads, a web page it summarizes, a file it processes — and embeds itself in the memory store. From that point, every subsequent action reflects the poison.

Memory poisoning does not require the attacker to interact with the agent directly. The attack surface is every data source the agent reads automatically.

A CISA joint advisory with international partners now lists memory contamination as a core risk for agentic AI adoption, and the OWASP Top 10 for LLM Applications classifies it under ASI06: a category specifically for attacks that corrupt persistent agent state. The threat has moved from theoretical to operational.

The HEARTBEAT Attack: Persistent Compromise Without a Prompt

The most striking evidence comes from the HEARTBEAT vulnerability, documented by Yechao Zhang, Shiqian Zhao, Jie Zhang, and Gelei Deng in March 2026 (arXiv:2603.23064). The attack targets the periodic execution mechanism — the heartbeat — that agents like OpenClaw use to maintain background tasks when no user is interacting.

In a standard agent deployment, the heartbeat loop reads incoming data: emails, social feeds, calendar entries, and file updates. An attacker who controls any of those inputs can inject content that the agent stores as memory. No direct prompt. No jailbreaking. The agent reads the data as part of its normal operation and writes the payload into long-term storage.

The HEARTBEAT attack achieves near-total long-term persistence, with cross-session carryover in three out of four test cases — all without sending the agent a single instruction.

OpenClaw patched the specific heartbeat vulnerability in February 2026, but the underlying pattern — unvalidated data ingestion through background processing — remains present in LangChain, CrewAI, AutoGPT, and similar frameworks. The fix addressed one implementation; the attack class endures.

Semantic Compliance Hijacking: Zero-Detection Supply Chain Attacks

While HEARTBEAT exploits background data ingestion, Semantic Compliance Hijacking (SCH) exploits the trust boundary between agent skill marketplaces and runtime execution. Researchers Xinyu Liu, Yukai Zhao, Xing Hu, and Xin Xia published the attack in May 2026 (arXiv:2605.14460), demonstrating how a skill published in a marketplace can embed semantic directives that bypass all existing safety filters.

The result is a supply chain attack with no detectable payload:

Metric	SCH Attack Result
Confidentiality breach	Nearly four in five test cases
Remote code execution	Two in three test cases
Detection rate by current tools	None

The attack is payload-less: the skill contains no malicious code, no suspicious strings, no anomalous API calls. Instead, it uses natural-language compliance rules embedded in the skill description that reshape how the agent interprets its own instructions. The agent has already loaded the skill. The compliance rules govern what the agent considers "correct" behavior. The attack redefines correctness itself.

This connects directly to the supply chain vulnerabilities covered in earlier analysis of MCP server sprawl. The expansion of agent ecosystems — skill marketplaces, tool registries, plugin architectures — creates exactly the trust boundaries that SCH exploits. Every new integration point is a potential vector for semantic manipulation that existing scanners cannot detect because there is no code-level payload to find.

How Memory Poisoning Propagates

Memory poisoning exploits a design assumption baked into every agent framework: that stored context is reliable. Agents persist conversation history, retrieved documents, and intermediate reasoning in memory stores that function as an extended context window. The assumption is that this stored context was validated at the point of entry. In practice, the validation boundary is inconsistent or absent.

Four propagation pathways account for nearly all documented attacks:

Pathway	Mechanism	Detection Difficulty
Background ingestion	Agent reads email, feeds, calendar entries automatically; stores content verbatim	High — occurs without user interaction
RAG index poisoning	Adversarial documents inserted into vector store; retrieved as "relevant" context	High — content appears legitimate at retrieval time
Semantic skill loading	SCH-style compliance rules loaded from marketplace; reshape agent behavior	Very high — no payload to detect
Multimodal sleeper triggers	Adversarial images activate embedded directives when processed by multimodal agents	Very high — trigger is visual, not textual

The RAG index poisoning pathway demands particular attention. Research by Binyan Xu, Xilin Dai, and Kehuan Zhang (arXiv:2604.27770) demonstrates that vector stores implementing retrieval-augmented generation perform lookup, not true memory. Injected content that matches the retrieval embedding propagates indefinitely because the agent treats every retrieved snippet with equal confidence. There is no provenance tagging, no freshness scoring, and no mechanism to distinguish a legitimate document from an adversarial one.

Multimodal Sleeper Agents

The Visual Inception attack (Jiachen Qian, arXiv:2604.16966, April 2026) demonstrates that memory poisoning is not limited to text. An adversary can embed a directive inside an image file that activates only when a multimodal agent processes it. The image appears perfectly normal to human review. When the agent's vision module decodes it, the embedded trigger causes the agent to execute a hidden instruction.

Visual Inception achieves a goal-hit rate of four in five tested multimodal agents — a sleeper trigger invisible to any text-based security scanner.

The attack exploits the gap between what a human reviewer sees in an image and what the model's vision encoder extracts. A security team can screen every text input and still miss an attack that arrives as a PNG. As agents gain multimodal processing capabilities — reading documents, analyzing screenshots, interpreting charts — the attack surface expands beyond the reach of text-only defenses.

Defensive Approaches: What Works Now

The defensive landscape for memory poisoning is early but not empty. Four approaches show measurable results against documented attack patterns:

Defense	Mechanism	Effectiveness
ClawGuard	Deterministic tool-call boundary enforcement; user-confirmed rules before execution	Blocks most RCE pathways from SCH
ClawdGo	Endogenous security awareness training; agent learns to reject anomalous instructions	Trust-and-legitimacy detection score from roughly four in five to near-perfect
SuperLocalMemory	Bayesian trust scoring per memory source; degrades trust in unverified inputs	Strong trust degradation for sleeper content across sources
CognitiveGuard	Dual-process defense: diffusion sanitizer removes visual triggers; counterfactual verification checks consistency	Reduces visual sleeper goal-hit rate from four in five to roughly one in ten

Each defense addresses a specific pathway. ClawGuard prevents the SCH supply chain attack by enforcing human confirmation at tool-call boundaries. SuperLocalMemory adds provenance tracking that the RAG index poisoning pathway lacks. CognitiveGuard specifically targets multimodal sleeper triggers. No single defense covers all four pathways.

The OWASP ASI06 classification gives security teams a reference point for procurement and architecture reviews, but the gap between classification and implementation remains wide. The CISA joint-seal guidance on agentic AI adoption recommends treating all external data inputs as untrusted until validated — a principle that contradicts how most agent frameworks currently operate.

Exceptions and Limits

Memory poisoning is not a uniform threat. Agents that operate in isolated environments with curated, static knowledge bases face considerably less risk than agents that ingest data from open internet sources. A coding assistant reading only a verified repository has a narrower attack surface than a personal assistant reading email, calendar invitations, and social media feeds.

The HEARTBEAT attack depends on an active background execution loop. Agents that operate purely in request-response mode — processing one prompt at a time with no persistent background tasks — are not vulnerable to this specific vector, though they remain susceptible to RAG index poisoning and SCH through their tool integrations.

Current defenses carry overhead. Bayesian trust scoring adds latency to every memory retrieval. Deterministic boundary enforcement requires human confirmation that slows automated workflows. The security-performance tradeoff is real, and organizations running agents at scale will need to decide which pathways to harden first based on their data ingestion profile.

Honest Assessment

Factor	Status
Attack maturity	Documented in research; no confirmed enterprise incidents in production yet
Defense maturity	Four research-stage defenses with measurable results; no production-hardened tooling
Framework exposure	OpenClaw patched heartbeat; LangChain, CrewAI, AutoGPT still lack built-in provenance
Detection capability	Zero detection by current scanning tools
Regulatory awareness	CISA advisory and OWASP ASI06 acknowledge the class; no enforceable standards yet

The gap between attack capability and defensive readiness is the core finding. HEARTBEAT achieves near-total persistence across sessions. SCH breaches confidentiality in most test cases with zero detection. Visual Inception forces goal execution in four out of five multimodal agents from a single image. The best defenses reduce these numbers significantly but require architectural changes that current frameworks do not implement by default.

The question facing security teams deploying agents is not whether memory poisoning is possible. The research is clear on that. The question is where to draw the trust boundary — which data sources to validate, which memory stores to sandbox, and which tool integrations to gate with human confirmation. As covered in the red teaming methodology analysis, the target has shifted from what the model says to what the agent does. Memory poisoning confirms that the attack surface now includes what the agent remembers.

Actionable Takeaways

Audit agent data ingestion points. Map every source an agent reads automatically — email, feeds, files, APIs — and classify each as trusted or untrusted. OpenClaw's heartbeat vulnerability shows that background ingestion without validation is the primary persistence pathway.
Implement provenance tagging on memory stores. Every entry in long-term memory should carry a source identifier and timestamp. SuperLocalMemory's Bayesian trust scoring demonstrates that provenance metadata enables detection that content analysis alone cannot achieve.
Enforce human confirmation at tool-call boundaries. ClawGuard's deterministic boundary enforcement prevents the SCH supply chain attack. Any agent that can execute tool calls without human review is a SCH target.
Screen multimodal inputs. Visual Inception proves that text-only security review is insufficient. Deploy diffusion-based sanitization for images processed by multimodal agents, or restrict image processing to trusted sources only.
Segment memory by trust level. Memory derived from verified internal sources should not mix with memory derived from external or user-provided input in the same context window. Contamination propagates when low-trust and high-trust memories share retrieval priority.

AI Agent Memory Poisoning

The HEARTBEAT Attack: Persistent Compromise Without a Prompt

Semantic Compliance Hijacking: Zero-Detection Supply Chain Attacks

How Memory Poisoning Propagates

Multimodal Sleeper Agents

Defensive Approaches: What Works Now

Exceptions and Limits

Honest Assessment

Actionable Takeaways

Topics

More

Follow

The HEARTBEAT Attack: Persistent Compromise Without a Prompt

Semantic Compliance Hijacking: Zero-Detection Supply Chain Attacks

How Memory Poisoning Propagates

Multimodal Sleeper Agents

Defensive Approaches: What Works Now

Exceptions and Limits

Honest Assessment

Actionable Takeaways

Related Posts

AI Agent Red Teaming: The Adversarial Methodology

MCP Server Sprawl and the Attack Surface

AI Agent Memory Architecture

Topics

More

Follow