Persistent Memory Is a Persistent Attack Surface: Memory Poisoning and Your Agents

In 2026 the security research community quietly settled an argument that had been simmering for two years. The argument was whether persistent memory in AI agents introduced a meaningfully new attack surface or whether it was a variation on prompt injection that existing defences would handle. The verdict, after a string of public demonstrations, is that it is a meaningfully new attack surface, and the existing defences do not handle it. The shape of the new threat is straightforward to describe and uncomfortable to mitigate, and any team shipping memory-backed agents in 2026 should have an explicit position on it.

The cleanest illustration came from Dark Reading's coverage of the ChatGPT memory feature being weaponised as a prompt-injection delivery vector. The mechanic, simplified: an attacker gets the user's assistant to ingest a piece of content that contains an instruction. The instruction does not need to fire in the current session. It is written into the user's persistent memory, and on every future session - days, weeks, months later - the assistant pulls it back in as “remembered context” and acts on it. The window between “poison the well” and “exploit succeeds” can be arbitrarily long. The user has no obvious signal that something is wrong, because nothing visibly broke; the assistant just slowly started behaving in ways that benefit the attacker. The Ragionex Memory API - POST /v1/memory/write and POST /v1/memory/search - is built around per-user isolation and a no-model-on-the-retrieval-path discipline for exactly this reason.

What changes when memory survives sessions

For stateless LLM calls, the threat model is bounded. A prompt injection that lands in a single conversation expires when the conversation ends. The attacker has one shot, and the blast radius is the current session. Defences live at the prompt-construction layer: input sanitisation, instruction hierarchies, structured outputs that prevent the model from veering off the task.

For memory-backed agents, the threat model loses its time bound. An injection that gets indexed into persistent memory does not expire when the session ends. Every future query that retrieves the poisoned record carries the injection back into a fresh prompt, where the model has every reason to trust it - the system told the agent that this content came from the user's own past, and the agent is supposed to act on what the user said. The injection now has unlimited shots, and the blast radius is every future session that touches the poisoned record.

This is what the Palo Alto Unit 42 research on indirect prompt injection poisoning long-term memory formalised. Their threat-model paper walks through the case rigorously: the attacker does not need to get into the model's session in real time. They need to get something into a corpus the model will later retrieve from. Email contents, scraped web pages, documents shared into a workspace, content the user pastes for summarisation - any path from external content to the memory store is a potential injection vector. VentureBeat's coverage of the “Comment and Control” exploit against an autonomous code-review agent demonstrated the same pattern in a coding context: a hostile comment in a pull request, ingested by the agent, persisted as a remembered “coding convention” for the repository, and weaponised against future automated reviews. The InjecMEM paper in early 2026 produced an academic-grade evaluation of the attack class.

The trust boundaries that matter

Once you accept that memory is a persistent attack surface, the design problem becomes legible. There are three categories of content the agent encounters, and the agent's behaviour with each must be explicit:

User content. Things the user explicitly typed at the agent in this session. Highest trust, but still data, not instructions.
Tool content. Output of tools the agent called. Medium trust, depending on the tool. Output of a calculator is trustworthy; output of a web fetch is not.
Memory content. Things retrieved from the persistent store. Trust level depends entirely on how the content got into the store in the first place.

The mistake teams make - and the mistake the demonstrated attacks exploit - is treating memory content as if it inherits the user's trust level. The reasoning sounds intuitive: this came from the user's past, so it is the user speaking. The reasoning is wrong, because past is not a unitary thing. Some of the past was the user typing carefully. Some of the past was the user pasting in a third-party document. Some of the past was tool output. Some of the past was content the agent saw on a website. Treating it all as “the user’s trusted past” collapses the trust hierarchy that the architecture is supposed to maintain.

Memory survives sessions; so does whatever was poisoned.

The architectural implication is that memory content is data, not instructions, and the system has to enforce that distinction explicitly. The model that reasons over the retrieved memory must be running in a context where instructions inside the memory are treated as text the user said in the past, not as commands to execute now. This is the same data-versus-instruction split that has been the core of prompt-injection defence for two years, applied at the storage layer rather than the prompt layer.

What a defensible memory API looks like

The threat model produces a checklist of properties a memory product should have. The checklist is not aspirational. It is the floor below which the product cannot be safely used, and any team evaluating a memory system should run it explicitly.

Per-user scoping with no cross-tenant leakage. Each user's memory pool is fully isolated. There is no shared global pool that a query can accidentally hit. The attack of poison one user's memory and contaminate everyone else's is structurally impossible, not just policy-prohibited. This is the most important property and the easiest to fail on if you tack memory onto an existing multi-tenant system without re-thinking the data model.

Input is data, not instructions. The store treats content as text to be indexed and retrieved. The store does not parse it for commands. The store does not run reasoning on it. There is no path inside the memory layer for retrieved content to influence the storage system's own behaviour. Microsoft's security blog on AI recommendation poisoning makes this point in the recommendation-system context, and it generalises directly: the surface that ingests untrusted content cannot also be the surface that acts on it.

No auto-execution from retrieved content. The memory API returns text. It does not return tool calls, executable instructions, or anything the agent runtime would interpret as an action. The decision about whether retrieved content triggers an action lives entirely on the calling agent's side, where the application's threat model can be applied. The store is upstream of those decisions, by design.

Query-side guardrails on the calling side. The agent that consumes retrieved memory has to treat it with the same suspicion it treats any data input. Retrieval results go into a section of the prompt that is structurally marked as “past notes” rather than “current instruction.” The system prompt instructs the model that text inside that section is descriptive context, not commands. This is on the calling agent, not on the memory API, but the API has to make it possible for the calling agent to do the right thing.

Editable, deletable, auditable. The user (or the application acting on the user's behalf) can list memories, view their contents, edit them, and delete them. There is no opaque memory blob that the user cannot inspect. This is both a security property and a regulatory property; the GDPR right to erasure does not negotiate.

What this looks like in the Ragionex Memory API

Each of the architectural properties above maps to a specific feature of the public API surface, and the customer can verify it by reading the API documentation directly. The three properties most worth being explicit about:

Per-user scoping is a hard isolation, not a soft filter. Every memory is owned by a single user_id derived from the API key. A search request never crosses the boundary, because the query is constructed against a per-user index. There is no global query path that a malicious payload could escape into. The architectural choice to scope memory per-user-key is not a configuration; it is the data model.

The store does not run reasoning on customer content. The Ragionex Memory API is a storage and retrieval layer. The reasoning that happens on top of retrieved memories happens in the customer's own LLM, on the customer's own prompt construction. There is no path inside the store where retrieved content gets fed back into a model that drives store-side behaviour, because the store has no model on the retrieval path. This eliminates a whole class of injection attacks that target memory systems with ingestion-time AI processing.

Zero runtime AI calls on the retrieval path means no injection surface at retrieval time. When the agent calls /v1/memory/search, the store performs a fast lookup and returns the matching records. There is no LLM in the loop on the way back to the agent. The injection vector that requires the retrieved content to talk to a model in order to do harm simply does not exist inside the memory API; the retrieved content only ever reaches the customer's own model, on the customer's own side of the trust boundary, where the customer's own defences apply.

This last property matters because most memory systems with sophisticated reranking or query rewriting put an LLM in the retrieval path. That LLM is a target. If you accept that memory is fundamentally retrieval, the simpler architecture - fast lookup, no model on the read path - is the more defensible one. The same property is what we argue for on the documentation side in why we don't call an LLM at query time: the model is a security surface, and keeping it off the read path is a defence-in-depth move, not a performance one.

Where the customer still has work to do

The memory API can do a lot, but it cannot do everything. The calling application still has to handle the trust boundary at the prompt-construction step. Specifically: when the agent retrieves memories and stitches them into a prompt, the memories need to be marked structurally as retrieved past notes rather than mixed in with the system prompt or the user's current instruction. The system prompt has to tell the model to treat content inside that section as descriptive, not directive. Frontier models are increasingly robust at this distinction when it is made explicit, and brittle at it when it is implicit.

Beyond the prompt structure, the application has to decide what gets written into memory in the first place. Auto-ingesting tool output, web pages, or pasted user content into memory without a review step is the most common way the attacks succeed. A short manual review step before content becomes a permanent memory eliminates the largest class of attacks at the cost of a small UX friction. For high-stakes deployments, the trade is worth it. For low-stakes deployments, the threat model needs to be explicit about what the application is willing to ingest.

The honest summary

Persistent memory is a real engineering primitive and a real security primitive simultaneously. The benefits are large and the trade is genuine. The teams that ship memory-backed agents responsibly in 2026 will be the ones that take the second half of that sentence as seriously as the first. The architectural moves are not exotic - per-user scoping, data-not-instructions, no model on the retrieval path, editable and auditable storage, careful prompt construction on the calling side - but they have to be made deliberately. Tacking them on after a security incident is significantly more expensive than building them in.

If you are evaluating a memory product, ask the vendor each item on the checklist above and read their answer carefully. The right answers are short and structural. The wrong answers are long and policy-shaped. The architectural decision about where memory lives determines most of what is possible to defend, and the time to make that decision is before the first piece of customer content lands in the store.

Persistent memory is a persistent attack surface.