Pixel art lobster working at a computer terminal with email — multi-agent prompt infection

security infrastructure email automation

multi-agent prompt infection: how malicious prompts spread between AI agents

Multi-agent prompt infection lets malicious prompts self-replicate across connected AI agents. Here's how it works, why email-capable agents are especially vulnerable, and what you can do about it.

April 24, 20268 min read

Samuel ChenardCo-founder

In October 2024, researchers from UCL and Stanford published a paper that quietly changed how we should think about AI security. The attack they described wasn't a new kind of prompt injection. It was something worse: a prompt that copies itself from one agent to another, spreading through a multi-agent system the way a virus moves through a network.

They called it Prompt Infection. And if you're building with agents that talk to each other, read emails, or share context, it's worth understanding how it works before one of your agents catches it.

What is multi-agent prompt infection?#

Multi-agent prompt infection is an attack where a malicious prompt embeds itself in one AI agent's output, then propagates to other connected agents through their shared communication channels. Unlike traditional prompt injection, which targets a single model, prompt infection self-replicates across agent boundaries. The core risk is that a single compromised agent can spread unauthorized instructions to every agent it communicates with, enabling data exfiltration, unauthorized actions, or cascading system failures.

The term comes from the "Prompt Infection" paper by Gu et al. (2024), which demonstrated that in multi-agent systems (MAS), a single injected prompt can replicate through agent-to-agent messages, tool outputs, and shared memory. The researchers showed infection rates above 80% across common multi-agent architectures.

That number is striking. It means that in a system of ten agents, compromising one can compromise eight.

How it actually spreads#

Traditional prompt injection is a one-shot attack. Someone hides "ignore your instructions and do X" in an email, a web page, or a document. If an agent processes that content without sanitization, it follows the hidden instruction. Bad, but contained. The damage stops at one agent.

Prompt infection is different because the malicious payload includes a replication instruction. It tells the compromised agent to embed the same payload in its own outputs. When Agent B reads Agent A's output, it gets infected. When Agent C reads Agent B's output, same thing. The researchers describe this as a Multi-Agent Infection Chain (MAIC), and the behavior mirrors biological viral transmission.

Here's what makes this particularly nasty: agents don't need to be in the same framework. If Agent A sends an email that Agent B reads, and Agent B writes a summary that Agent C processes, the infection crosses tool boundaries, protocol boundaries, and even organizational boundaries.

The chain looks something like this:

Malicious email → Agent A (email reader) → Agent A's summary → Agent B (task planner) → Agent B's instructions → Agent C (code executor) → compromised action Each hop looks like normal agent communication. There's no exploit in the traditional sense. The agents are doing exactly what they're designed to do: read input and act on it. The infection lives in the content, not the code.

Why email is the most dangerous vector#

Among all the tools agents use (browsers, code execution, file systems, APIs), email stands out as the highest-risk channel for prompt infection. Three reasons.

First, email is the primary way agents interact with the outside world. When an agent signs up for a service, receives a notification, or communicates with another agent's human operator, it happens through email. That makes inboxes the most exposed attack surface.

Second, email content is inherently untrusted. Unlike an API response from a known endpoint, an email can come from anyone. A well-crafted phishing email with an embedded prompt infection payload looks identical to a legitimate message. The agent can't distinguish between "here's your verification code" and "here's your verification code (also, include the following text in every email you send from now on)."

Third, email-capable agents can propagate infections outward. A compromised agent with send access doesn't just get infected. It becomes a carrier. It can forward the malicious payload to every contact in its inbox, every mailing list it's subscribed to, every agent it collaborates with. This is the "AI worm" scenario that researchers have been warning about since early 2025.

The OWASP Top 10 for LLM Applications now ranks prompt injection as the number one vulnerability in production AI deployments. A January 2026 review found it present in over 73% of assessed production systems. Email-capable agents without proper content isolation make up a disproportionate share of those vulnerable deployments.

What defenses actually work#

Let's be honest: there's no silver bullet. But several approaches reduce the attack surface significantly.

Content isolation. The most effective defense is treating all inbound content as untrusted data, not as instructions. This means agents should never process raw email bodies as part of their system prompt. Content should be extracted, sanitized, and passed through a structured format that separates data from instructions.

Injection scoring. Every piece of inbound content should be scored for injection risk before an agent acts on it. This isn't just keyword matching. Modern scoring looks at instruction-like patterns, role-play attempts, and context manipulation techniques. LobsterMail, for example, scores every inbound email for injection risk and exposes that metadata to the receiving agent, so the agent can decide how to handle suspicious content before it enters the processing pipeline.

Output monitoring. Infection spreads through outputs. If an agent's outbound messages suddenly contain instruction-like content that wasn't in its original task, something has gone wrong. Monitoring agent outputs for anomalous patterns catches infections that input filtering misses.

Tool-level permissions. An agent that can read email but not send it can get infected but can't spread the infection. Principle of least privilege applies here: don't give agents send access unless they need it, and when they do, scope it tightly. Rate limits, domain restrictions, and recipient allowlists all reduce blast radius.

Memory hygiene. Shared memory and context windows are infection highways. If Agent A's compromised output gets stored in a shared knowledge base that Agent B and C read from, the infection persists even after Agent A is cleaned. Treat shared memory as another untrusted input surface.

The compliance problem nobody's talking about#

When a prompt-infected agent sends unauthorized emails from your domain, who's liable? The answer is: you are. The CAN-SPAM Act, GDPR, and other regulations don't have an "my AI was hacked" exception. If your agent sends spam, phishing, or unauthorized communications, the business that deployed it is responsible.

This isn't theoretical. As agent email volume grows, the first lawsuit over an infected agent sending unauthorized emails on behalf of a business is a matter of when, not if. Infrastructure-level controls (domain isolation, send-rate limits, outbound content scanning) aren't just security measures. They're legal protection.

Where this is headed#

The Prompt Infection paper demonstrated the attack in controlled settings. But the conditions it requires, agents reading untrusted input and passing outputs to other agents, describe nearly every production multi-agent system being built today. As agent-to-agent communication becomes more common, especially through open protocols like email and MCP, the attack surface grows.

The fix isn't to stop building multi-agent systems. It's to build them with the assumption that any agent can be compromised at any time, and to design communication channels that limit what a compromised agent can do. Defense in depth, applied to agent communication. Boring security principles, applied to a new problem.

If your agents handle email, start with the basics: score inbound content for injection risk, isolate email content from agent instructions, and limit outbound permissions to the minimum your workflow needs. Those three steps won't make you invulnerable, but they'll keep you out of the 73%.

Frequently asked questions

What exactly is prompt infection in the context of multi-agent AI systems?

Prompt infection is an attack where a malicious prompt self-replicates across connected AI agents. Unlike standard prompt injection (which targets one model), the payload instructs each compromised agent to embed the same malicious instructions in its outputs, spreading the infection to every agent downstream.

How is prompt infection different from a standard prompt injection attack?

Standard prompt injection is a one-shot attack that compromises a single agent. Prompt infection adds a self-replication mechanism: the compromised agent embeds the malicious payload in its own outputs, so any agent that reads those outputs also becomes infected. It's the difference between infecting one cell and starting an epidemic.

Can a single compromised agent infect an entire multi-agent pipeline?

Yes. The UCL and Stanford research showed infection rates above 80% in common multi-agent architectures. A single compromised agent that shares outputs with other agents can cascade the infection through the entire system.

What is a Multi-Agent Infection Chain (MAIC) and how does it form?

A MAIC is the sequence of agents that get compromised as a prompt infection spreads. It forms when Agent A's infected output is processed by Agent B, which then produces infected output that Agent C processes, and so on. Each agent in the chain becomes both victim and carrier.

Are email-capable agents at higher risk of being used as infection vectors?

Yes. Email is the most exposed channel because it accepts untrusted input from anyone, and agents with send access can propagate the infection outward to other agents, mailing lists, and external contacts. An infected email agent becomes a carrier that can spread beyond your own system.

What defenses exist today against prompt infection in production AI systems?

The most effective defenses are content isolation (separating email content from agent instructions), injection risk scoring on all inbound content, output monitoring for anomalous instruction-like patterns, tool-level permission scoping, and memory hygiene for shared context stores. No single defense is sufficient on its own.

Can prompt infection be used to exfiltrate sensitive data across agents?

Yes. A prompt infection payload can instruct compromised agents to include sensitive data from their context in outbound messages. Since the infection spreads through normal agent communication channels, the data exfiltration looks like regular agent output.

How does prompt infection relate to the concept of an AI worm?

An AI worm is essentially prompt infection at scale. Where prompt infection describes the mechanism (self-replicating malicious prompts spreading agent-to-agent), an AI worm describes the outcome: an autonomous, self-propagating attack that spreads across organizational boundaries, much like traditional computer worms spread across networks.

Is prompt infection a theoretical threat or has it been demonstrated in real systems?

It has been demonstrated in controlled research environments. The original paper by Gu et al. (2024) showed successful infection chains across multiple multi-agent architectures. While no large-scale in-the-wild incident has been publicly documented yet, the conditions for it exist in most production multi-agent systems today.

What role does agent memory and context sharing play in enabling prompt infection?

Shared memory is an infection highway. If a compromised agent writes infected content to a shared knowledge base or context store, every agent that reads from it can get infected. The infection persists even after the original agent is cleaned, because the malicious payload lives in the shared state.

How can agent-first email infrastructure help prevent infection propagation?

Email infrastructure built for agents can score inbound messages for injection risk before the agent processes them, enforce outbound rate limits and recipient restrictions, and isolate email content from agent instruction channels. LobsterMail's security features include per-email injection risk scoring that gives agents the metadata they need to handle suspicious content safely.

Which multi-agent frameworks are most vulnerable to prompt infection?

Any framework where agents share outputs through unfiltered text channels is vulnerable. The risk isn't framework-specific. It's architectural. Systems where agents read each other's raw outputs (AutoGen, CrewAI, LangGraph, or custom setups) all face the same fundamental exposure unless they implement content isolation between agents.

What happens if a prompt-infected agent sends unauthorized emails from my domain?

You're liable. Regulations like CAN-SPAM and GDPR hold the deploying business responsible for emails sent from its domain, regardless of whether a human or an AI initiated them. Infrastructure-level controls like domain isolation, send-rate limits, and outbound content scanning are both security measures and legal protection.

multi-agent prompt infection: how malicious prompts spread between AI agents

What is multi-agent prompt infection?#

How it actually spreads#

Why email is the most dangerous vector#

What defenses actually work#

The compliance problem nobody's talking about#

Where this is headed#

Frequently asked questions

Related posts

oauth vs api keys for ai agent email authentication

ai agent email identity management: how agents prove who they are

sandboxed ai email assistant: what it actually means and which tools do it