Illustration for AI agent inbox compromise: an incident response playbook

AI agent inbox compromise: an incident response playbook

When your agent's inbox gets compromised, existing IR playbooks don't apply. Here's a five-stage response process built for autonomous agents.

8 min read

In late February, a senior security researcher at Meta named Yue described how her OpenClaw agent ran amok in her personal inbox. It read emails she hadn't asked it to read, triggered actions she hadn't authorized, and left a cleanup problem that took days to resolve. Her working explanation was that the volume of data in her real inbox triggered some kind of compaction behavior. That might be part of it. But her inbox also contained thousands of emails from strangers, vendors, newsletters, and services — any of which could have delivered instructions the agent decided to follow.

This is what AI agent inbox compromise looks like. It's not someone logging in with stolen credentials. It's the agent receiving what it interprets as legitimate instructions and executing them.

No existing incident response playbook covers this. They assume a human attacker with a stolen password. They don't account for an agent that can be directed through the exact channel it was built to use. Here's a playbook that does.

Why this is structurally different#

Normal email compromise: an attacker gets read access, maybe exfiltrates data, sends phishing emails from your account. Contained, and your existing IR process probably handles it.

Agent inbox compromise flips the model. The attacker doesn't need system access. They send a crafted email. The agent receives it, parses the content, and starts doing things. Calling APIs. Writing to databases. Forwarding threads. Actions it was genuinely built to take, just in response to instructions from the wrong person.

Severity scales with permissions. An agent that only reads email is annoying to compromise. One that reads email, writes to your CRM, and can execute code is a serious problem.

Cisco's State of AI Security 2026 found that only 29% of organizations felt prepared to secure agentic deployments. Unit 42's Global Incident Response Report the same year noted that identity loopholes drove nearly 90% of their investigated breaches. For agents, the identity problem is harder than for humans: there's no intuition, no hesitation, no "this feels off." The agent follows the instructions.

For a deeper look at how these attacks work mechanically, OpenClaw agent email security is worth reading before running this playbook.

Indicators of compromise#

You're looking for anomalies in two places: the inbox and the agent's behavior.

Inbox-side signals:

  • Outbound volume spikes, particularly to recipients the agent doesn't normally contact
  • Emails from new senders containing structured text that resembles instructions
  • Rapid arrivals from the same domain in a short window
  • Threads that don't map to any active workflow

Agent-side signals:

  • Actions that can't be traced to an explicit user request
  • API calls to services outside the agent's normal scope
  • Rate limit alerts on send operations
  • Downstream triggers firing at unusual hours

The complicating factor: most of these signals are invisible without logging already in place. If you don't have a time-stamped audit trail of what the agent receives and sends, you're doing incident response from memory.

Stage 1: Detect (before the incident)#

This stage happens before anything goes wrong. Set up audit logging now.

At minimum, you want a record of every email received, every email sent, and every agent action that correlates with an email event. This is how you reconstruct the attack timeline. Without it, the assess stage becomes guesswork.

Alerts worth configuring:

  • Outbound volume exceeding your normal daily max
  • First contact from a domain with no prior history in the inbox
  • Emails containing keyword patterns your agent is trained to act on, arriving from unknown senders

If you're using webhooks for real-time delivery, they're also your fastest path to anomaly detection. Webhooks vs polling for agent email covers the tradeoffs in more detail.

Stage 2: Isolate#

When anomalous behavior appears, the first thing you revoke is outbound send permissions. Not the agent's access to everything, and not the receive token yet.

You stop the bleeding first. Revoking send access means the agent can no longer forward, reply, or deliver anything externally.

You leave receive open for forensic reasons. You want to see what's still arriving. Follow-up emails from the attacker reveal whether the attack is ongoing and show you their full pattern.

Freeze downstream integrations the compromised inbox might trigger: CRM syncs, calendar writes, code execution paths, anything that could amplify the damage while you assess.

If your email infrastructure gives each inbox an independent token, you can revoke send access for that specific inbox without interrupting the rest of the agent's work. LobsterMail issues per-inbox tokens for exactly this reason.

Stage 3: Assess#

Export the full email history for the window before the anomalous behavior began. Your job is to correlate agent actions with specific emails.

Three questions to answer:

  1. Which email arrived just before the unexpected behavior started?
  2. Does that email contain text that would explain the actions taken?
  3. What permissions did the agent exercise during the anomalous window?

If you have injection risk scoring on incoming emails, this step is a data lookup. LobsterMail returns an injection risk flag with every received email. Without that, you're manually reviewing potentially hundreds of messages.

Stage 4: Recover#

Don't restore the compromised inbox. The attacker knows that address. Provision a new one, migrate only the threads you need, and let the old address expire.

Update every service pointing to the compromised address. This is the painful step, because you may have registered that address with vendors, SaaS platforms, and form submissions over months. Keep a running list of where each agent inbox is registered so recovery is a checklist, not a discovery exercise.

Stage 5: Harden#

Three changes that reduce the blast radius next time:

Separate inboxes for separate contexts. One address for signup flows, one for vendor communication, one for user replies. If any one gets compromised, the others are unaffected and still operational.

Apply least privilege to the agent's permissions. An agent that only needs to read from one inbox and write to one CRM object is far less dangerous to compromise than one with broad API access.

Add explicit injection filtering before email content reaches the agent's context window. Either implement the check yourself or use infrastructure that handles it automatically.

The structural argument for isolated inboxes#

Yue's situation was particularly messy because her agent had access to her real inbox. Thousands of emails, years of personal and professional context, all in one place. When something went wrong, the scope of what might have been affected was enormous.

An agent with its own isolated inbox, its own shell, carries a completely different risk profile. The attacker gets the contents of that inbox. Your personal email, calendar, and contact list are untouched. Revoking and replacing a dedicated agent inbox is a two-minute operation. Recovering from an agent that had full access to your primary account is not.

The security risks of sharing your inbox with an agent go deeper on the structural argument if you're still on the fence about the architecture decision.

Frequently asked questions

What is prompt injection in email, and how does it affect agents?

Prompt injection is when an attacker embeds instructions in content the agent reads — in this case, email. The agent treats those instructions as legitimate and acts on them. It's the same mechanism as telling an agent "ignore your previous instructions," but delivered through an incoming message rather than direct input.

How do I know if my agent's inbox has been compromised?

Look for two things: outbound send volume spikes or unexpected recipients in the inbox, and agent actions that can't be traced to an explicit user instruction in the agent's behavior. Audit logging makes this straightforward. Without it, you're reconstructing events from memory.

What's the difference between prompt injection and ordinary spam?

Spam is unsolicited bulk email targeting a human reader. Prompt injection is a targeted attack that exploits the agent's tendency to follow instructions in its context window. Injection emails often look like legitimate messages — invoices, support replies, notifications — with embedded instruction text that only matters if an agent is reading them.

Does LobsterMail detect prompt injection automatically?

Yes. LobsterMail returns an injection risk score with every received email. Your agent can check this score before processing the content and decide whether to act, quarantine, or flag for human review. See the getting started guide for implementation details.

Can I use the same email address for multiple agents?

Technically yes, but you shouldn't. Shared inboxes mean a compromise affecting one agent's workflow can contaminate another's, and you lose the ability to isolate and revoke at the inbox level. One inbox per agent context is the right architecture.

How do I revoke an inbox token without shutting down the whole agent?

Each LobsterMail inbox has its own independent token. You can revoke the send token for a specific inbox through the dashboard or API without touching other inboxes or the agent's broader operations. This is the primary reason per-inbox tokens matter for incident response.

Should my agent use a @lobstermail.ai address or a custom domain?

Either works for incident response purposes. A custom domain looks more professional in outbound email, but the isolation properties are identical. The important thing is that it's a dedicated address the agent controls — not your personal or company inbox. Custom domains are available on paid plans.

How often should I rotate agent inbox addresses?

Rotate after any anomalous event, at minimum. Some teams rotate on a monthly schedule for high-exposure agents that interact with many external parties. LobsterMail provisions new addresses in seconds, so the operational cost of rotation is low.

What should I log to support incident response?

At minimum: timestamps for every email received and sent, sender and recipient for each, the injection risk score for inbound emails, and a record of which agent actions followed which email events. Structured logs you can query by time window make the assess stage significantly faster.

Is a compromised agent inbox covered by standard cyber insurance?

Most policies don't have clear language for autonomous agent incidents yet. Coverage typically depends on whether the compromise resulted in data exfiltration, financial loss, or third-party harm — the same triggers as traditional email compromise. Document your IR procedures proactively; insurers are starting to ask about agentic AI controls.

Can an attacker target my agent's inbox without knowing the address?

Realistically, no — but the address can be harvested if your agent sends email to external parties, registers with public services, or appears in a data breach. This is another argument for rotating addresses after high-exposure workflows and not reusing the same address across different contexts.

Does this playbook apply to agents using the MCP server rather than the SDK?

Yes. The playbook addresses the inbox as an attack surface, not a specific implementation. Whether your agent accesses email through the LobsterMail SDK, the MCP server, or the API directly, the detect-isolate-assess-recover-harden sequence applies the same way.


Your agent's inbox is its attack surface. Give it an isolated one. Get started with LobsterMail — it's free.

Related posts