
agent email platform data model: the database schema behind every inbox
How agent email platforms structure their databases, from core message tables to agent-specific action logs, and why the schema looks different from traditional email.
Most email database schema guides start with the same Stack Overflow answer from 2010: a messages table, a users table, maybe a folders table. That design works fine for a human inbox. It falls apart the moment an AI agent needs to provision its own address, poll for verification codes, and log every action it took along the way.
An agent email platform needs a different data model. Not radically different (email is still email), but extended in ways that traditional schemas never considered. The agent isn't browsing a folder view. It's running a workflow. The schema has to support that.
Here's what the core tables look like when you're building (or evaluating) an agent-first email platform.
Core tables in an agent email platform database schema#
| Table Name | Purpose | Key Fields |
|---|---|---|
accounts | Workspace or tenant identity | id, plan, created_at, token_hash |
inboxes | Individual email addresses tied to an account | id, account_id, address, display_name, created_at |
messages | Every email sent or received | id, inbox_id, direction, subject, body_text, body_html, from_address, created_at |
message_recipients | Per-recipient delivery tracking (to, cc, bcc) | id, message_id, address, type, delivery_status |
attachments | File metadata for message attachments | id, message_id, filename, content_type, size_bytes, storage_key |
threads | Conversation grouping via references and subject | id, inbox_id, subject_normalized, last_message_at |
agent_actions | Log of every action the agent performed | id, inbox_id, action_type, message_id, metadata, created_at |
delivery_events | Status lifecycle for outbound messages | id, message_id, event_type, timestamp, detail |
An agent email platform data model is a relational schema designed for programmatic access rather than human UI navigation. It prioritizes fast polling, per-inbox isolation, and auditability of agent behavior over folder hierarchies and star/unstar toggles.
Message and recipient design#
The messages table is the backbone. But the real workhorse is message_recipients. In a traditional email client, CC and BCC are just header strings you parse and display. In an agent platform schema, they need to be normalized into rows because the agent may need to query by recipient, filter by delivery status, or fan out actions per recipient.
CREATE TABLE message_recipients (
id BIGINT PRIMARY KEY,
message_id BIGINT NOT NULL REFERENCES messages(id),
address TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN ('to', 'cc', 'bcc')),
delivery_status TEXT DEFAULT 'pending'
);
Each recipient gets its own row with its own delivery status. When your agent sends a message to five addresses, you get five `message_recipients` rows, each independently trackable. This is what makes it possible to answer "did the confirmation email actually reach `alice@example.com`?" without parsing raw SMTP logs.
BCC recipients exist in this table but are excluded from any recipient list returned to other parties. The schema knows about them; the API surface doesn't expose them to anyone except the sender.
## Threading without a human
Email threading is notoriously messy. The `In-Reply-To` and `References` headers are supposed to chain messages together, but plenty of mail servers mangle them. Gmail uses its own internal thread IDs. Outlook sometimes starts new threads on replies.
For an agent email platform, threading matters because agents often need to follow a conversation: send a signup email, wait for a verification reply, extract the code, then continue. If the reply doesn't land in the same thread, the agent loses context.
A practical approach uses two signals: the `References` header chain (when available) and a normalized subject match as a fallback. The `threads` table groups messages by inbox and subject, with a `last_message_at` timestamp for sorting. When a new message arrives, the platform checks `References` first. If that finds a match, it joins the existing thread. If not, it falls back to subject matching within the same inbox. If neither works, it starts a new thread.
This isn't perfect. But it's reliable enough for agent workflows where the agent initiated the conversation and controls one side of the thread. The cases where threading breaks tend to involve forwarded messages or mailing lists, which aren't common in agent-to-service communication.
## Agent action logs: the table traditional schemas don't have
Here's where agent email platforms diverge most from traditional designs. A standard email schema tracks messages. An agent email platform also tracks what the agent *did* with those messages.
The `agent_actions` table records every meaningful operation: inbox creation, message send, message read, code extraction, webhook trigger, retry attempt. Each row ties back to an inbox and optionally to a specific message.
```sql
CREATE TABLE agent_actions (
id BIGINT PRIMARY KEY,
inbox_id BIGINT NOT NULL REFERENCES inboxes(id),
action_type TEXT NOT NULL,
message_id BIGINT REFERENCES messages(id),
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT now()
);
The `metadata` column (JSONB in Postgres, or a document in NoSQL) holds action-specific details: the extracted verification code, the parsed structured output, the retry count, the error reason. This is where the schema absorbs the unpredictable variety of agent behavior without requiring a new column for every use case.
Why does this matter? Debugging. When an agent workflow fails at step 4 of 7, you need to see what happened at steps 1 through 3. Traditional email schemas can tell you what messages exist. Agent action logs tell you what the agent tried to do and where it stopped.
If you're evaluating platforms, this is a good litmus test. Can the platform show you the agent's action history for a given inbox? If the answer involves grepping application logs, the schema wasn't designed for agents. For context on what "agent-first" really means here, [why the agent should create its own inbox](/blog/agent-self-signup-explained) covers the provisioning side of this same idea.
## Delivery events and the outbox pattern
Outbound email isn't instant. Your agent calls `send()`, but the message passes through queuing, SMTP handshake, recipient server acceptance, and possible bounce processing. The `delivery_events` table tracks this lifecycle.
A common schema pattern here is the outbox: messages start in a `pending` state, a background worker picks them up for SMTP delivery, and each state transition gets recorded as an event. The agent (or a webhook) can check delivery status without polling the external mail server.
```sql
CREATE TABLE delivery_events (
id BIGINT PRIMARY KEY,
message_id BIGINT NOT NULL REFERENCES messages(id),
event_type TEXT NOT NULL, -- 'queued', 'sent', 'delivered', 'bounced', 'failed'
timestamp TIMESTAMPTZ DEFAULT now(),
detail TEXT
);
This event-driven approach decouples the agent's "send" action from the actual SMTP delivery. The agent doesn't block waiting for SMTP. It sends, gets a message ID back, and can check status later. For agents running multiple workflows in parallel, this is the difference between a responsive system and one that hangs on every outbound call.
## Multi-tenant isolation
Agent email platforms serve many accounts, each with their own inboxes. The schema needs to enforce isolation so that Account A never sees Account B's messages, even if a query bug or permission error occurs.
The simplest approach: every table that holds tenant-specific data includes an `account_id` column, and every query includes an `account_id` filter. Row-level security in Postgres can enforce this at the database layer, not just the application layer.
For platforms running at scale, this is also where indexing decisions matter most. A composite index on `(account_id, inbox_id, created_at)` on the `messages` table handles the most common agent query pattern: "give me recent messages for this inbox in this account." Without that index, a table with millions of rows across thousands of tenants gets slow fast.
When comparing self-hosted agent email versus managed platforms, multi-tenant isolation is one of the hardest things to get right on your own. It's not just a schema question. It's a schema plus permissions plus query discipline question.
## SQL vs. NoSQL for agent email
This comes up a lot. The short answer: relational databases (Postgres, MySQL) are the better default for email platform schemas. Email has inherent relational structure (messages belong to inboxes, inboxes belong to accounts, recipients belong to messages). Joins are useful. Foreign keys prevent orphaned data. Transactions keep multi-table writes consistent.
Document databases (MongoDB, DynamoDB) can work, especially for the `metadata` or `agent_actions` tables where the shape of data varies per action type. But modeling the core email tables as documents leads to denormalization headaches: you end up duplicating recipient data across documents, and updating delivery status means finding and patching nested arrays.
A hybrid approach works well in practice. Relational tables for the core email model. A JSONB column (in Postgres) or a sidecar document store for flexible agent metadata. You get the structure where you need it and the flexibility where the agent's behavior is unpredictable.
## What to look for when evaluating platforms
If you're building on top of an agent email platform rather than designing your own schema, you can't see the database directly. But you can infer schema quality from the API:
- Does the API return per-recipient delivery status, or just "sent"?
- Can you query messages by thread?
- Is there an action log or audit trail for agent operations?
- Are inbox operations scoped to your account with no leakage?
- Can you get delivery events (queued, sent, delivered, bounced) for outbound messages?
If the API exposes these cleanly, the schema underneath is probably well-designed. If it doesn't, the platform either hasn't built the tables or hasn't exposed them, and either way your agent is flying blind.
LobsterMail exposes per-inbox message queries, delivery metadata, and injection risk scoring on received emails. If you want to test what a well-structured agent email API feels like from the agent's side, the free tier gives you a working inbox with no credit card.
<FAQ>
<FAQItem question="What core tables must every email platform database schema include?">
At minimum: `accounts`, `inboxes`, `messages`, `message_recipients`, and `attachments`. Agent platforms also need `agent_actions` and `delivery_events` to track what the agent did and what happened to outbound messages.
</FAQItem>
<FAQItem question="How should CC and BCC recipients be stored in a relational email schema?">
Normalize them into a `message_recipients` table with a `type` column (`to`, `cc`, `bcc`). Each recipient gets its own row, which allows per-recipient delivery tracking and clean queries by address.
</FAQItem>
<FAQItem question="What is the message_recipients table and why is it important?">
It's the join table between messages and email addresses. It stores each recipient as a separate row with a type (to, cc, bcc) and delivery status. Without it, you're parsing comma-separated header strings, which makes querying and tracking delivery per recipient nearly impossible.
</FAQItem>
<FAQItem question="How do you model email threads in a database schema?">
Use a `threads` table that groups messages by `In-Reply-To`/`References` headers first, with a fallback to normalized subject matching within the same inbox. Store a `last_message_at` timestamp for sorting.
</FAQItem>
<FAQItem question="What fields should an attachments table include?">
`id`, `message_id` (foreign key), `filename`, `content_type` (MIME type), `size_bytes`, and a `storage_key` pointing to the actual file in object storage. Don't store file contents directly in the database.
</FAQItem>
<FAQItem question="How does an AI agent interact with an email platform's data model?">
Through an API layer that maps to the underlying schema. The agent creates inboxes, sends messages, polls for new mail, and reads delivery status. Well-designed platforms also let the agent query by thread and check action history.
</FAQItem>
<FAQItem question="What additional tables does an agent email platform need beyond a standard email schema?">
An `agent_actions` table for logging every operation the agent performs, and a `delivery_events` table for tracking the lifecycle of outbound messages. These enable debugging and auditability that traditional email schemas lack.
</FAQItem>
<FAQItem question="Should an agent email platform use SQL or NoSQL?">
Relational (SQL) databases are the stronger default because email data is inherently relational. Use JSONB columns or a sidecar document store for flexible metadata like agent action details, where the data shape varies per action type.
</FAQItem>
<FAQItem question="How do you design a multi-tenant email database schema that isolates workspace data?">
Include an `account_id` column on every tenant-specific table and enforce it in every query. Postgres row-level security can add database-layer enforcement. Composite indexes on `(account_id, inbox_id, created_at)` keep queries fast across tenants.
</FAQItem>
<FAQItem question="What indexes are critical for performance in a high-volume email schema?">
Composite indexes on `(inbox_id, created_at)` for message polling, `(message_id, type)` on `message_recipients` for delivery lookups, and `(account_id, inbox_id)` on most tables for tenant-scoped queries.
</FAQItem>
<FAQItem question="What is the outbox pattern in an agent email platform?">
Messages start in a `pending` state in the database. A background worker picks them up, delivers via SMTP, and records status transitions as `delivery_events`. This decouples the agent's send call from actual delivery, so the agent doesn't block on SMTP.
</FAQItem>
<FAQItem question="How do you store agent action logs alongside email records?">
In an `agent_actions` table with foreign keys to `inboxes` and optionally `messages`. A JSONB `metadata` column holds action-specific details (extracted codes, error reasons, retry counts) without requiring schema changes per action type.
</FAQItem>
<FAQItem question="What is the difference between an email schema and a messaging schema?">
Email schemas handle SMTP-specific concerns: MIME types, headers like `In-Reply-To`, delivery status codes (550, 421), and bounce processing. Messaging schemas (chat, SMS) are simpler because delivery is usually instant and threading is built into the protocol.
</FAQItem>
<FAQItem question="Is LobsterMail free to use?">
Yes. The free tier includes one inbox and 1,000 emails per month with no credit card required. Your agent can [self-provision an inbox](/blog/agent-self-signup-explained) and start sending and receiving immediately.
</FAQItem>
</FAQ>


