Agent email across dev, staging, and production: a practical guide

How to configure agent email behavior across dev, staging, and production environments without leaking test messages to real users.

9 min read
Ian Bussières, CTO & Co-founder

A developer I know shipped an agent to staging last month. It worked perfectly: created inboxes, composed messages, parsed replies. Then someone merged the staging branch into production without updating environment variables. The agent fired off 340 onboarding emails from a test domain to real customers. Support tickets rolled in within minutes.

This is not an edge case. When agents handle their own email, the boundary between "safe testing" and "live communication" is one misconfigured variable. Human developers rarely send hundreds of emails by accident. Agents do it in seconds.

The fix isn't complicated, but it does require thinking about email configuration at every stage of your pipeline. Here's how to set it up so your agent sends the right emails, to the right people, in the right environment.

Agent email settings by environment at a glance

Before getting into specifics, this table covers the key configuration differences between environments. If you're looking for a quick reference, start here.

| Setting | Dev | Staging | Production |
| --- | --- | --- | --- |
| Email sending | Disabled or local only | Sandboxed (test mode) | Live delivery |
| API credentials | Test keys | Test-mode keys | Production keys |
| Sender domain | localhost or dev subdomain | staging.yourapp.com | yourapp.com |
| Logging level | Verbose (all payloads) | Moderate (metadata + errors) | Errors and anomalies only |
| Rate limits | None (or 5/hour cap) | Low cap (50/hour) | Enforced per plan tier |
| Recipients | Internal addresses only | Allowlisted test accounts | Real users |

Every decision in this table exists to prevent one thing: your agent talking to the wrong people at the wrong time.

Dev: no emails leave the building

In development, your agent should never send a real email. Full stop. The goal is to validate logic (template rendering, address selection, retry behavior) without any external side effects.

There are two practical approaches:

Disable outbound sending entirely. Set an environment variable like EMAIL_ENABLED=false and wrap your agent's send calls in a check. Log the full email payload locally so you can inspect what would have been sent.

```typescript
// Guard at the top of the agent's send function: unless sending is
// explicitly enabled, log the payload and return without sending.
if (process.env.EMAIL_ENABLED !== 'true') {
  console.log('[DEV] Email suppressed:', { to, subject, body });
  return { status: 'suppressed', environment: 'dev' };
}
```

Use a local catch-all. Tools like Mailpit or MailHog intercept all outbound SMTP traffic and display it in a local web UI. Your agent thinks it's sending real email. Nothing actually leaves your machine. This is better for testing the full send path, including MIME encoding and attachment handling.
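To make the catch-all approach concrete, here is a minimal sketch of choosing SMTP settings per environment. The `smtpSettingsFor` function, the `SmtpSettings` type, and the `SMTP_HOST` / `SMTP_PORT` variable names are assumptions for illustration, not part of any SDK. Port 1025 is the default SMTP port for both Mailpit and MailHog; change it if you run them elsewhere.

```typescript
// Sketch: choose SMTP connection settings per environment.
// All names here (SmtpSettings, smtpSettingsFor, SMTP_HOST, SMTP_PORT) are hypothetical.
type Environment = 'dev' | 'staging' | 'production';

interface SmtpSettings {
  host: string;
  port: number;
  secure: boolean; // true = implicit TLS (conventionally port 465)
}

function smtpSettingsFor(
  env: Environment,
  vars: Record<string, string | undefined>
): SmtpSettings {
  if (env === 'dev') {
    // Dev always talks to the local catch-all and ignores SMTP_* variables,
    // so a leaked production host cannot be picked up by accident.
    return { host: 'localhost', port: 1025, secure: false };
  }

  const host = vars['SMTP_HOST'];
  if (!host) {
    throw new Error(`SMTP_HOST must be set for ${env}`);
  }

  const port = Number(vars['SMTP_PORT'] ?? '587');
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error('SMTP_PORT must be a positive integer');
  }

  return { host, port, secure: port === 465 };
}
```

In dev the function deliberately ignores whatever is in the environment. Pass the result to whichever SMTP client your agent already uses.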

Either way, dev credentials should be completely isolated. Use a separate API token scoped to a dev-only account. If your agent provisions its own inboxes (which it should), those dev inboxes should live in a sandboxed namespace that can't accidentally route to production.

If you're already testing agent email without hitting production, you know how much grief this prevents.

Staging: production-shaped, but contained

Staging is where things get tricky. You want your agent to behave exactly like it will in production, including sending actual emails over real SMTP connections. But you don't want those emails reaching real users.

The solution is test-mode sending with an explicit recipient allowlist.

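```typescript
// Addresses the staging agent may email; everything else is rejected.
const ALLOWED_STAGING_RECIPIENTS = [
  'qa-team@yourcompany.com',
  'staging-catchall@yourcompany.com',
];

function validateRecipient(to: string): boolean {
  if (process.env.NODE_ENV === 'staging') {
    // Exact, case-insensitive match. A suffix check such as endsWith() would
    // also accept look-alikes like "not-qa-team@yourcompany.com".
    const normalized = to.trim().toLowerCase();
    return ALLOWED_STAGING_RECIPIENTS.includes(normalized);
  }
  return true;
}
```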

This is your safety net. Even if the agent decides to email ceo@bigcustomer.com because it pulled that address from a staging database seed, the allowlist catches it.

Beyond recipient filtering, staging should also enforce lower rate limits than production. An agent that can send 10,000 emails per day in production should be capped at 50 per hour in staging. This prevents runaway loops from consuming quota or triggering abuse alerts with your email provider.
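The staging cap can also be enforced inside the agent process instead of relying only on the provider. Below is a rough sketch of a sliding-window limiter. `SendRateLimiter` is a hypothetical helper, and an in-memory counter like this only covers a single process; multiple workers would need a shared store.

```typescript
// Sketch: sliding-window cap on outbound sends (hypothetical helper).
class SendRateLimiter {
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private sentAt: number[] = []; // timestamps (ms) of sends inside the window

  constructor(maxPerWindow: number, windowMs: number = 60 * 60 * 1000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Returns true and records the send if the cap allows it, false otherwise.
  tryAcquire(now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have aged out of the window.
    this.sentAt = this.sentAt.filter((t) => t > cutoff);
    if (this.sentAt.length >= this.maxPerWindow) {
      return false;
    }
    this.sentAt.push(now);
    return true;
  }
}

// Staging: at most 50 sends per hour.
const stagingLimiter = new SendRateLimiter(50);
```

Call `stagingLimiter.tryAcquire()` before each send and skip (or queue) the message when it returns `false`. A runaway loop then stops at the cap instead of draining quota.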

Use a separate sending domain for staging, something like staging.yourapp.com with its own SPF and DKIM records. This protects your production domain's reputation: if your staging agent accidentally sends malformed emails or triggers spam reports, it won't affect your main domain's deliverability. For agents that send from custom domains, keep the staging and production domains completely separate.

Replaying production events in staging

One pattern that's underused: replaying real production email events in staging for regression testing. Capture the inbound webhook payloads from production (stripped of PII or using anonymized copies), then feed them into your staging agent. This lets you verify that a code change doesn't break parsing logic, reply handling, or extraction behavior, all without touching real users.

```typescript
// Replay a captured production event in staging
const productionEvent = await loadCapturedEvent('event-abc123');
const anonymized = anonymizePayload(productionEvent);
await stagingAgent.processInboundEmail(anonymized);
```

This catches regressions that synthetic test data misses, like unusual character encodings, unexpected headers, or edge-case MIME structures that only show up in real-world email.
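What an anonymization step looks like depends on your payload format. As a starting point, here is a sketch that replaces every email address with a stable placeholder, so thread structure survives while real addresses don't. The `InboundEmailEvent` shape is an assumption, not a real webhook schema, and this handles addresses only: display names, phone numbers, and free-text PII need their own scrubbing.

```typescript
// Sketch: swap real addresses for stable placeholders before replaying.
// InboundEmailEvent is a hypothetical shape, not an actual webhook schema.
interface InboundEmailEvent {
  from: string;
  to: string[];
  subject: string;
  text: string;
  headers: Record<string, string>;
}

const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function anonymizePayload(event: InboundEmailEvent): InboundEmailEvent {
  // The same real address always maps to the same alias within one event.
  const aliases = new Map<string, string>();
  const aliasFor = (address: string): string => {
    const key = address.toLowerCase();
    let alias = aliases.get(key);
    if (alias === undefined) {
      alias = `user${aliases.size + 1}@example.test`;
      aliases.set(key, alias);
    }
    return alias;
  };
  const scrub = (value: string): string =>
    value.replace(EMAIL_PATTERN, (match) => aliasFor(match));

  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(event.headers)) {
    headers[name] = scrub(value);
  }

  return {
    from: scrub(event.from),
    to: event.to.map((address) => scrub(address)),
    subject: scrub(event.subject),
    text: scrub(event.text),
    headers,
  };
}
```

Because the mapping is consistent within an event, reply-handling logic that compares `from` against addresses quoted in the body still behaves the way it would on the real message.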

Production: guardrails, not prayers

Production is where your agent communicates with real humans. The configuration here needs to protect both your users and your sending reputation.

Credential isolation is non-negotiable. Production API keys should live in a secrets manager (not .env files committed to a repo). Rotate them on a schedule. The agent's production token should have the minimum permissions needed: send from specific domains, access specific inboxes, nothing else.

Rate limits should match your plan and your actual needs. If your agent processes 200 customer interactions per day, a 10,000 email/day cap is reasonable headroom. A 100,000/day cap with no alerting is asking for trouble. Set alerts at 50% and 80% of your daily limit.
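The 50% and 80% thresholds are easy to encode. A minimal sketch, assuming you already track how many emails the agent has sent today (`quotaAlertLevel` and the `QuotaAlert` labels are hypothetical names):

```typescript
// Sketch: map today's usage onto the 50% / 80% alert thresholds.
type QuotaAlert = 'none' | 'warning' | 'critical';

function quotaAlertLevel(sentToday: number, dailyLimit: number): QuotaAlert {
  if (dailyLimit <= 0) {
    throw new Error('dailyLimit must be positive');
  }
  const usage = sentToday / dailyLimit;
  if (usage >= 0.8) return 'critical'; // 80% or more of the daily cap
  if (usage >= 0.5) return 'warning'; // 50% or more of the daily cap
  return 'none';
}
```

With a 10,000/day cap, the warning fires at 5,000 sends and the critical alert at 8,000. For an agent that normally sends a few hundred, either one means something is looping.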

Monitor deliverability actively. Track bounce rates, spam complaint rates, and delivery latency. Your agent won't notice that 30% of its emails are landing in spam. You need dashboards and alerts for that.

```typescript
// Log every production email with enough context for auditing
logger.info('email_sent', {
  environment: 'production',
  messageId,
  to: hashEmail(recipient), // don't log raw addresses
  subject,
  templateId,
  timestamp: Date.now(),
  sendLatencyMs,
});
```
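The logging call above relies on a `hashEmail` helper that isn't shown. One possible implementation, sketched here as an assumption rather than the helper the example actually uses, is a keyed hash (HMAC-SHA-256) over the normalized address. Unlike the one-argument call above, this version takes the key explicitly; in practice you would wrap it with a secret loaded from your secrets manager.

```typescript
import { createHmac } from 'node:crypto';

// Sketch: keyed hash of a recipient address for audit logs.
// The same address always yields the same digest, so sends stay correlatable,
// but the raw address never appears in the log.
function hashEmail(address: string, secret: string): string {
  const normalized = address.trim().toLowerCase();
  return createHmac('sha256', secret).update(normalized).digest('hex');
}
```

A keyed hash is used instead of a bare SHA-256 because email addresses are guessable: without a secret, anyone with the logs could hash a list of candidate addresses and match them.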

Compliance across environments

Unsubscribe handling, CAN-SPAM compliance, and GDPR requirements only matter in production, right? Wrong. Your staging environment should mirror production compliance behavior. If your agent doesn't include unsubscribe headers in staging, you won't catch the bug where they're missing until real users start reporting you. Test the compliance path, not just the happy path.
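One way to test the compliance path is a check that runs against every outgoing message in staging. The sketch below looks for the `List-Unsubscribe` header (RFC 2369) and the one-click `List-Unsubscribe-Post` header (RFC 8058). `findUnsubscribeProblems` is a hypothetical helper, and treating one-click as mandatory is a policy assumption you may want to relax.

```typescript
// Sketch: flag outgoing messages that lack unsubscribe headers.
function findUnsubscribeProblems(headers: Record<string, string>): string[] {
  // Header names are case-insensitive, so index them in lowercase.
  const byName = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    byName.set(name.toLowerCase(), value.trim());
  }

  const problems: string[] = [];

  const listUnsubscribe = byName.get('list-unsubscribe');
  if (!listUnsubscribe) {
    problems.push('missing List-Unsubscribe header');
  } else if (!/<(https?:|mailto:)[^>]+>/i.test(listUnsubscribe)) {
    problems.push('List-Unsubscribe has no <https:...> or <mailto:...> target');
  }

  // RFC 8058 one-click unsubscribe uses exactly this value.
  if (byName.get('list-unsubscribe-post') !== 'List-Unsubscribe=One-Click') {
    problems.push('missing or invalid List-Unsubscribe-Post header');
  }

  return problems;
}
```

Failing the staging send (or a CI test) when this returns a non-empty list surfaces the missing-header bug before real users see it.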

Environment variables: the glue that holds it together

All of this configuration comes down to environment variables. Here's a practical set for an agent email pipeline:


```bash
# Shared across environments
LOBSTERMAIL_INBOX_PREFIX=myagent

# Dev (rate limit is per hour)
EMAIL_ENABLED=false
LOBSTERMAIL_API_TOKEN=lm_sk_test_dev_xxxxx
EMAIL_LOG_LEVEL=verbose
EMAIL_RATE_LIMIT=5

# Staging (rate limit is per hour)
EMAIL_ENABLED=true
LOBSTERMAIL_API_TOKEN=lm_sk_test_staging_xxxxx
EMAIL_LOG_LEVEL=moderate
EMAIL_RATE_LIMIT=50
EMAIL_RECIPIENT_ALLOWLIST=qa-team@yourcompany.com,staging-catchall@yourcompany.com
SENDER_DOMAIN=staging.yourapp.com

# Production (rate limit is per day)
EMAIL_ENABLED=true
LOBSTERMAIL_API_TOKEN=lm_sk_live_xxxxx
EMAIL_LOG_LEVEL=error
EMAIL_RATE_LIMIT=10000
SENDER_DOMAIN=yourapp.com
```

Never share tokens across environments. A staging token that works in production is a bug, not a convenience.
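Reading these variables in one place, and failing fast when one is missing or malformed, keeps a bad config from reaching the send path. Here is a sketch of such a loader. The `EmailConfig` shape and `loadEmailConfig` are assumptions for illustration; the variable names are the ones listed above.

```typescript
// Sketch: parse and validate the email-related environment variables.
const LOG_LEVELS = ['verbose', 'moderate', 'error'] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface EmailConfig {
  enabled: boolean;
  apiToken: string;
  logLevel: LogLevel;
  rateLimit: number;
  senderDomain: string | undefined;
  recipientAllowlist: string[];
}

function loadEmailConfig(vars: Record<string, string | undefined>): EmailConfig {
  const apiToken = vars['LOBSTERMAIL_API_TOKEN'];
  if (!apiToken) {
    throw new Error('LOBSTERMAIL_API_TOKEN is required');
  }

  const rawLimit = (vars['EMAIL_RATE_LIMIT'] ?? '').trim();
  const rateLimit = Number(rawLimit);
  if (rawLimit === '' || !Number.isInteger(rateLimit) || rateLimit < 0) {
    throw new Error('EMAIL_RATE_LIMIT must be a non-negative integer');
  }

  const logLevel = LOG_LEVELS.find((level) => level === vars['EMAIL_LOG_LEVEL']);
  if (logLevel === undefined) {
    throw new Error('EMAIL_LOG_LEVEL must be verbose, moderate, or error');
  }

  return {
    // Fail closed: anything other than the exact string "true" disables sending.
    enabled: vars['EMAIL_ENABLED'] === 'true',
    apiToken,
    logLevel,
    rateLimit,
    senderDomain: vars['SENDER_DOMAIN'],
    recipientAllowlist: (vars['EMAIL_RECIPIENT_ALLOWLIST'] ?? '')
      .split(',')
      .map((entry) => entry.trim().toLowerCase())
      .filter((entry) => entry.length > 0),
  };
}
```

Calling `loadEmailConfig(process.env)` once at startup means a typo in a variable name crashes the agent immediately instead of silently changing its sending behavior.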

Feature flags for email rollouts

When your agent gets a new email capability (say, sending follow-up sequences), don't ship it to production for all users at once. Use feature flags to gate the behavior per environment and per audience segment.

```typescript
const canSendFollowups = featureFlags.isEnabled('agent-followup-emails', {
  environment: process.env.NODE_ENV,
  userId: currentUser.id,
});

if (canSendFollowups) {
  await agent.scheduleFollowup(thread);
}
```

This lets you enable the feature in dev and staging first, then roll it out to 5% of production users, then 50%, then everyone. If something goes wrong, you kill the flag. No deploy needed.
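If you're curious how a percentage rollout can stay stable per user, the usual trick is to hash the flag name and user ID into a bucket from 0 to 99. The sketch below is one way to do it and is not how any particular flag service works. `isInRollout` and `fnv1a` are hypothetical helpers, and the hash is an FNV-1a-style mix over UTF-16 code units.

```typescript
// Sketch: deterministic percentage rollout.
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

function isInRollout(flagName: string, userId: string, percentage: number): boolean {
  if (percentage <= 0) return false;
  if (percentage >= 100) return true;
  // Each (flag, user) pair lands in a fixed bucket from 0 to 99.
  const bucket = fnv1a(`${flagName}:${userId}`) % 100;
  return bucket < percentage;
}
```

Because the bucket is fixed per user, raising the percentage from 5 to 50 only adds users. Nobody who already received follow-ups at 5% loses them at 50%.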

Promoting changes across environments

The safest promotion flow is one-directional: dev to staging to production, never skipping a step.

Before promoting email configuration from staging to production, verify three things:

  1. Deliverability in staging is clean. No bounces from misconfigured DNS. No spam folder placement from missing authentication records.
  2. Rate limits are appropriate. Staging limits are artificially low. Production limits should match your actual sending volume plus headroom.
  3. Credentials are environment-specific. Double-check that staging tokens aren't accidentally in the production config. This is the number one cause of "it worked in staging" disasters.

Automate this verification in your CI/CD pipeline. A pre-deploy check that confirms the API token prefix matches the target environment (lm_sk_test_ for staging, lm_sk_live_ for production) catches the most common mistake.
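The token prefix check is only a few lines. A sketch of what a pre-deploy script might call, using the prefixes mentioned above (`assertTokenMatchesTarget` and `DeployTarget` are hypothetical names):

```typescript
// Sketch: pre-deploy guard that the API token matches the target environment.
type DeployTarget = 'dev' | 'staging' | 'production';

const EXPECTED_TOKEN_PREFIX: Record<DeployTarget, string> = {
  dev: 'lm_sk_test_',
  staging: 'lm_sk_test_',
  production: 'lm_sk_live_',
};

function assertTokenMatchesTarget(target: DeployTarget, token: string | undefined): void {
  const expected = EXPECTED_TOKEN_PREFIX[target];
  if (!token || !token.startsWith(expected)) {
    // Never echo the token itself into CI logs.
    throw new Error(
      `Refusing to deploy to ${target}: API token does not start with "${expected}"`
    );
  }
}
```

Run it as an early pipeline step, for example `assertTokenMatchesTarget('production', process.env.LOBSTERMAIL_API_TOKEN)`, so a staging token in the production config fails the deploy.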

Where LobsterMail fits

LobsterMail handles a lot of this automatically. When your agent provisions an inbox through the SDK, the token type determines the environment behavior. Test tokens (lm_sk_test_*) create sandboxed inboxes. Live tokens (lm_sk_live_*) create production inboxes with full deliverability. Your agent doesn't need to know which environment it's in. The token handles routing.

For teams running agents across multiple environments, this means less configuration code and fewer opportunities to misconfigure. The free tier (1,000 emails/month) is enough for dev and staging combined. The Builder tier ($9/month) covers most production workloads with up to 5,000 emails/month and 10 inboxes.

If you want your agent to handle its own email across environments without building this plumbing yourself, provision its inboxes through LobsterMail and let the token system manage the rest.

Frequently asked questions

Should AI agents send real emails in a development environment?

No. In dev, either disable outbound email entirely or use a local catch-all like Mailpit. Your agent's email logic should be testable without anything leaving your machine.

How do you configure an email agent to use a sandbox in dev but live sending in production?

Use environment variables to control behavior. Set EMAIL_ENABLED=false in dev, use a recipient allowlist in staging, and use production API tokens with live sending enabled in production. The token type determines the routing.

What environment variables are needed to manage agent email across dev, staging, and production?

At minimum: EMAIL_ENABLED, LOBSTERMAIL_API_TOKEN (scoped per environment), EMAIL_RATE_LIMIT, SENDER_DOMAIN, and EMAIL_LOG_LEVEL. Staging should also have an EMAIL_RECIPIENT_ALLOWLIST.

How can you test an agent's email logic end-to-end without delivering to real recipients?

Use a local SMTP interceptor (Mailpit, MailHog) in dev. In staging, enforce a recipient allowlist that only permits internal test addresses. Both approaches let the agent run its full send path while keeping real users safe.

How do you prevent an autonomous agent from accidentally emailing production users during a staging run?

Enforce a recipient allowlist in staging that rejects any address not on the approved list. Also use staging-specific API tokens and a separate sending domain so that even if the allowlist fails, the emails come from a staging domain with no production access.

Should staging and production use separate sending domains for agent emails?

Yes. Use something like staging.yourapp.com with its own DNS authentication records. This isolates your production domain's sender reputation from any staging mishaps, bounces, or spam complaints.

How do you log and audit every email sent by an agent across all environments?

Log each send event with the environment name, a hashed recipient address (not raw PII), message ID, template ID, and timestamp. Use verbose logging in dev, moderate in staging, and errors-only in production. Store audit logs in a centralized system.

What rate limits should govern agent email sending in dev versus production?

Dev should have a very low cap (5 per hour) or be disabled entirely. Staging should use a moderate cap (50 per hour). Production limits should match your plan tier with alerting at 50% and 80% thresholds.

How do feature flags interact with agent-driven email workflows?

Use feature flags to gate new email capabilities (like follow-up sequences or new templates) per environment. Enable in dev first, then staging, then roll out to a percentage of production users. Kill the flag if something breaks.

How do you replay a production email event in staging without affecting real users?

Capture inbound webhook payloads from production, strip or anonymize PII, then feed the anonymized payloads into your staging agent's processing pipeline. This catches parsing regressions that synthetic test data misses.

Can agent-generated emails be diff-tested between staging and production?

Yes. Render the same email template with the same input data in both environments, then compare the output. Automated diff testing catches content regressions, missing variables, and formatting issues before they reach real recipients.

What delivery monitoring should be in place for agent email in production?

Track bounce rates, spam complaint rates, delivery latency, and inbox placement. Set alerts for anomalies. Your agent won't notice degraded deliverability on its own, so dashboards and automated alerts are required.

How do you handle unsubscribes and compliance when an agent sends emails across multiple environments?

Mirror production compliance behavior in staging. Include unsubscribe headers, honor opt-out lists, and test CAN-SPAM and GDPR flows in staging so you catch missing compliance features before production deployment.

What is the difference between staging and production environments for email?

Staging mirrors production behavior (real SMTP connections, authentication, compliance) but sends only to allowlisted test recipients with lower rate limits. Production sends to real users with full rate limits and active deliverability monitoring.

Is LobsterMail free for dev and staging environments?

Yes. The free tier includes 1,000 emails per month, which is more than enough for development and staging combined. Test tokens (lm_sk_test_*) create sandboxed inboxes automatically.
