
End-to-end email testing for AI agents with Playwright

How to test email verification, OTP, and signup flows end-to-end with Playwright using virtual inboxes your agent controls programmatically.

March 24, 2026 · 8 min read

Ian Bussières, CTO & Co-founder

Your agent signs up for a SaaS tool, triggers a welcome email, extracts the verification link, and clicks it. That's a four-step flow. Testing it end-to-end with Playwright means your test suite needs to actually receive that email, parse it, and hand the data back to the browser session.

Most teams fake this step. They mock the email or skip the verification entirely, then hope nothing changes. The result is a test suite that covers 90% of the user journey and silently ignores the part most likely to break in production.

Playwright handles the browser automation side well. But the moment your test flow depends on a real email arriving at a real inbox, you need something else: an inbox your code can provision and read from programmatically. LobsterMail was built for this kind of agent-driven workflow. If you'd rather skip the comparison and start testing, follow along with the examples below.

How to test email end-to-end with Playwright

1. Provision a virtual test inbox via API or SDK before your test runs.
2. Inject the inbox address into your Playwright test as the signup email.
3. Trigger the email action in your app under test (signup, password reset, OTP).
4. Poll the inbox API or receive a webhook for the incoming message.
5. Extract the OTP code or verification link from the email body.
6. Resume the Playwright flow and assert the expected outcome.

Steps 2, 3, and 6 are standard Playwright, and step 5 is string parsing. The hard parts are steps 1 and 4: getting a real inbox your test controls programmatically, and getting the email content back before your test times out.

The polling problem

Most email testing tools work through polling. Your test sends a request every few seconds asking "did the email arrive yet?" This creates two problems.

First, it's slow. If your polling interval is 2 seconds and the email takes 7 seconds to arrive, your test doesn't see it until the poll at 8 seconds, so 1 of those 8 seconds is lost to the polling interval alone. Across 50 sequential test cases, that 8-second wait adds up to 400 seconds, over 6 minutes of CI time spent waiting on email.
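As a sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes polls fire at fixed multiples of the interval after the email is triggered and that the 50 tests run one after another; the function name is illustrative.

```typescript
// Time (in seconds) at which a fixed-interval poller first sees the email.
// Assumption: polls happen at t = interval, 2 * interval, ... after the trigger.
function firstPollAfterArrival(arrivalSec: number, intervalSec: number): number {
  return Math.ceil(arrivalSec / intervalSec) * intervalSec;
}

const arrivalSec = 7;   // email lands 7 s after the trigger
const intervalSec = 2;  // test polls every 2 s
const testCount = 50;   // sequential email-dependent tests

const waitPerTestSec = firstPollAfterArrival(arrivalSec, intervalSec); // 8
const pollingOverheadSec = waitPerTestSec - arrivalSec;                // 1
const totalWaitSec = waitPerTestSec * testCount;                       // 400

console.log({ waitPerTestSec, pollingOverheadSec, totalWaitSec });
```

The overhead per test depends on where the arrival falls within the interval, so a longer interval raises the worst case while a shorter one costs more API requests.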

Second, it's flaky. Email delivery isn't instant. An SMTP relay might buffer the message for 10 seconds. Your polling loop times out at 15. Works most of the time, fails often enough to erode trust in the entire test suite. Teams start re-running failed tests automatically, which masks real failures instead of fixing them.
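If you do poll, it helps to make the deadline explicit so a late email fails with a clear message instead of a generic assertion error. The helper below is a minimal sketch, not any provider's API: `pollUntil` and its `check` callback are hypothetical names, and `check` stands in for whatever call fetches messages from your inbox.

```typescript
// Calls `check` repeatedly until it returns a value or the deadline passes.
// `check` is a placeholder for "ask the inbox API whether the email arrived".
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  opts: { intervalMs: number; timeoutMs: number },
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    const result = await check();
    if (result !== undefined) return result;
    // Give up if the next poll would land after the deadline.
    if (Date.now() + opts.intervalMs > deadline) {
      throw new Error(`No result after ${attempts} attempts in ${opts.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

The timeout should sit comfortably above the slowest delivery you see in practice; a 15-second deadline against a relay that can buffer for 10 seconds leaves little margin.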

A webhook-based approach addresses both issues. Instead of asking "is it there yet?" repeatedly, the inbox pushes a notification when the email arrives. Your test awaits that signal and continues immediately. You still set an upper bound on the wait, but no time is lost to the polling interval, and there is far less timeout roulette.
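To make the push model concrete, here is a minimal in-process sketch of the "await a signal" side. Everything in it is an assumption for illustration: the `Email` shape, the `InboxSignal` class, and the idea that your webhook HTTP handler (omitted here) calls `deliver` when the provider posts a new message.

```typescript
type Email = { to: string; subject: string; text: string };

class InboxSignal {
  private waiters = new Map<string, (email: Email) => void>();
  private early = new Map<string, Email>();

  // Called by the test: resolves as soon as an email for `address` is delivered.
  waitFor(address: string, timeoutMs: number): Promise<Email> {
    const buffered = this.early.get(address);
    if (buffered) {
      this.early.delete(address);
      return Promise.resolve(buffered);
    }
    return new Promise<Email>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiters.delete(address);
        reject(new Error(`No email for ${address} within ${timeoutMs} ms`));
      }, timeoutMs);
      this.waiters.set(address, (email) => {
        clearTimeout(timer);
        this.waiters.delete(address);
        resolve(email);
      });
    });
  }

  // Called by the (hypothetical) webhook handler when a message is pushed.
  deliver(email: Email): void {
    const waiter = this.waiters.get(email.to);
    if (waiter) waiter(email);
    else this.early.set(email.to, email); // email arrived before the test waited
  }
}
```

The test resumes the moment `deliver` runs, so the wait equals the delivery time rather than the next polling tick. The buffer for early messages covers the case where the email arrives before the test starts waiting.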

MailSlurp vs Mailosaur vs agent-native inboxes

The three most common tools for Playwright email testing are MailSlurp, Mailosaur, and general-purpose inbox APIs. If your tests are written and run by a human developer, MailSlurp or Mailosaur will work fine. The calculus changes when an AI agent orchestrates the test pipeline.

Criteria	MailSlurp	Mailosaur	LobsterMail
Inbox provisioning	API call, requires account setup	API call, requires account setup	SDK auto-provisions, no human signup
Webhook delivery	Yes	Yes	Yes
Agent self-service	No (human configures API keys)	No (human configures API keys)	Agent creates its own inbox
Injection protection	None	None	Built-in prompt injection scoring
Free tier	Limited	7-day trial	Free forever, 1,000 emails/month
Playwright integration	REST API	REST API + helper library	SDK + REST API + MCP

MailSlurp and Mailosaur are solid tools with years of track record. If your CI pipeline is triggered by an agent that needs to provision inboxes and run the full verification cycle without human intervention, though, the setup overhead of traditional tools becomes a bottleneck: a human has to create the account, generate the API keys, and paste them into environment variables. An agent-first inbox API lets the agent handle all of that in a single SDK call.

We explored some of these patterns in our guide to testing agent email without hitting production, which covers sandboxing strategies for agent-driven workflows.

A working example: Playwright + LobsterMail

Here's a test that signs up for an app, receives the verification email, extracts the code, and completes the flow:

import { test, expect } from '@playwright/test';
import { LobsterMail } from '@lobsterkit/lobstermail';

test('signup verification flow', async ({ page }) => {
  const lm = await LobsterMail.create();
  const inbox = await lm.createInbox();

  // Fill out signup form with test inbox
  await page.goto('https://myapp.com/signup');
  await page.fill('#email', inbox.address);
  await page.fill('#password', 'testpass123');
  await page.click('#submit');

  // Wait for verification email
  const emails = await inbox.receive({ timeout: 30000 });
  expect(emails.length).toBeGreaterThan(0);

  // Extract verification code
  const code = emails[0].text.match(/\d{6}/)?.[0];
  expect(code).toBeTruthy();

  // Complete verification
  await page.fill('#verification-code', code!);
  await page.click('#verify');
  await expect(page.locator('#dashboard')).toBeVisible();
});

Each createInbox() call provisions a fresh lobster-xxxx@lobstermail.ai address. No configuration file, no environment variable pointing to a pre-created inbox. Each test run gets its own isolated address. If you need something human-readable (because your app validates email format or displays the address to the user), use createSmartInbox({ name: 'Test User' }) instead. It generates something like test-user@lobstermail.ai and handles collisions automatically.
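One caveat on the extraction step: a bare `/\d{6}/` matches the first six digits it finds, even when they sit inside a longer number such as an order ID or a date stamp. A slightly stricter helper, sketched below, only accepts a standalone run of digits. The name `extractOtp` is hypothetical, and the sketch assumes the code is the first standalone six-digit number in the body.

```typescript
// Returns the first run of exactly `digits` digits that is not part of a
// longer number, or undefined when the text contains none.
function extractOtp(text: string, digits = 6): string | undefined {
  const pattern = new RegExp(`(?<!\\d)\\d{${digits}}(?!\\d)`);
  return text.match(pattern)?.[0];
}

const body = "Order 20260324001 confirmed. Your code is 482913.";
console.log(body.match(/\d{6}/)?.[0]); // "202603" (first six digits of the order number)
console.log(extractOtp(body));         // "482913"
```

If your emails contain other six-digit numbers, anchoring on surrounding wording (for example, the phrase that precedes the code in your template) is more robust than digit counting alone.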

Agent-orchestrated test pipelines

The pattern above works for human-authored Playwright tests. The more interesting case is when an AI agent writes and runs these tests itself.

An agent building a support agent that handles email needs to verify its own integration actually works. It can provision a LobsterMail inbox through MCP or the SDK, write a Playwright test against the app under development, execute it, and interpret the results. The entire cycle happens without a human touching the test file or configuring credentials.

This is where "agent-first" becomes an architecture decision rather than a buzzword. Traditional email testing tools require a human to create an account and manually inject API keys into the test environment. An agent-first inbox API lets the agent handle provisioning, testing, validation, and cleanup in one autonomous loop.

Testing HTML email rendering

End-to-end email testing goes beyond verification codes. If you need to validate that HTML emails render correctly, Playwright can help there too. Once you've received the email through the inbox API, load its HTML body into a Playwright page context and assert against it:

const htmlContent = emails[0].html;
await page.setContent(htmlContent);
// toHaveAttribute expects a string or RegExp, so match the URL with a RegExp
await expect(page.locator('a[href*="verify"]')).toHaveAttribute(
  'href',
  /https:\/\/myapp\.com\/verify/
);

Assertions like this catch broken links, and the same page context lets you check for missing images and layout regressions that unit tests on your email templates would miss entirely.

Configuring Playwright for reliable email tests

Two configuration properties make the biggest difference for email test reliability.

Set longer timeouts for email-dependent tests. Email delivery adds latency that browser interactions don't have. Use test.setTimeout(60000) for email flows, or create a separate Playwright project with its own timeout:

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'email-flows',
      testMatch: '**/email/**',
      timeout: 60000,
      retries: 0,
    },
  ],
});

Don't set global retries to paper over timing issues. If emails consistently take longer than your timeout, the fix is a longer timeout or switching to webhooks. Automatic retries just hide the problem while your pipeline burns minutes on each re-run.

Start with a single email-dependent test. Get it passing reliably with real inbox provisioning and real email delivery. Once that works, expand to cover your OTP flows, password resets, and other email-gated journeys. The test suite that actually catches email regressions is the one that tests real email.

Frequently asked questions

What is a virtual inbox and why do Playwright email tests need one?

A virtual inbox is a programmatically controlled email address that your test code can create, read from, and dispose of through an API. Playwright automates the browser but can't receive email on its own. A virtual inbox bridges that gap by giving your test suite a real address that receives real messages.

How do I programmatically receive and read emails inside a Playwright test?

Use an inbox API like LobsterMail, MailSlurp, or Mailosaur. Provision an inbox before the test, use its address in your signup form, then call the API's receive method to fetch delivered messages. Parse the email body for codes, links, or content you need to continue the test flow.

What is the difference between MailSlurp, Mailosaur, and Mailinator for Playwright testing?

MailSlurp and Mailosaur provide private virtual inboxes with APIs for receiving and parsing emails. Mailinator offers public inboxes that anyone can read, which makes it unsuitable for tests involving sensitive data or unique verification tokens. For agent-driven test pipelines, LobsterMail's self-provisioning removes the human setup step entirely.

Can I test OTP and one-time-password email flows end-to-end with Playwright?

Yes. Trigger the OTP flow in the browser, then call your inbox API to receive the email. Extract the code with a regex like /\d{6}/ and type it into the verification field using Playwright. Set a timeout of at least 30 seconds to account for email delivery latency.

How do I extract a verification link from an email body in a Playwright test?

Fetch the email through your inbox API and parse the HTML or plain text body. For HTML emails, load the content into a Playwright page using page.setContent() and query for the link element. For plain text, a URL regex like /https?:\/\/[^\s]+/ usually works.
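For the plain-text case, a small helper can make the regex approach less brittle. The sketch below is illustrative rather than a library API: `extractVerificationLink` is a hypothetical name, and it assumes you know which host the verification link should point at. It strips trailing sentence punctuation and rejects links to any other host.

```typescript
// Finds the first http(s) URL in `text` whose hostname equals `expectedHost`.
function extractVerificationLink(text: string, expectedHost: string): string | undefined {
  const candidates = text.match(/https?:\/\/[^\s<>"']+/g) ?? [];
  for (const raw of candidates) {
    // Drop punctuation that belongs to the sentence, not the URL.
    const cleaned = raw.replace(/[.,;:!?)\]]+$/, "");
    try {
      const url = new URL(cleaned);
      if (url.hostname === expectedHost) return url.href;
    } catch {
      // Not a parseable URL; try the next candidate.
    }
  }
  return undefined;
}

const text =
  "Confirm your account: https://myapp.com/verify?token=abc123.\n" +
  "Need help? https://help.example.org/faq";
console.log(extractVerificationLink(text, "myapp.com"));
// "https://myapp.com/verify?token=abc123"
```

Checking the hostname before navigating also keeps a test (or an agent) from following an unrelated link that happens to appear earlier in the message. Note that the punctuation stripping would truncate a URL that legitimately ends in one of those characters.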

How do I prevent test emails from hurting my sender deliverability score?

Use dedicated test inboxes on a separate domain from your production sending. LobsterMail's @lobstermail.ai addresses are isolated from your production domain, so test volume and bounce rates won't affect your sender reputation. See our guide on testing agent email without hitting production for more strategies.

How does Playwright MCP let AI coding agents run email tests?

Playwright MCP exposes browser automation as tools an AI agent can call directly. Combined with an email API that also supports MCP (like LobsterMail), an agent can provision an inbox, run a Playwright test, receive the verification email, and validate the outcome in a single autonomous loop with no human steps.

What makes an email API agent-first compared to traditional virtual inbox tools?

Agent-first means the API was designed for autonomous software, not humans clicking through dashboards. In practice: auto-signup without human intervention, no manual API key configuration, and built-in protections like prompt injection scoring on incoming emails.

How do I handle email arrival timeouts and flaky retries in Playwright tests?

Set explicit timeouts on your inbox polling, around 30 to 60 seconds for most transactional email. Avoid using Playwright's global retry setting to mask timing issues. If emails consistently arrive late, switch to webhook-based delivery or investigate your email sending pipeline's latency.

How do I run Playwright email tests in CI/CD without a live mail server?

You don't need a mail server. Use a virtual inbox API that provisions addresses on demand. Your CI job creates a fresh inbox at the start of each test run, uses it for signup or verification flows, and tears it down after. No SMTP server to maintain, no DNS records to configure.

Can a single AI agent trigger, intercept, and validate a full email verification flow?

Yes. An agent with access to both Playwright (for browser automation) and an email API (for inbox management) can handle the entire flow: fill out a signup form, wait for the verification email, extract the code, and complete the verification. LobsterMail's MCP integration makes this a single-tool-chain workflow.

How do I test HTML email rendering with Playwright?

Receive the email through your inbox API, then load its HTML content into a Playwright page using page.setContent(). From there you can assert on links, images, layout elements, and text content using standard Playwright selectors and expectations.