Structured Data Extraction
Use AI to extract contacts, dates, amounts, scheduling, and actions from emails.
Last updated 2026-03-29
LobsterMail uses AI to extract structured data from email content and PDF attachments. This turns unstructured email text into machine-readable JSON with contacts, dates, monetary amounts, scheduling information, and action items.
Extraction Types
| Category | What's extracted | Example |
|---|---|---|
| Contacts | Name, email, phone, role, organization | { name: "Jane Doe", email: "jane@acme.com", role: "Account Manager" } |
| Dates | ISO 8601 dates with labels | { value: "2025-03-15", label: "Invoice due date", isEstimate: false } |
| Amounts | Monetary values with currency | { value: 149.99, currency: "USD", label: "Monthly subscription" } |
| Scheduling | Events, meetings, appointments | { eventType: "meeting", startTime: "...", location: "Zoom", attendees: [...] } |
| Actions | Tasks, links, deadlines | { type: "verify", description: "Confirm your email", url: "https://..." } |
On-Demand Extraction
Trigger extraction for a specific email (Tier 1+):
```shell
# Trigger extraction
curl -X POST https://api.lobstermail.ai/v1/inboxes/{inboxId}/emails/{emailId}/extract \
  -H "Authorization: Bearer $TOKEN"

# Check result (may be pending/processing initially)
curl https://api.lobstermail.ai/v1/inboxes/{inboxId}/emails/{emailId}/extraction \
  -H "Authorization: Bearer $TOKEN"
```
The extraction runs asynchronously, so the first GET may still report a status of `pending` or `processing`. Poll the GET endpoint until `status` is `completed` or `failed`.
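The polling step could be scripted roughly as follows. This is a minimal sketch, not an official client: it assumes the response is JSON with a top-level `status` string, that `TOKEN` is set in the environment, and that 30 attempts at 2-second intervals is an acceptable default. The helper names `extract_status` and `poll_extraction` and the `API_BASE` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: poll the extraction endpoint until it reports a terminal status.
# Assumptions (not confirmed by the docs above): the response is JSON with a
# top-level "status" string, and TOKEN is set in the environment.

API_BASE="${API_BASE:-https://api.lobstermail.ai/v1}"

# Print the first "status" value found in the JSON on stdin.
# A real script would more likely use a JSON parser such as jq.
extract_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage: poll_extraction INBOX_ID EMAIL_ID [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 on completed, 1 on failed, 2 on a curl error, 3 on timeout.
poll_extraction() {
  local inbox_id="$1" email_id="$2"
  local max_attempts="${3:-30}" delay="${4:-2}"
  local attempt=1 body status

  while [ "$attempt" -le "$max_attempts" ]; do
    body="$(curl -sS "$API_BASE/inboxes/$inbox_id/emails/$email_id/extraction" \
      -H "Authorization: Bearer $TOKEN")" || return 2
    status="$(printf '%s' "$body" | extract_status)"

    case "$status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed)    printf '%s\n' "$body" >&2; return 1 ;;
    esac

    # Any other status (for example pending or processing): wait and retry.
    attempt=$((attempt + 1))
    sleep "$delay"
  done

  echo "extraction not finished after $max_attempts attempts" >&2
  return 3
}
```

A call such as `poll_extraction "$INBOX_ID" "$EMAIL_ID"` would print the final JSON once the extraction completes. The text-based status match is a shortcut: if a nested object in the payload also has a `status` key that appears first, a real JSON parser is the safer choice.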
Auto-Extraction
Enable automatic extraction on every inbound email for an inbox (Tier 2+: Builder, Pro, or Scale):
```shell
curl -X PATCH https://api.lobstermail.ai/v1/inboxes/{inboxId} \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"autoExtract": true}'
```
When enabled, extraction is triggered automatically, alongside the security scan, for every inbound email. If the account falls below Tier 2, auto-extraction stops silently.
Attachment Support
PDF attachments are automatically parsed and included in the extraction context. Text is extracted from PDFs and fed to the AI model alongside the email body.
Limitations:
- Only PDF attachments are supported in V1 (images and other formats are ignored)
- Encrypted or image-only PDFs cannot be parsed
- Total content is truncated to 10,000 characters for the AI prompt
Tier Requirements
| Feature | Minimum Tier |
|---|---|
| On-demand extraction | Tier 1 (Free Verified) |
| Auto-extraction | Tier 2 (Builder) |
Idempotency
Calling the extract endpoint multiple times for the same email returns the existing extraction record. Each email can have at most one extraction.
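One practical consequence is that the extract call should be safe to retry after a timeout or dropped connection. The sketch below illustrates that idea under stated assumptions: `TOKEN` is set in the environment, the retry count and delay are arbitrary, and the `trigger_extraction` helper and `API_BASE` variable are hypothetical names, not part of the LobsterMail API.

```shell
#!/usr/bin/env bash
# Sketch only: retry the extract call a few times on failure.
# This leans on the idempotency described above: a repeated POST returns the
# existing extraction record rather than creating a second one.
# Assumption: TOKEN is set in the environment; retry counts are arbitrary.

API_BASE="${API_BASE:-https://api.lobstermail.ai/v1}"

# Usage: trigger_extraction INBOX_ID EMAIL_ID [MAX_TRIES] [DELAY_SECONDS]
# Prints the response body and returns 0 on success, 1 if every try failed.
trigger_extraction() {
  local inbox_id="$1" email_id="$2"
  local max_tries="${3:-3}" delay="${4:-1}"
  local try=1

  while [ "$try" -le "$max_tries" ]; do
    # --fail makes curl exit non-zero on HTTP error responses (4xx/5xx).
    if curl -sS --fail -X POST \
      "$API_BASE/inboxes/$inbox_id/emails/$email_id/extract" \
      -H "Authorization: Bearer $TOKEN"; then
      return 0
    fi
    # Pause before the next try, but not after the last one.
    [ "$try" -lt "$max_tries" ] && sleep "$delay"
    try=$((try + 1))
  done

  return 1
}
```

Retrying blindly is only reasonable because of the idempotency guarantee. Note that this sketch also retries on client errors such as an insufficient tier, where a retry will not help; a more careful version would inspect the HTTP status code first.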