
queue-based email architecture for AI agents: Redis, SQS, and when to skip both
Redis and SQS are the go-to options for queue-based agent email. Here's how they compare, what you'll build around them, and when to skip both.
Your agent needs to send 50 emails in the next ten minutes. It fires off API calls in a tight loop, and the first 30 go through. Then the provider starts returning 429s. The agent retries immediately, gets throttled harder, and by the time things settle, 14 messages are gone with no record of what failed.
This is the default failure mode when agents handle email without a queue. It's why most production agent systems end up building queue-based email architecture, whether that was part of the original plan or not.
Why agents need queues for email
Humans don't need message queues because humans are slow. One email at a time, naturally under rate limits. Agents work differently. They batch and parallelize without pausing to read the server response before firing the next request.
Email providers enforce rate limits at multiple levels: per-second send rates, hourly quotas, daily caps, per-recipient throttling. An agent that ignores these constraints gets deferred with transient 4xx errors or rejected outright with permanent 5xx errors. A message queue sits between the agent and the email provider, controlling the flow rate and handling failures instead of dropping messages silently.
Rate limiting is just the starting point, though. Queues also give you retry logic (failed sends go back into the queue with exponential backoff instead of vanishing) and priority control (verification emails jump ahead of batch reports). The question isn't whether your agent needs a queue. It's which queue, and how much surrounding infrastructure you're willing to maintain.
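The backoff math behind that retry logic is worth seeing concretely. A minimal sketch (the function name and delay values are illustrative, not from any particular library):

```javascript
// Exponential backoff with jitter: the delay doubles per attempt, is capped,
// and gets a random jitter added so retries don't arrive in synchronized waves.
function retryDelay(attempt, baseDelayMs = 5000, maxDelayMs = 300000) {
  const exponential = baseDelayMs * 2 ** attempt; // 5s, 10s, 20s, ...
  const capped = Math.min(exponential, maxDelayMs);
  const jitter = Math.random() * 0.2 * capped;    // up to +20%
  return Math.round(capped + jitter);
}
```

Queue libraries implement some variant of this for you; the point is that a failed send waits progressively longer instead of hammering a throttled server.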
Redis as your queue backend
Redis is the most popular backing store for agent email queues. It's fast, familiar, and libraries like BullMQ handle most of the plumbing for you.
A basic BullMQ setup for email sending looks like this:
import { Queue, Worker } from 'bullmq';

const emailQueue = new Queue('agent-email', {
  connection: { host: 'localhost', port: 6379 }
});

// Producer: agent adds emails to the queue
await emailQueue.add('send', {
  to: 'recipient@example.com',
  subject: 'Verification code',
  body: 'Your code is 847291'
}, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 5000 }
});

// Consumer: worker sends at a controlled rate
const worker = new Worker('agent-email', async (job) => {
  await sendEmail(job.data);
}, {
  connection: { host: 'localhost', port: 6379 },
  limiter: { max: 10, duration: 1000 }
});
About 20 lines gets you retry logic, rate limiting, priority scheduling, and job persistence. BullMQ also supports delayed jobs and dead letter queues (DLQs) for messages that exhaust all retries.
The speed advantage is real. Redis queues process jobs in low single-digit milliseconds, which matters when your agent needs confirmation that an email was queued before it moves to the next step in a workflow.
The persistence story is less reassuring. Redis stores data in memory by default. If your instance crashes without RDB snapshots or AOF persistence enabled, every queued email disappears. You configure persistence explicitly, and even with it, you might lose the last few seconds of writes during an unclean shutdown.
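If you go this route, enabling append-only persistence is the usual mitigation. A minimal redis.conf fragment (the everysec policy is the common middle ground, not the only option):

```
# Enable append-only-file persistence so queued jobs survive a restart.
appendonly yes
appendfsync everysec   # fsync once per second; may lose ~1s of writes on a crash
```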
Then there's operational overhead. You're running a Redis server (or cluster). That means monitoring memory usage, handling failover, managing connection pools, and keeping versions updated. For one agent on a side project, this is manageable. For ten agents across staging and production, it starts to feel like running a second product.
If BullMQ feels heavy, rsmq offers a lighter alternative. It implements a basic message queue on top of Redis with visibility timeouts and delayed messages, but without built-in rate limiting, scheduling, or dead letter queues. You'll write more application code to fill those gaps.
SQS as your queue backend
Amazon SQS takes the opposite approach. No server to run, no memory to watch, no failover to think about. It's a fully managed service that scales with your workload.
import { SQSClient, SendMessageCommand, ReceiveMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({ region: 'us-east-1' });
const queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456/agent-email';

// Producer: agent adds email to SQS
await sqs.send(new SendMessageCommand({
  QueueUrl: queueUrl,
  MessageBody: JSON.stringify({
    to: 'recipient@example.com',
    subject: 'Verification code',
    body: 'Your code is 847291'
  })
}));

// Consumer: poll and process
const response = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: queueUrl,
  MaxNumberOfMessages: 10,
  WaitTimeSeconds: 20
}));
Messages survive infrastructure failures because they're replicated across multiple availability zones. You don't lose emails when a server reboots. SQS also handles dead letter queues natively through redrive policies, so you don't have to build that yourself.
The latency is higher, typically 10-50ms per operation. But your SMTP server is almost certainly slower than that, so this rarely matters for email workloads in practice.
Where SQS gets complicated is rate limiting. It delivers messages as fast as your consumer polls them. You need to implement throttling in your worker code, or use a Lambda concurrency limit as a blunt instrument. This is where most SQS-based email architectures start accumulating custom code.
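The usual shape of that throttling code is a token bucket: the worker takes one token per send and tokens refill at the provider's allowed rate. A self-contained sketch (class and parameter names are illustrative; the queue polling and send call are assumed to live elsewhere):

```javascript
// Minimal token bucket: refills `ratePerSec` tokens each second up to `burst`.
// The consumer calls tryTake() before each send and waits when it returns false.
class TokenBucket {
  constructor(ratePerSec, burst = ratePerSec) {
    this.capacity = burst;
    this.tokens = burst;
    this.ratePerSec = ratePerSec;
    this.lastRefill = Date.now();
  }
  tryTake() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With SQS this sits between ReceiveMessage and the actual send: messages you can't send yet simply stay invisible until their visibility timeout expires and they're redelivered.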
Ordering is another wrinkle. Standard SQS queues don't guarantee message order. If your agent sends a welcome email followed by a verification code, they might arrive in reverse. FIFO queues fix this but cap throughput at 300 messages per second (3,000 with batching), and they cost slightly more.
On pricing, SQS charges $0.40 per million requests after the free tier. For most agent workloads, this rounds to effectively zero. But aggressive polling with short wait times racks up empty-receive charges even when the queue has nothing in it. Long polling (the WaitTimeSeconds parameter) helps.
Everything you build around the queue
Choosing between Redis and SQS is the easy decision. The hard part is everything else.
When an email fails all retry attempts, it needs a destination. A dead letter queue catches these failures, but you also need tooling to inspect what went wrong and a process for deciding whether to retry manually or discard. Ignore this, and phantom failures accumulate for weeks before anyone notices a pattern.
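The capture side of a DLQ is simple enough to sketch. Here an in-memory array stands in for the separate queue or table you'd use in production, and the error metadata shown is the minimum you'd want for later inspection:

```javascript
// Route a job that has exhausted its retries into a dead letter store,
// keeping enough context (error, timestamp, attempt count) to diagnose it later.
const dlq = [];

async function processWithDlq(job, handler, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await handler(job);
    } catch (err) {
      if (attempt === maxAttempts) {
        dlq.push({
          job,
          error: String(err),
          attempts: attempt,
          failedAt: new Date().toISOString()
        });
      }
    }
  }
}
```

The hard part isn't this loop; it's the tooling and process around the `dlq` store that the paragraph above describes.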
Idempotency is another trap that's easy to overlook. If your worker crashes after sending an email but before acknowledging the queue message, the queue redelivers it. Without a deduplication check (typically a Redis key or database row keyed on message ID), your recipient gets the same email twice. Or three times. This is especially painful for transactional emails where duplicate verification codes confuse users and trigger fraud alerts.
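The deduplication check itself is a few lines. A sketch, with an in-memory Set standing in for the Redis key or database row:

```javascript
// Idempotency guard keyed on message ID: skip any message we've already sent.
const processed = new Set();

async function sendOnce(message, send) {
  if (processed.has(message.id)) return false; // duplicate delivery, skip
  await send(message);
  processed.add(message.id);                   // mark only after a successful send
  return true;
}
```

Note the ordering tradeoff: marking after the send (as here) can still duplicate if the process crashes between the two lines, while marking before the send risks dropping a message that never went out. Most systems accept the first risk, since at-least-once is safer for email than at-most-once.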
Backpressure matters when your email provider goes down for an hour. Messages pile up in the queue. When the provider recovers, your worker tries to drain the backlog at full speed and immediately gets rate-limited again. You need adaptive throttling that responds to provider health signals, not just a fixed rate ceiling. This is one of those problems that never appears in local testing but shows up reliably in production.
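One common shape for that adaptive throttling is additive-increase/multiplicative-decrease, the same idea TCP uses for congestion control. A sketch (class name and step sizes are illustrative):

```javascript
// AIMD throttle: halve the send rate when the provider pushes back,
// creep back toward the ceiling one unit at a time on success.
class AdaptiveRate {
  constructor(maxPerSec) {
    this.max = maxPerSec;
    this.current = maxPerSec;
  }
  onSuccess() {
    this.current = Math.min(this.max, this.current + 1);        // additive increase
  }
  onRateLimited() {
    this.current = Math.max(1, Math.floor(this.current / 2));   // multiplicative decrease
  }
}
```

Feeding `current` into your rate limiter means a recovering provider gets probed gently instead of being hit with the full backlog at once.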
Monitoring takes longer to set up than the queue itself. Queue depth, processing latency, error rates, DLQ size, consumer lag. Without dashboards and alerts on these metrics, you're guessing at system health. Most developers skip this step and regret it the first time a queue backs up silently on a Friday evening.
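Even the alerting logic on those metrics is worth writing down explicitly. A sketch with illustrative thresholds (every number here is an assumption you'd tune for your own workload):

```javascript
// Evaluate queue health metrics against alert thresholds.
// Thresholds are placeholders, not recommendations.
function checkQueueHealth(metrics) {
  const alerts = [];
  if (metrics.depth > 1000) alerts.push('queue depth high');
  if (metrics.dlqSize > 0) alerts.push('messages in DLQ');
  if (metrics.oldestJobAgeSec > 300) alerts.push('consumer lagging');
  return alerts;
}
```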
And if your agent manages multiple email addresses, each inbox probably has its own rate limits and sender reputation score. A single queue won't cut it. You need per-inbox queues, or at least per-inbox rate limiting within a shared queue, which adds routing logic and configuration for every address your agent controls.
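The per-inbox variant usually means keeping one rate state per address and looking it up at send time. A sketch (the addresses and per-inbox limits are invented for illustration):

```javascript
// Lazily create one rate state per inbox so a hot inbox can't burn
// through another inbox's quota. Limits are illustrative.
const inboxLimits = { 'support@example.com': 5, default: 2 }; // sends/sec
const buckets = new Map();

function limiterFor(inbox) {
  if (!buckets.has(inbox)) {
    const rate = inboxLimits[inbox] ?? inboxLimits.default;
    buckets.set(inbox, { rate, sentThisSecond: 0 });
  }
  return buckets.get(inbox);
}
```

With per-inbox queues instead, the same lookup becomes a routing function that maps an address to its queue name; either way, it's configuration you maintain for every address the agent controls.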
A rough estimate: the queue itself takes a day to wire up. The surrounding infrastructure takes a week or more, and that's before the edge cases that only appear in production.
When to skip the queue entirely
Not every agent needs a custom queue architecture. The question worth asking: what is your agent actually doing with email?
If it sends transactional messages at moderate volume (verification codes, notifications, periodic reports), the queue exists to handle retries and rate limiting. Those are real problems, but they're solved problems. You're rebuilding plumbing that managed email APIs already handle internally.
LobsterMail takes this approach. Your agent provisions its own inbox with a single SDK call, and the send/receive pipeline handles queuing, rate limiting, and retries behind the API.
import { LobsterMail } from '@lobsterkit/lobstermail';

const lm = await LobsterMail.create();
const inbox = await lm.createSmartInbox({ name: 'My Agent' });

await inbox.send({
  to: 'recipient@example.com',
  subject: 'Verification code',
  body: 'Your code is 847291'
});
No Redis to manage. No SQS to configure. No dead letter queue to monitor. The sending docs cover rate limits and formatting details if you want to see how it works under the hood.
The tradeoff is control. If you need custom priority queues, complex multi-step routing, or tight integration with your own event pipeline, building on Redis or SQS gives you that flexibility. If you need an agent that can email and you'd rather spend engineering time on the agent itself, managed infrastructure makes more sense.
For agents handling high-volume workflows with specific ordering guarantees and complex retry policies, build the queue. For the other 90% of agent email use cases, it's over-engineering a problem that's already been solved.
If your agent just needs an inbox and the ability to send and receive, managed infrastructure lets you skip the queue architecture entirely.
Frequently asked questions
What is queue-based email architecture?
It's a pattern where outbound emails are placed into a message queue (like Redis or SQS) before being sent, rather than calling the SMTP server directly. The queue controls send rate, handles retries on failure, and prevents message loss during provider outages.
Should I use Redis or SQS for my agent's email queue?
Redis (via BullMQ) is better when you need low-millisecond job latency and built-in rate limiting. SQS is better when you want zero infrastructure management and guaranteed durability. For most agent email workloads, either works. The real cost is the code you build around the queue, not the queue itself.
Can I use RabbitMQ instead of Redis or SQS for agent email?
Yes. RabbitMQ supports priority queues, dead letter exchanges, and fine-grained routing out of the box. It's heavier to operate than Redis and less managed than SQS, so it tends to work best in teams that already run RabbitMQ for other services.
How do I prevent duplicate emails when using a message queue?
Use idempotency keys. Before sending, check whether a message with that ID has already been processed (typically via a Redis key or database row). This prevents duplicates when the queue redelivers a message after a worker crash.
What happens to queued emails when my SMTP provider goes down?
Messages accumulate in the queue until the provider recovers. The risk is that your worker tries to drain the backlog too fast and gets rate-limited again. Implement adaptive backoff that checks provider health before increasing throughput.
What is a dead letter queue and do I need one?
A dead letter queue (DLQ) captures messages that fail after all retry attempts. Without one, permanently failed emails vanish silently. You need a DLQ for any production email system, plus a way to inspect and act on the messages that land there.
How fast can BullMQ process email jobs?
BullMQ on Redis processes jobs in low single-digit milliseconds. The bottleneck is almost always your SMTP provider's accept rate, not the queue itself. A limiter config in BullMQ lets you match your provider's rate limits exactly.
Does Amazon SQS guarantee message ordering for emails?
Standard SQS queues do not guarantee order. FIFO queues do, but they cap throughput at 300 messages per second (3,000 with batching). For most agent email workloads, standard queues with application-level sequencing are sufficient.
Do I need a queue if my agent sends fewer than 100 emails per day?
Probably not. At that volume, direct API calls with basic retry logic (a simple try/catch with a delay) are enough. Queues become valuable when you're sending hundreds or thousands of emails per day, or when message loss is unacceptable.
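That "basic retry logic" can be as small as this sketch, where `sendEmail` stands in for your provider's API call:

```javascript
// Try the send a few times with a fixed delay between attempts,
// then surface the last error if every attempt failed.
async function sendWithRetry(sendEmail, message, attempts = 3, delayMs = 2000) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await sendEmail(message);
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw lastErr;
}
```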
Can LobsterMail replace my entire email queue setup?
For transactional and moderate-volume agent email, yes. LobsterMail handles queuing, rate limiting, retries, and delivery internally. You call inbox.send() and the infrastructure handles the rest. If you need custom routing logic or deep event pipeline integration, you'll still want your own queue.
Is BullMQ better than rsmq for agent email queues?
BullMQ is more feature-complete: it includes rate limiting, job scheduling, priorities, and dead letter queues. rsmq is lighter but requires you to build those features yourself. For email specifically, BullMQ's built-in rate limiter makes it the better choice unless you need minimal dependencies.
How much does it cost to run an email queue on SQS?
SQS charges $0.40 per million requests after the free tier (1 million requests/month). For an agent sending 1,000 emails per day with long polling, you'll use roughly 30,000-60,000 requests per month, which falls well within the free tier.


