
rate limiting algorithms for email API agents: a practical comparison

Compare fixed window, sliding window, token bucket, and leaky bucket algorithms for AI email agents. Learn which fits your sending patterns.

9 min read
Ian Bussières, CTO & Co-founder

Your AI agent just hit a 429 status code. It retried immediately. Then again. Fifty concurrent instances did the same thing at the same moment, and now your API key is suspended for the next hour.

This is the rate limiting problem most guides skip past. Traditional documentation assumes a human developer is on the other end, someone who reads the error, waits, and adjusts. Agents don't do that. They retry at full concurrency unless something explicitly stops them.

If your agent sends email through an API, understanding how rate limiting algorithms work isn't optional. The algorithm your email provider uses determines whether your agent's traffic gets shaped gracefully or cut off entirely. And the algorithm you implement on your own side determines whether your sender reputation survives the first week.

Rate limiting algorithms at a glance

Rate limiting controls how many requests a client can make to an API within a given time period. For email APIs, exceeding your limits damages your sender reputation with ISPs like Gmail and Outlook, and recovering can take weeks.

| Algorithm | Core mechanism | Burst handling | Email API fit | Complexity |
| --- | --- | --- | --- | --- |
| Fixed window | Counts requests per fixed time interval | Poor (boundary spikes) | Low | Simple |
| Sliding window | Rolling count across a trailing interval | Good (even distribution) | High | Moderate |
| Token bucket | Tokens refill at a steady rate; each request costs one | Excellent (controlled bursts) | High | Moderate |
| Leaky bucket | Requests queue and drain at a constant rate | None (strict pacing) | Medium | Simple |

Each algorithm makes different tradeoffs between burst tolerance and implementation cost. For email agents operating at high concurrency, those tradeoffs have real consequences for deliverability.

Fixed window: simple but dangerous at the edges

Fixed window is the most common rate limiting algorithm in production APIs. It divides time into discrete intervals (say, one minute) and counts requests within each window. Hit the limit, and you're blocked until the next window starts.

The problem shows up at boundaries. An agent that sends 100 emails in the last two seconds of one window and 100 more in the first two seconds of the next has technically stayed under a 100-per-minute limit while blasting 200 emails in four seconds. ISPs don't care about your window boundaries. They see a sudden spike from your IP and throttle you.

For email APIs, this boundary spike problem creates exactly the kind of irregular sending pattern that triggers deliverability issues. Multiple agent instances behind one API key make the effect worse, because each instance independently races toward the same shared limit without coordinating timing.

The appeal of fixed window is that it's dead simple to implement: one counter, one reset timestamp. For non-email use cases with lower reputation stakes, that simplicity has real value. For email, the risk outweighs the convenience.
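That simplicity fits in a few lines. A minimal sketch in Python (the class and names are illustrative, not any provider's actual implementation) shows both the one-counter design and why the boundary spike is possible:

```python
import time

class FixedWindowLimiter:
    """Minimal fixed-window limiter: one counter, one reset timestamp."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # New window: the counter resets to zero. This reset is
            # exactly what permits a full-quota burst at the end of one
            # window followed by another at the start of the next.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Nothing here remembers when requests arrived within the window, which is why the limiter cannot distinguish evenly paced traffic from a boundary-straddling burst.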

Sliding window: smoothing the spikes

The sliding window algorithm fixes the boundary problem by computing request counts over a trailing time period instead of fixed intervals. Rather than resetting at the top of each minute, it looks at the last 60 seconds continuously.

This distributes traffic more evenly, which is what email sending needs. ISPs reward consistent, predictable volume from a given IP. A sliding window naturally prevents the burst-at-boundary pattern that fixed windows allow.

The tradeoff is implementation complexity. Tracking per-request timestamps (or using weighted approximations between adjacent windows) requires more memory than a simple counter reset. For a single agent, this barely matters. For a fleet of agents sharing distributed state across regions, the coordination cost is real.
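A single-instance version using the timestamp-log approach might look like this (a sketch, with illustrative names; distributed deployments would move this state into a shared store):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log: keep a timestamp per request and count
    only those inside the trailing interval."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # one entry per accepted request

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have slid out of the trailing window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The deque is the memory cost the text describes: O(limit) timestamps instead of a single counter.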

For distributed coordination, Redis is the usual answer. The CL.THROTTLE command (from the Redis Cell module) implements the generic cell rate algorithm (GCRA) as a single atomic command, making it a popular choice for multi-instance email agent deployments: one call returns both the current limit status and recommended retry timing, so there's no race condition between "check quota" and "decrement quota." Because the Redis server is the single source of time, clock skew between agent nodes doesn't matter.

Token bucket: built for bursty agents

The token bucket algorithm works on a simple model: a bucket fills with tokens at a fixed rate (say, 10 per second). Each API request consumes one token. If the bucket is empty, the request is rejected or queued. The bucket has a maximum capacity that sets the burst ceiling.

This is the best fit for most AI email agents.

Email agents don't send at constant rates. They send in bursts. An agent monitoring a support inbox might sit idle for hours, then fire off 30 replies in 90 seconds when a batch of tickets arrives. Token bucket accommodates those bursts (up to the bucket capacity) without rejecting requests, while still enforcing a sustainable average rate over time.

For email APIs, you tune two parameters. The fill rate (messages per second on average) controls your sustainable throughput. Set it by dividing your daily send quota by 86,400 seconds. The bucket size (maximum burst before throttling) determines how much burst your ISP reputation can absorb without triggering spam filters. A bucket size of 2-5x the fill rate handles typical agent burst patterns without ISP pushback.
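A minimal in-process sketch of those two parameters (illustrative names; real deployments would persist this state and handle concurrency):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at `fill_rate` per second up to
    `capacity`; each send costs one token."""

    def __init__(self, fill_rate: float, capacity: float):
        self.fill_rate = fill_rate   # sustainable messages per second
        self.capacity = capacity     # burst ceiling
        self.tokens = capacity       # start full: bursts allowed immediately
        self.last_refill = time.monotonic()

    def try_send(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, clamped to the burst ceiling.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.fill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example tuning: 10 messages/sec sustained, bursts of up to 30.
bucket = TokenBucket(fill_rate=10.0, capacity=30.0)
```

One caveat when applying the quota-division rule: at very low fill rates (a few thousand emails per day works out to well under one token per second), make sure the bucket capacity is still at least a few whole tokens, or the agent can never send at all.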

Most modern email APIs, including SendGrid, Postmark, and Resend, use some variant of token bucket internally for per-customer rate limiting.

Leaky bucket: predictable but rigid

The leaky bucket is the inverse of token bucket. Incoming requests fill a queue (the bucket), and the queue drains at a constant rate. If the bucket overflows, requests are dropped.

The result is perfectly steady output. No bursts. Every request is spaced evenly.
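The queue-and-drain mechanism can be sketched as follows (illustrative names; a production version would drain asynchronously rather than in a blocking loop):

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket: requests queue up and drain at a constant rate.
    Output is perfectly even; overflow is dropped."""

    def __init__(self, drain_rate: float, capacity: int):
        self.interval = 1.0 / drain_rate  # seconds between drained sends
        self.capacity = capacity
        self.queue = deque()

    def enqueue(self, message) -> bool:
        if len(self.queue) >= self.capacity:
            return False  # bucket overflow: request dropped
        self.queue.append(message)
        return True

    def drain(self, deliver):
        # One message per interval, regardless of how bursty arrivals were.
        while self.queue:
            deliver(self.queue.popleft())
            time.sleep(self.interval)
```

Note that `enqueue` never blocks the caller; the rigidity shows up as delivery latency in `drain`, which is exactly the delay that hurts time-sensitive messages.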

For general API traffic, this can be ideal. For email agents, it's too rigid. An agent that needs to send 20 verification emails in quick succession (because it just signed up for 20 services) will have most of those requests queued and delayed. Time-sensitive emails like verification codes become useless when they arrive 45 seconds late because a leaky bucket was smoothing the traffic.

Leaky bucket works well for background processes like newsletter delivery where timing flexibility exists. For transactional agent email, token bucket is almost always the better choice.

The layer above your API: ISP throttling

Most rate limiting guides stop at the API level. For email, that's only half the picture.

Every major inbox provider runs its own inbound throttling. Gmail limits concurrent SMTP connections per sending IP. Outlook imposes per-domain hourly caps. Yahoo uses reputation-weighted dynamic limits that shift based on your engagement metrics.

These ISP-level constraints exist independently of your email API's rate limiter. You could stay well under your API quota and still get denied by Gmail because you sent 500 emails to Gmail addresses from a cold IP in 10 minutes.

Agents struggle with this layer especially. An agent doesn't instinctively know that sending 200 emails to Gmail requires warming up over days. It sees an open API, a valid key, no errors. It sends at full speed. By the time 4xx SMTP responses start appearing, the reputation damage is done.

Proper email infrastructure needs to handle both layers: your own API rate limits and the receiving ISP's throttling. These are separate systems with separate state, and most self-built solutions only address the first.

Why agent-first infrastructure handles this differently

AI agents are, by nature, bad at implementing their own rate limiting.

A human developer hitting a 429 response will check the Retry-After header, wait, and try again. An agent with 50 concurrent threads hitting a 429 will spawn 50 retry attempts after the minimum delay, creating a thundering herd that immediately triggers another 429. Exponential backoff with jitter helps, but the agent (or its framework) has to implement that logic correctly. Most don't.
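If the agent (or its framework) does implement backoff itself, the essential ingredients are doubling, jitter, a cap, and deference to the server's `Retry-After` header. A sketch (the function name and defaults are illustrative):

```python
import random

def backoff_delay(attempt, base=1.0, cap=120.0, retry_after=None):
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Honors the server's Retry-After value when present; otherwise doubles
    the base delay per attempt with full jitter, so fifty concurrent
    instances don't all retry at the same instant.
    """
    if retry_after is not None:
        return retry_after
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The full jitter (a uniform draw over the whole interval, rather than a small random offset) is what breaks up the thundering herd: each instance lands somewhere different in the window.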

The better approach is pushing rate limiting into the infrastructure so the agent never encounters it. Instead of exposing raw send endpoints and hoping the agent respects limits, the infrastructure absorbs bursts, manages queues, coordinates across instances, and handles ISP-level pacing transparently.

This is what agent-first email infrastructure is designed around. At LobsterMail, rate limiting and deliverability management happen below the API surface. Your agent calls send() and the infrastructure handles queuing, burst absorption, per-ISP pacing, and retry management. The agent doesn't see a 429 because it doesn't need to, and you never build custom throttle logic.

Picking the right algorithm

If you're building your own email sending pipeline (and some of you will, which is fine), token bucket is the right starting point for transactional agent email. It handles bursts while enforcing sustainable averages. Sliding window makes more sense when you need distributed coordination across multiple agent instances sharing quotas through Redis or a similar store. Avoid fixed window for anything reputation-sensitive; the boundary spike problem is real and ISPs will penalize you for it. Reserve leaky bucket for non-time-sensitive bulk sends like digests or weekly roundups.

Whatever algorithm you choose, remember that your API rate limiter is only the first layer. ISP throttling, IP warming, per-recipient limits, and domain reputation all sit above it. The best rate limiting implementation in the world won't save your deliverability if you're blasting a cold IP at Gmail.

Build for both layers, or use infrastructure that already does.

Frequently asked questions

What is the difference between token bucket and leaky bucket when sending email through an API?

Token bucket allows controlled bursts up to a maximum capacity, then enforces a steady refill rate. Leaky bucket enforces a constant output rate with no bursts at all. For transactional email agents that send in unpredictable spikes, token bucket is the better fit because it accommodates bursts without rejecting time-sensitive messages.

Which rate limiting algorithm works best for AI agents that send email in unpredictable bursts?

Token bucket. It lets agents burst up to the bucket capacity (handling sudden spikes like batch replies or verification emails) while still enforcing a sustainable average rate over time.

How does the sliding window algorithm prevent quota exhaustion better than fixed window for email sending?

Fixed window resets its counter at interval boundaries, letting an agent exhaust its full quota at the end of one window and again at the start of the next. Sliding window counts requests over a continuously rolling period, preventing this boundary exploit and distributing traffic evenly.

What HTTP status code indicates rate limiting, and what headers carry retry timing?

A 429 Too Many Requests status code signals rate limiting. Most APIs include a Retry-After header (in seconds or as an HTTP date) telling the client when to try again. Some also return X-RateLimit-Remaining and X-RateLimit-Reset headers with quota details.

How do you tune token bucket parameters for a transactional email agent?

Set the fill rate by dividing your daily send quota by 86,400 seconds to get a sustainable per-second average. Set the bucket size to 2-5x that rate to allow reasonable burst capacity without triggering ISP throttling on your sending IP.

What is the difference between rate limiting and throttling in email APIs?

Rate limiting rejects or queues requests that exceed a defined quota, returning a 429 error. Throttling slows down request processing without rejecting it, often by introducing artificial delays. Many email APIs use both: rate limiting at the API gateway and throttling at the SMTP delivery layer.

How should an email agent implement exponential backoff after a 429 response?

Wait for a base interval (e.g., 1 second), then double it on each consecutive 429, adding random jitter to prevent multiple agent instances from retrying at the same moment. Cap the maximum wait at 60-120 seconds. If the response includes a Retry-After header, use that value instead.

Why does fixed window rate limiting create boundary spikes that hurt email deliverability?

An agent can send its full quota in the last seconds of one window and the full quota again in the first seconds of the next, producing a burst twice the intended rate in a short span. ISPs like Gmail interpret sudden volume spikes as spam behavior and may throttle or block your sending IP.

How does Redis Cell (CL.THROTTLE) help with distributed rate limiting for email agents?

CL.THROTTLE implements a generic cell rate algorithm as a single atomic Redis command. It handles clock skew between distributed nodes and returns both the current limit status and recommended retry timing, making it well-suited for multi-instance agent fleets sharing a single send quota.

What ISP-level throttling exists above API rate limits when sending email?

Gmail, Outlook, and Yahoo all impose their own inbound connection and volume limits per sending IP and domain. These operate independently of your API quotas. You can stay under your API rate limit and still get throttled by Gmail for sending too much too fast from a cold or low-reputation IP.

When should an email agent use a queue instead of inline rate limiting?

Queue-based sending works better when your agent generates email faster than it can deliver, when you need to retry failed sends later, or when multiple agent instances share a common send quota. A queue decouples composing an email from delivering it, giving the delivery layer full control over pacing.

How does agent-first email infrastructure handle rate limiting transparently?

Agent-first infrastructure like LobsterMail absorbs bursts, manages delivery queues, coordinates across agent instances, and paces sends per ISP reputation, all below the API surface. The agent calls send() and never encounters a 429 because the infrastructure handles throttling internally.

What is rate limiting in email marketing?

Rate limiting in email marketing controls how many emails a platform or sender can deliver within a given time period. It prevents overwhelming recipient servers, protects sender reputation with ISPs, and helps maintain consistent deliverability across campaigns.
