
Email threading explained: how In-Reply-To and References headers keep conversations together

How Message-ID, In-Reply-To, and References headers create email threads. A practical guide for developers and agents building reply chains programmatically.

February 5, 2026 · 6 min read

Samuel Chenard, Co-founder

Every email you've ever received carries invisible metadata that determines whether it shows up as a new conversation or gets tucked into an existing thread. Three headers do most of the work: Message-ID, In-Reply-To, and References. Get them right, and your replies land exactly where they belong. Get them wrong (or skip them entirely), and your carefully crafted response floats into someone's inbox as a disconnected orphan message.

This matters more than ever if you're building agents that send email programmatically. An agent replying to a customer, coordinating with another agent, or confirming a booking needs its replies threaded correctly. Otherwise, the human on the other end sees a disjointed mess of one-off emails instead of a coherent conversation.

Let's break down how email threading actually works at the header level, how different email clients interpret these headers, and how to set them correctly when sending replies from code or an agent.

The three headers that make threading work

Every email message should carry a unique identifier. The sending client or server normally assigns it when the message is created. That's the Message-ID header. It looks something like this:

Message-ID: <abc123@mail.example.com>
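Your mail library may not assign Message-IDs for you. If not, the format is easy to produce: a unique left-hand part, an @, and a domain you control, all wrapped in angle brackets. Here's a minimal sketch. `makeMessageId` is a hypothetical helper, and it uses a timestamp plus `Math.random()` for brevity. Treat it as an illustration, and prefer a cryptographically strong random source in production.

```typescript
// Hypothetical helper: builds a Message-ID of the form <unique-part@domain>.
// Uniqueness comes from a base-36 millisecond timestamp plus 16 random base-36 chars.
// Sketch only: production code should use crypto-strength randomness.
function makeMessageId(domain: string): string {
  const timestamp = Date.now().toString(36);
  let random = '';
  while (random.length < 16) {
    // Math.random().toString(36) looks like "0.k3x9..."; drop the leading "0."
    random += Math.random().toString(36).slice(2);
  }
  return `<${timestamp}.${random.slice(0, 16)}@${domain}>`;
}

console.log(makeMessageId('mail.example.com'));
```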

When someone replies to that email, their mail client does two things automatically. First, it sets the In-Reply-To header to the Message-ID of the email being replied to. Second, it populates the References header with the chain of Message-ID values from the entire conversation history.

Here's what a reply's headers look like in a three-message thread:

Message-ID: <msg003@mail.example.com>
In-Reply-To: <msg002@mail.example.com>
References: <msg001@mail.example.com> <msg002@mail.example.com>

That's the entire mechanism. Message-ID identifies each email. In-Reply-To points to the direct parent. References carries the full ancestry of the thread.
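References is a whitespace-separated list of bracketed IDs, and real headers are often folded across several lines. That makes it worth parsing these values defensively rather than splitting on single spaces. A minimal sketch, assuming you already have the raw header value as a string (`parseMsgIds` is a hypothetical name):

```typescript
// Extract every <...> Message-ID from a raw header value.
// Tolerates folded headers (CRLF followed by whitespace) because it
// ignores everything outside the angle brackets.
function parseMsgIds(headerValue: string | undefined): string[] {
  if (!headerValue) return [];
  return headerValue.match(/<[^<>\s]+>/g) ?? [];
}

const refs = parseMsgIds('<msg001@mail.example.com>\r\n <msg002@mail.example.com>');
console.log(refs.length); // 2
```

The same helper works on Message-ID and In-Reply-To, since both use the same angle-bracket syntax.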

In-Reply-To vs. References: key differences

Both headers reference Message-ID values, but they serve different purposes. Here's a side-by-side comparison:

	In-Reply-To	References
Purpose	Points to the direct parent message	Carries the full thread ancestry
Value format	Usually a single Message-ID	Space-separated list of Message-IDs
Number of IDs	Usually one (RFC 5322 allows several when replying to multiple parents)	One to many (often trimmed to about 10 by convention)
Required for threading	Not strictly, but strongly recommended	Not strictly, but strongly recommended
How clients use it	Quick parent lookup	Reconstructing full conversation tree
Fallback behavior	Clients may fall back to subject-line matching	Some clients ignore threading entirely without it

In practice, you want both. Some email clients prioritize In-Reply-To for simple parent-child linking. Others rely on References to build the complete tree structure, especially in long conversations with branches.
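To make the difference concrete, here's one way a receiving system might pick a message's parent from the two headers. This is a simplified, hypothetical `resolveParentId`, not any particular client's logic. It prefers In-Reply-To when that header holds exactly one ID. Otherwise it falls back to the last entry of References. A References-first client would simply check them in the opposite order.

```typescript
// Hypothetical helper: decide which Message-ID is a message's direct parent.
// Prefers a single-ID In-Reply-To; otherwise uses the last References entry.
function resolveParentId(inReplyTo?: string, references?: string): string | undefined {
  const idPattern = /<[^<>\s]+>/g;

  const replyIds = inReplyTo?.match(idPattern) ?? [];
  if (replyIds.length === 1) return replyIds[0];

  // Ambiguous or missing In-Reply-To: the nearest ancestor is last in References.
  const refIds = references?.match(idPattern) ?? [];
  return refIds.length > 0 ? refIds[refIds.length - 1] : undefined;
}

console.log(resolveParentId(undefined, '<msg001@mail.example.com> <msg002@mail.example.com>'));
// <msg002@mail.example.com>
```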

How email clients handle threading differently

Here's where it gets interesting. The RFC standard (RFC 5322) defines In-Reply-To and References, but email clients interpret them with their own quirks.

Gmail uses a hybrid approach. It threads messages that share the same References chain, but it also threads messages with matching subject lines sent between the same participants. This means Gmail can sometimes group unrelated emails into the same thread if they happen to share a subject line like "Re: Invoice" between the same two addresses. It's a known frustration, and it's why Gmail's threading feels "too aggressive" to some people.
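Subject-based grouping generally starts by normalizing the subject, so that "Invoice" and "Re: Invoice" compare equal. Gmail doesn't publish its exact rules, so the following is a generic, hypothetical `normalizeSubject`, not Gmail's implementation. It also shows why two unrelated "Re: Invoice" threads can collapse into the same key:

```typescript
// Hypothetical normalizer: strips leading reply/forward prefixes (Re:, Fw:, Fwd:,
// repeated or with counters like "Re[2]:"), collapses whitespace, and lowercases.
// Real clients vary; some also strip localized prefixes such as "AW:" or "SV:".
function normalizeSubject(subject: string): string {
  let s = subject.trim();
  const prefix = /^(re|fwd?)(\[\d+\])?\s*:\s*/i;
  while (prefix.test(s)) {
    s = s.replace(prefix, '');
  }
  return s.replace(/\s+/g, ' ').toLowerCase();
}

// Two unrelated conversations produce the same matching key:
console.log(normalizeSubject('Re: Invoice') === normalizeSubject('RE: RE: invoice')); // true
```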

Outlook relies heavily on the Thread-Index header (a Microsoft-specific extension). Outlook generates a Thread-Index value for the first message in a conversation and appends to it with each reply. If Thread-Index is missing, Outlook falls back to In-Reply-To and References. If those are missing too, it uses subject-line matching as a last resort.
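Thread-Index itself is a base64-encoded binary value. Microsoft's published conversation-index format defines it as a 22-byte header block followed by one 5-byte child block per reply. That means the decoded length alone tells you how deep a message sits in the conversation. The sketch below computes that depth from the base64 length without decoding the timestamp or GUID. `threadIndexDepth` is a hypothetical helper, and it assumes a well-formed value.

```typescript
// Hypothetical helper: reply depth from an Outlook Thread-Index value.
// Assumes the conversation-index layout: a 22-byte header block plus one
// 5-byte child block per reply. Returns null if the length doesn't fit.
function threadIndexDepth(threadIndex: string): number | null {
  const b64 = threadIndex.replace(/\s+/g, '');
  if (b64.length === 0 || b64.length % 4 !== 0) return null;

  // Every 4 base64 chars encode 3 bytes; '=' padding marks missing bytes.
  const padding = b64.endsWith('==') ? 2 : b64.endsWith('=') ? 1 : 0;
  const byteLength = (b64.length / 4) * 3 - padding;

  const childBytes = byteLength - 22;
  if (childBytes < 0 || childBytes % 5 !== 0) return null;
  return childBytes / 5; // 0 = first message, 1 = first reply, ...
}

console.log(threadIndexDepth('A'.repeat(36))); // 1 (a placeholder 27-byte value)
```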

Apple Mail sticks closer to the RFC spec. It uses In-Reply-To and References for threading and generally won't group messages by subject alone. This makes it more predictable but less forgiving if your headers are wrong or missing.

Thunderbird implements the JWZ threading algorithm (named after Jamie Zawinski, who wrote the original Netscape Mail threading code in the 1990s). It builds a tree from References headers first, then uses In-Reply-To to fill gaps. Subject-line matching is a fallback, but it's applied conservatively.
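The heart of JWZ-style threading fits in a few dozen lines. The sketch below is deliberately simplified:

- It links ancestors using References, falling back to In-Reply-To.
- It creates empty placeholder containers for ancestors it never received.
- It lets a message's own headers decide its direct parent.

It skips the full algorithm's subject grouping and pruning passes. The `Msg`, `Container`, and `buildThreads` names are hypothetical.

```typescript
// Hypothetical message shape; IDs are already parsed into "<...>" strings.
interface Msg {
  messageId: string;
  inReplyTo?: string;
  references?: string[];
}

interface Container {
  id: string;
  message?: Msg; // stays undefined for ancestors we never received
  parent?: Container;
  children: Container[];
}

// Simplified JWZ-style threading: References first, In-Reply-To as fallback.
// Omits the full algorithm's subject grouping and empty-container pruning.
function buildThreads(messages: Msg[]): Container[] {
  const table = new Map<string, Container>();
  const get = (id: string): Container => {
    let c = table.get(id);
    if (!c) {
      c = { id, children: [] };
      table.set(id, c);
    }
    return c;
  };

  // True if `maybeAncestor` is `node` itself or anywhere above it.
  const isAncestor = (maybeAncestor: Container, node: Container): boolean => {
    for (let cur: Container | undefined = node; cur; cur = cur.parent) {
      if (cur === maybeAncestor) return true;
    }
    return false;
  };

  const setParent = (parent: Container, child: Container, override: boolean): void => {
    if (isAncestor(child, parent)) return; // linking would create a loop
    if (child.parent) {
      if (!override) return; // keep the link an earlier chain established
      child.parent.children = child.parent.children.filter((c) => c !== child);
    }
    child.parent = parent;
    parent.children.push(child);
  };

  for (const msg of messages) {
    const self = get(msg.messageId);
    self.message = msg;
    let chain: string[] = msg.references ?? [];
    if (chain.length === 0 && msg.inReplyTo) chain = [msg.inReplyTo];

    // Link consecutive ancestors, then let the message's own headers pick its parent.
    for (let i = 1; i < chain.length; i++) {
      setParent(get(chain[i - 1]), get(chain[i]), false);
    }
    if (chain.length > 0) {
      setParent(get(chain[chain.length - 1]), self, true);
    }
  }

  // Thread roots are the containers nobody claimed as a child.
  return Array.from(table.values()).filter((c) => !c.parent);
}
```

Suppose you run it on a reply whose parent you never received. The result is a root with an empty placeholder child, which is how JWZ-style threading keeps a conversation intact when a middle message is missing.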

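The takeaway: set In-Reply-To and References correctly, and keep the subject line consistent (Gmail can split a thread when the subject changes). Do that, and your messages will thread properly across all major clients. If you skip the headers, you're gambling on subject-line matching, which is inconsistent.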

Setting threading headers programmatically

When your application or agent sends a reply, you need to construct the headers yourself. Here's the process:

1. Store the Message-ID and References header of every email you receive. You'll need both to build reply headers later.
2. Set In-Reply-To to the Message-ID of the email you're replying to.
3. Build References by copying the References header from the parent email and appending the parent's Message-ID to the end.

In TypeScript, that looks something like this:

function buildReplyHeaders(parentEmail: { messageId: string; references?: string }) {
  const parentRefs = parentEmail.references || '';
  const references = parentRefs
    ? `${parentRefs} ${parentEmail.messageId}`
    : parentEmail.messageId;

  return {
    'In-Reply-To': parentEmail.messageId,
    'References': references,
  };
}
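The function above covers the common case. RFC 5322 (section 3.6.4) adds two edge cases:

- If the parent has no References but its In-Reply-To contains a single ID, References starts from that ID.
- If the parent has no Message-ID at all, the reply gets no In-Reply-To.

Here's a sketch that handles both and normalizes folded whitespace. The `ParentEmail` shape and the `buildReplyHeadersRfc` name are assumptions, not a specific library's types.

```typescript
// Assumed shape of a stored inbound email; adapt it to your own types.
interface ParentEmail {
  messageId?: string;
  inReplyTo?: string;
  references?: string;
}

// Builds reply headers per RFC 5322 section 3.6.4, including the fallback
// to the parent's In-Reply-To when the parent has no References header.
function buildReplyHeadersRfc(parent: ParentEmail): Record<string, string> {
  const ids = (value?: string): string[] => value?.match(/<[^<>\s]+>/g) ?? [];

  const parentId = ids(parent.messageId)[0];
  let ancestry = ids(parent.references);
  if (ancestry.length === 0) {
    const replyTo = ids(parent.inReplyTo);
    if (replyTo.length === 1) ancestry = replyTo; // only a single-ID In-Reply-To counts
  }

  const references = parentId ? [...ancestry, parentId] : ancestry;
  const headers: Record<string, string> = {};
  if (parentId) headers['In-Reply-To'] = parentId; // no parent Message-ID, no In-Reply-To
  if (references.length > 0) headers['References'] = references.join(' ');
  return headers;
}
```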

One detail that trips people up: the References header keeps growing in deep threads, so it's common to trim it. RFC 5322 doesn't set a hard limit, but a widely used convention caps it at about ten Message-IDs. Under that convention you keep the first ID (the thread root) and the most recent ancestors, dropping IDs from the middle. In practice, most conversations don't hit ten levels deep, but automated systems and multi-agent email coordination can generate long chains fast.

function trimReferences(references: string, maxIds: number = 10): string {
  const ids = references.split(/\s+/).filter(Boolean);
  if (ids.length <= maxIds) return references;

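  // Keep the root plus the (maxIds - 1) most recent ancestors.
  // An explicit start index avoids slice(-0), which would return every ID when maxIds is 1.
  const root = ids[0];
  const recent = ids.slice(ids.length - (maxIds - 1));
  return [root, ...recent].join(' ');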
}
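To see the trimming at work, here's a small simulation of a long automated back-and-forth. Every reply appends its parent's ID and then trims. The block carries compact copies of the reply-building and trimming logic so it runs on its own, and the `<msgN@agent.example>` IDs are placeholders.

```typescript
// Compact copies of the trimming and reply-building logic, so this runs standalone.
function trimReferences(references: string, maxIds = 10): string {
  const ids = references.split(/\s+/).filter(Boolean);
  if (ids.length <= maxIds) return ids.join(' ');
  // Keep the root plus the (maxIds - 1) most recent ancestors.
  return [ids[0], ...ids.slice(ids.length - (maxIds - 1))].join(' ');
}

function nextReferences(parentRefs: string, parentId: string): string {
  return trimReferences(parentRefs ? `${parentRefs} ${parentId}` : parentId);
}

// References for message 25 in a chain where each message replies to the previous one.
let refs = '';
for (let n = 1; n < 25; n++) {
  refs = nextReferences(refs, `<msg${n}@agent.example>`);
}

const kept = refs.split(' ');
console.log(kept.length);            // 10
console.log(kept[0]);                // <msg1@agent.example>
console.log(kept[1]);                // <msg16@agent.example>
console.log(kept[kept.length - 1]);  // <msg24@agent.example>
```

After 24 replies, the header still holds exactly ten IDs: the root plus the nine most recent ancestors.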

What happens when headers are missing

If In-Reply-To and References are both absent, email clients fall back to heuristics. Gmail groups by subject line and participants. Outlook checks Thread-Index. Apple Mail mostly gives up and shows the message as a new conversation.

This is why automated emails so often break threading. A transactional email system fires off a reply without setting the threading headers, and the recipient sees it as a brand-new message. They reply to that, and now there are two threads about the same thing. Multiply by hundreds of customers and you've got an inbox disaster.

The fix is simple: always include both headers when sending a reply. There is no valid reason to omit them.
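If you want that rule enforced rather than remembered, a pre-send check in your own code is cheap. The guard below rejects a reply in three cases:

- In-Reply-To is missing.
- References is missing.
- In-Reply-To isn't the last entry in References.

`assertThreaded` and the `OutgoingReply` shape are hypothetical and illustrative, not part of any SDK.

```typescript
// Illustrative outgoing-reply shape; map it onto whatever your sender expects.
interface OutgoingReply {
  inReplyTo?: string;
  references?: string;
}

// Throws before sending if threading headers are missing or inconsistent.
function assertThreaded(reply: OutgoingReply): void {
  if (!reply.inReplyTo) throw new Error('Reply is missing In-Reply-To');
  const refs = reply.references?.split(/\s+/).filter(Boolean) ?? [];
  if (refs.length === 0) throw new Error('Reply is missing References');
  // The parent's Message-ID should be the last entry in References.
  if (refs[refs.length - 1] !== reply.inReplyTo.trim()) {
    throw new Error('In-Reply-To does not match the last References entry');
  }
}
```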

Threading for agents: the multi-turn problem

When AI agents handle email conversations, threading becomes even more important. Consider an agent that schedules meetings over email. It might exchange four or five messages with a human: initial request, proposed times, a counter-proposal, confirmation, calendar invite. If any of those messages breaks the thread, the human has to hunt through their inbox to piece the conversation back together.

Agents also face a unique challenge: they often process emails from multiple inboxes simultaneously. If your agent manages ten inboxes, each with active conversations, it needs to track the Message-ID and References chain for every thread independently. Mixing up threading headers between conversations would splice unrelated emails into the same thread on the recipient's end.
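One way to keep those chains apart is to key stored state by inbox and by thread root, so two conversations can never share a References chain. Here's a minimal in-memory sketch. `ThreadStore`, its methods, and the `Inbound` shape are hypothetical, and a production agent would persist this state rather than hold it in a Map.

```typescript
// Hypothetical shape of a parsed email (inbound, or the agent's own reply).
interface Inbound {
  inbox: string; // which of the agent's inboxes the message belongs to
  messageId: string;
  references?: string;
}

// Tracks the latest message per (inbox, thread root) so chains never mix.
class ThreadStore {
  private latest = new Map<string, Inbound>();

  // The thread root is the first References entry, or the message itself.
  private key(email: Inbound): string {
    const root = email.references?.split(/\s+/).filter(Boolean)[0] ?? email.messageId;
    return `${email.inbox}|${root}`;
  }

  record(email: Inbound): void {
    this.latest.set(this.key(email), email);
  }

  // Headers for replying to the most recent message in one thread of one inbox.
  replyHeaders(inbox: string, rootId: string): { inReplyTo: string; references: string } | undefined {
    const parent = this.latest.get(`${inbox}|${rootId}`);
    if (!parent) return undefined;
    const refs = parent.references?.trim();
    return {
      inReplyTo: parent.messageId,
      references: refs ? `${refs} ${parent.messageId}` : parent.messageId,
    };
  }
}
```

Trimming always keeps the first References entry, so the root stays a stable key even in long threads. Record the agent's own outgoing replies as well, so the next reply chains from the latest message rather than the last inbound one.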

With LobsterMail, when your agent receives an email, the messageId and references fields are included in the email object. When the agent sends a reply through inbox.send(), it can pass those values directly:

const emails = await inbox.receive();
const original = emails[0];

await inbox.send({
  to: original.from,
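  // Avoid stacking "Re: Re:" when the incoming subject already has the prefix
  subject: /^re:/i.test(original.subject) ? original.subject : `Re: ${original.subject}`,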
  text: 'Got it, confirmed for Tuesday at 2pm.',
  inReplyTo: original.messageId,
  references: original.references
    ? `${original.references} ${original.messageId}`
    : original.messageId,
});

The agent doesn't need to understand threading theory. It just passes the right values, and the major email clients on the receiving end thread the conversation correctly.

Security: when threading headers get weaponized

There's a less-discussed angle to threading headers: they can be spoofed. If an attacker knows the Message-ID of a legitimate email in a thread (which is sometimes exposed in mailing list archives or forwarded messages), they can craft a message with matching In-Reply-To and References headers. The recipient's email client would then slot the attacker's message into the existing thread, making it look like part of a trusted conversation.

This is called thread hijacking, and it's a real vector for phishing. The attacker's message appears inside a thread the recipient already trusts, making them more likely to click a link or share information.

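There's no perfect defense at the header level. Message-ID values are plain identifiers, not secrets or signatures, so anyone who knows one can reference it. DKIM (and DMARC on top of it) can expose an attacker who forges a previous participant's address. However, not every client enforces those results strictly, and neither helps against an attacker who sends a validly signed message from their own domain. The practical defense is awareness: if an email appears in a thread but the sender address doesn't match previous participants, treat it with suspicion.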

For agents processing inbound email, this is another reason to check sender identity independently rather than trusting thread context alone.
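A basic version of that check compares the sender of an inbound message with the addresses already seen in the thread. The sketch assumes you keep a participant list per thread. `bareAddress` and `isKnownParticipant` are hypothetical names, and lowercasing the whole address is a simplification. A mismatch isn't proof of an attack, just a signal to hold the message for review instead of acting on it automatically.

```typescript
// Extracts the bare address from a From value like "Ana <ana@example.com>".
function bareAddress(from: string): string {
  const angle = from.match(/<([^<>]+)>/);
  return (angle ? angle[1] : from).trim().toLowerCase();
}

// Hypothetical check: has this sender already participated in the thread?
function isKnownParticipant(from: string, participants: readonly string[]): boolean {
  const sender = bareAddress(from);
  for (const p of participants) {
    if (bareAddress(p) === sender) return true;
  }
  return false;
}
```

This only checks which address a message claims to come from. Whether it really came from there is what SPF, DKIM, and DMARC results are for.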

The short version

Set Message-ID on every outgoing email (your mail library probably does this automatically). When replying, set In-Reply-To to the parent's Message-ID and build References from the parent's References plus the parent's Message-ID. Trim References if it exceeds ten IDs. Do this every time, without exception, and your emails will thread correctly in Gmail, Outlook, Apple Mail, and everything else.

Skip these headers and you'll spend more time debugging broken threads than you spent building the feature.

Give your agent its own email. Get started with LobsterMail -- it's free.

Frequently asked questions

What is the In-Reply-To header in email?

The In-Reply-To header contains the Message-ID of the email being replied to. Email clients use it to link a reply to its parent message and display them in the same thread.

What is the References header and how does it differ from In-Reply-To?

The References header contains a space-separated list of Message-ID values representing the entire ancestry of a conversation. While In-Reply-To points only to the direct parent, References carries the full chain from the thread root to the most recent message.

What is a Message-ID and why is it required for email threading?

A Message-ID is a globally unique identifier assigned to every email when it's sent. It looks like <unique-string@domain.com>, angle brackets included. Without it, there's nothing for In-Reply-To or References to point to, so threading can't work.

How do email clients decide which header takes priority when threading?

It depends on the client. Gmail uses References combined with subject-line matching. Outlook prefers its proprietary Thread-Index header, falling back to In-Reply-To. Apple Mail and Thunderbird follow the RFC spec closely, prioritizing References for tree construction and In-Reply-To for parent lookup.

What happens when both In-Reply-To and References headers are missing?

Most email clients fall back to subject-line matching. Gmail groups messages with the same subject and participants. Apple Mail typically shows the message as a new conversation. The result is inconsistent across clients, which is why you should always include both headers.

Can unrelated emails accidentally end up in the same thread?

Yes, especially in Gmail. If two separate emails share the same subject line (like "Re: Invoice") between the same participants, Gmail may group them into one thread even though they're unrelated. Correct References headers help prevent this.

How many Message-IDs should the References header contain before trimming?

There's no RFC-mandated limit, but a common convention is to trim once the list passes about ten Message-IDs. Keep the first ID (the thread root) and the most recent ancestors, dropping entries from the middle of the list.

How do I set In-Reply-To and References headers when sending a reply via API?

Set In-Reply-To to the Message-ID of the email you're replying to. Build References by copying the parent email's References value and appending the parent's Message-ID. Most email SDKs and APIs accept these as header fields on the send call.

How does Gmail's threading algorithm differ from the RFC standard?

Gmail threads by References headers but also groups messages with matching subject lines and participants, even without threading headers. This makes Gmail's threading more aggressive than the RFC spec, sometimes merging unrelated conversations.

How should AI agents manage threading headers across multi-turn email conversations?

Agents should store the Message-ID and References value of every received email. When replying, the agent sets In-Reply-To to the parent's Message-ID and appends it to the parent's References chain. With LobsterMail, these fields are included in the email object so the agent can pass them directly when sending.

Can subject-line matching replace In-Reply-To and References as a threading fallback?

It's unreliable. Subject-line matching works in some clients (Gmail, Outlook) but not others (Apple Mail). It also creates false positives when unrelated emails share a subject. Always use proper threading headers rather than relying on subject matching.

Can threading headers be spoofed for phishing?

Yes. If an attacker knows a legitimate Message-ID, they can forge In-Reply-To and References to inject a message into an existing thread. This is called thread hijacking. DKIM helps but isn't universally enforced, so recipients (and agents) should verify sender identity independently.