
Email threading headers explained: Message-ID, In-Reply-To, and References
How Message-ID, In-Reply-To, and References headers work together to keep email threads intact across Gmail, Outlook, and Apple Mail.
Your agent sent a reply. The recipient saw it as a brand-new email, sitting alone in their inbox with zero context. No thread, no conversation history, just a standalone message that made the whole interaction look broken.
The problem was three missing headers. Email threading relies on three RFC 5322 headers working together: Message-ID uniquely identifies each email, In-Reply-To points to the direct parent message, and References stores the full ancestor chain of the thread. When all three are present and correct, email clients reconstruct conversations into coherent threads. When any of them are missing or malformed, threads fall apart.
| Header | Role | Value format | Cardinality |
|---|---|---|---|
| Message-ID | Unique identifier for this message | Unique string + sender domain | One per message |
| In-Reply-To | Points to the direct parent message | Parent's Message-ID value | One value |
| References | Full ancestor chain of the thread | Space-separated list of Message-IDs | One or more values |
That table covers the basics. Here's how each header actually works, how major email clients interpret them differently, and what this means when your agent (or your application) is sending email programmatically.
What Message-ID does
Every outgoing email should carry a Message-ID header. It's a globally unique identifier generated by the sending system (the mail client or its outgoing server) at the moment of dispatch. The format defined in RFC 5322 is a local part, an @ symbol, and a domain, all wrapped in angle brackets. In the raw email source, it looks like this:
Message-ID: <abc123.456789@mail.example.com>
The sending server is responsible for generating this value. If you're sending through a transactional email API (Postmark, Mailgun, LobsterMail), the service generates it for you unless you explicitly provide one. The only hard rule: it must be globally unique. Two messages sharing the same Message-ID will confuse threading in unpredictable ways.
Most implementations combine a timestamp, a random string, and the sending domain so that collisions are practically impossible. Something like 1712188800.a7f3b2c1d4e5@mail.yourapp.com is typical. The domain portion should match your actual sending domain. A mismatched domain in Message-IDs can trigger spam filters, since it looks like the sender is trying to obscure the origin.
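As a rough illustration of the format and the domain check described above, the sketch below parses a Message-ID value and reports whether its domain matches an expected sending domain. The function names (`parseMessageId`, `checkMessageIdDomain`) and the simplified pattern are assumptions for illustration; the full RFC 5322 grammar allows more forms (quoted local parts, domain literals) than this pattern accepts.

```javascript
// Simplified pattern: <local@domain>, with no whitespace or extra brackets inside.
// Assumption: this covers common Message-IDs, not the full RFC 5322 grammar.
const MESSAGE_ID_PATTERN = /^<([^<>@\s]+)@([^<>@\s]+)>$/;

// Split a Message-ID value into its local part and domain, or return null.
function parseMessageId(value) {
  const match = MESSAGE_ID_PATTERN.exec(String(value).trim());
  if (!match) return null;
  return { local: match[1], domain: match[2].toLowerCase() };
}

// True when the Message-ID's domain is the sending domain or one of its subdomains.
function checkMessageIdDomain(value, sendingDomain) {
  const parsed = parseMessageId(value);
  if (!parsed) return false;
  const expected = sendingDomain.toLowerCase();
  return parsed.domain === expected || parsed.domain.endsWith("." + expected);
}

console.log(parseMessageId("<abc123.456789@mail.example.com>"));
console.log(checkMessageIdDomain("<abc123.456789@mail.example.com>", "example.com")); // true
console.log(checkMessageIdDomain("abc123.456789@mail.example.com", "example.com"));   // false (no angle brackets)
```

A check like this is most useful as a guard just before sending, when your code supplies its own Message-ID rather than letting the email service generate one.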
In-Reply-To points to the parent
When you reply to a message, your email client (or sending API) sets the In-Reply-To header to the Message-ID of the message being replied to. One value, pointing to one parent.
In-Reply-To: <abc123.456789@mail.example.com>
This is how email clients know a message is a reply and not a new thread. If you're building automated email workflows, setting In-Reply-To correctly is the single most important thing you can do for threading. Without it, every outbound message starts a fresh conversation in the recipient's inbox.
References holds the full thread history
The References header contains a space-separated list of Message-IDs representing the entire ancestor chain of the conversation. The first ID is the original message that started the thread. The last ID is the message being directly replied to (same value as In-Reply-To).
References: <original@example.com> <reply1@example.com> <reply2@example.com>
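As a thread grows, References accumulates every Message-ID from the conversation. RFC 5322 says a reply's References should be the parent's References followed by the parent's Message-ID, and it sets no cap on the number of IDs. Individual header lines are limited to 998 characters, so long lists get folded across several lines. In practice, some servers truncate the header beyond a few kilobytes, though most threads never get that long.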
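If a thread does grow long enough for header size to matter, one option is to drop the middle of the list while keeping the first ID (the thread root) and the most recent ones. The sketch below assumes that policy; `trimReferences`, `formatReferences`, and the `maxIds` default are illustrative choices, not values taken from RFC 5322.

```javascript
// Keep the thread root plus the most recent IDs, dropping the middle.
// Assumption: maxIds = 20 is an arbitrary illustrative cap, not a standard value.
function trimReferences(ids, maxIds = 20) {
  if (maxIds < 2) throw new RangeError("maxIds must be at least 2");
  if (ids.length <= maxIds) return ids.slice();
  const root = ids[0];
  const recent = ids.slice(ids.length - (maxIds - 1));
  return [root, ...recent];
}

// Join IDs into a raw header, one ID per folded line, so no single line
// approaches the 998-character limit.
function formatReferences(ids) {
  return "References: " + ids.join("\r\n ");
}

const ids = Array.from({ length: 50 }, (_, i) => `<msg${i}@example.com>`);
const trimmed = trimReferences(ids, 5);
console.log(trimmed); // root <msg0@...> followed by <msg46@...> through <msg49@...>
console.log(formatReferences(trimmed));
```

Most mail libraries fold long headers themselves, so the folding helper only matters when you write raw messages by hand.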
The difference between In-Reply-To and References comes down to scope. In-Reply-To points to one parent. References maps the full lineage. Both matter: In-Reply-To lets a client build parent-child relationships, while References lets it reconstruct the entire conversation tree even when some messages are missing from the local mailbox.
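To make the two roles concrete, here is a minimal sketch of how a client might link messages using these headers. It attaches each message to its In-Reply-To parent and, when that parent is not in the local mailbox, walks References from newest to oldest to find the nearest ancestor that is. The message shape (`id`, `inReplyTo`, `references`) and the function name are assumptions; real clients use more elaborate algorithms.

```javascript
// Each message: { id, inReplyTo (string or null), references (array of IDs) }.
// Returns a Map from message ID to the ID of its nearest known ancestor (or null).
function linkParents(messages) {
  const known = new Set(messages.map((m) => m.id));
  const parents = new Map();
  for (const msg of messages) {
    let parent = null;
    if (msg.inReplyTo && msg.inReplyTo !== msg.id && known.has(msg.inReplyTo)) {
      parent = msg.inReplyTo; // direct parent is present locally
    } else {
      // Parent is missing: walk the ancestor chain from newest to oldest.
      const refs = msg.references || [];
      for (let i = refs.length - 1; i >= 0; i--) {
        if (refs[i] !== msg.id && known.has(refs[i])) {
          parent = refs[i];
          break;
        }
      }
    }
    parents.set(msg.id, parent);
  }
  return parents;
}

// <b@x> was never delivered to this mailbox, so <c@x> attaches to <a@x>.
const mailbox = [
  { id: "<a@x>", inReplyTo: null, references: [] },
  { id: "<c@x>", inReplyTo: "<b@x>", references: ["<a@x>", "<b@x>"] },
];
console.log(linkParents(mailbox));
```

With only In-Reply-To, `<c@x>` would have been orphaned; the References chain is what lets it rejoin the conversation.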
How Gmail, Outlook, and Apple Mail thread differently
Here's where the theory meets reality. Each email client has its own threading logic, and they don't all agree.
Gmail gives the subject line more weight than the other two clients. Messages that share a subject (after stripping Re: and Fwd: prefixes) are candidates for the same conversation, which is why unrelated messages with identical subjects can end up merged, and why changing the subject line in a reply will split the thread even if the References header is perfect. Headers still matter: Google's Gmail API documentation lists a matching Subject together with RFC-compliant In-Reply-To and References headers as conditions for adding a message to an existing thread, so don't rely on the subject alone.
Outlook takes the opposite approach. It relies on In-Reply-To and References for thread membership and falls back to Thread-Index and Thread-Topic (proprietary Exchange headers) for messages between Outlook systems. Subject matching is a last resort. This explains why a reply might thread correctly in Outlook but show up as a separate conversation in Gmail, or the other way around.
Apple Mail follows the RFC most closely. It uses In-Reply-To and References as the primary threading mechanism and only falls back to subject matching when headers are entirely absent.
The practical takeaway: if you want threads to work across all three, set all the headers and keep the subject line consistent. Belt and suspenders.
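Keeping the subject consistent is easy to get wrong in code, for example by stacking a second Re: onto a subject that already has one. The sketch below shows one way to handle it; the prefix list (Re, Fwd, Fw) and all three function names are assumptions, and real clients recognise additional localized prefixes.

```javascript
// Strip any run of leading reply/forward prefixes, e.g. "RE: Fwd: Hello" -> "Hello".
// Assumption: only the English prefixes Re, Fwd, and Fw are handled.
function baseSubject(subject) {
  return subject.replace(/^(\s*(re|fwd?)\s*:\s*)+/i, "").trim();
}

// Add "Re: " only when the subject does not already start with it.
function replySubject(subject) {
  return /^\s*re\s*:/i.test(subject) ? subject : "Re: " + subject;
}

// True when two subjects would look the same to a subject-based threader.
function sameThreadSubject(a, b) {
  return baseSubject(a).toLowerCase() === baseSubject(b).toLowerCase();
}

console.log(replySubject("Quarterly report"));                  // Re: Quarterly report
console.log(replySubject("RE: Quarterly report"));              // RE: Quarterly report
console.log(sameThreadSubject("Fwd: Invoice", "re: invoice"));  // true
console.log(sameThreadSubject("Invoice", "Invoice (updated)")); // false
```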
Threading in agent and programmatic email workflows
Most threading guides assume a human is clicking "Reply" in a mail client, where the client handles headers automatically. When an AI agent sends email programmatically, nothing is automatic. Your code is responsible for storing Message-IDs and setting the right headers on every outbound message.
Here's what a properly threaded programmatic reply looks like:
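```javascript
// Agent receives an email and replies in-thread
const received = await inbox.receive();
const original = received[0];

// Add "Re: " only if the subject doesn't already start with it
const subject = /^re:/i.test(original.subject)
  ? original.subject
  : `Re: ${original.subject}`;

await inbox.send({
  to: original.from,
  subject,
  text: "Thanks for reaching out. Here's the information you requested...",
  // Direct parent: the message being replied to
  inReplyTo: original.messageId,
  // Ancestor chain: the parent's References plus the parent's own Message-ID
  references: [
    ...(original.references || []),
    original.messageId
  ]
});
```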
Three things to get right:
- Store the Message-ID from every received email. You'll need it when the agent replies.
- Set In-Reply-To to the Message-ID of the specific message being replied to.
- Build References by taking the References array from the received message and appending its Message-ID to the end.
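The three steps above can be sketched as a single helper that works from the parent's raw header values. It also covers a case RFC 5322 calls out: when the parent has no References header but does have an In-Reply-To containing a single ID, the reply's References starts from that ID. The `parent` object shape and the function names are assumptions for illustration, not part of any particular email API.

```javascript
// Extract every <...> token from a raw header value.
function extractIds(headerValue) {
  return (headerValue || "").match(/<[^<>\s]+>/g) || [];
}

// parent: { messageId, inReplyTo, references } as raw header strings (hypothetical shape).
function buildReplyHeaders(parent) {
  const [parentId] = extractIds(parent.messageId);
  if (!parentId) throw new Error("Parent has no usable Message-ID");

  let chain = extractIds(parent.references);
  if (chain.length === 0) {
    // RFC 5322 fallback: use the parent's In-Reply-To only if it holds one ID.
    const inReplyTo = extractIds(parent.inReplyTo);
    if (inReplyTo.length === 1) chain = inReplyTo;
  }
  // Drop any earlier copy of the parent ID so it appears once, at the end.
  chain = chain.filter((id) => id !== parentId);

  return {
    "In-Reply-To": parentId,
    References: [...chain, parentId].join(" "),
  };
}

console.log(
  buildReplyHeaders({
    messageId: "<reply1@example.com>",
    inReplyTo: "<original@example.com>",
    references: "",
  })
);
// In-Reply-To: <reply1@example.com>
// References:  <original@example.com> <reply1@example.com>
```

Working from the `<...>` tokens rather than splitting on spaces keeps the helper tolerant of folded headers and stray whitespace in the parent's values.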
If your agent is scheduling meetings over email, threading keeps the back-and-forth coherent. Without it, each agent message lands as a separate conversation in the recipient's inbox. That's confusing for the human on the other end and makes the whole interaction feel disjointed.
This gets more complex in multi-agent email workflows where multiple agents participate in the same thread. Each agent needs access to the thread's References chain. If Agent A starts a conversation and Agent B needs to continue it, B needs the Message-IDs from A's exchange. Shared state or a message broker between agents solves this, but you have to design for it up front.
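One way to picture the shared state is a small store, keyed by a thread identifier, that every agent reads from and appends to. The sketch below keeps it in memory purely for illustration and assumes a linear thread where each reply answers the latest message. The class name, its methods, and the `ticket-42` key are hypothetical; a real deployment would back this with a database or broker shared across processes.

```javascript
// In-memory stand-in for shared thread state (hypothetical; not durable).
class ThreadStore {
  constructor() {
    this.threads = new Map(); // threadKey -> ordered array of Message-IDs
  }

  // Record a sent or received message in the thread, ignoring duplicates.
  record(threadKey, messageId) {
    const ids = this.threads.get(threadKey) || [];
    if (!ids.includes(messageId)) ids.push(messageId);
    this.threads.set(threadKey, ids);
  }

  // Headers for the next reply in the thread, whichever agent sends it.
  nextReplyHeaders(threadKey) {
    const ids = this.threads.get(threadKey);
    if (!ids || ids.length === 0) return {}; // nothing to reply to: new thread
    return {
      "In-Reply-To": ids[ids.length - 1],
      References: ids.join(" "),
    };
  }
}

const store = new ThreadStore();
store.record("ticket-42", "<a1@agents.example.com>");   // Agent A opens the thread
store.record("ticket-42", "<h1@customer.example.org>"); // the human replies
console.log(store.nextReplyHeaders("ticket-42"));       // Agent B continues it
```

The point of the design is that Agent B never needs Agent A's mailbox, only the recorded Message-IDs.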
Debugging broken threads
When threading breaks, diagnosis follows a predictable path.
Start by checking for Message-ID presence. Open the raw email source (in Gmail: "Show original"; in Outlook: "View message source"). Every message in the thread should have a unique Message-ID header. If it's missing, the sending server is misconfigured.
Next, verify In-Reply-To. The reply should contain an In-Reply-To header that matches the parent's Message-ID character for character. A common bug in programmatic senders is stripping the angle brackets or injecting extra whitespace around the value.
Then inspect References. The header should contain a growing list of Message-IDs. If it's empty or only contains the most recent parent, threading works for direct replies but breaks for deeper conversation trees.
Compare subject lines too. If headers look correct but Gmail still breaks the thread, the subject probably changed somewhere. Gmail's subject-based grouping can override header-based logic in both directions: a changed subject splits a thread, a matching subject can merge unrelated messages.
Finally, check for client-specific headers. If threading works in Gmail but breaks in Outlook, look for missing Thread-Index or Thread-Topic headers. Outlook uses these for its own threading logic, and messages from non-Exchange systems won't include them by default.
A quick way to pull the relevant headers from a raw email file:
grep -iE '^(Message-ID|In-Reply-To|References|Thread-Index|Thread-Topic|Subject):' raw-email.eml
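The same checks can be scripted. The sketch below unfolds headers from raw message text (a plain grep shows only the first line of a value that continues onto a second line) and compares a reply against its parent. The function names and the wording of the reported problems are assumptions, and the parser reads only the header block of a single message.

```javascript
// Parse the header block of a raw message into a lowercase-name -> value map.
// Only the first occurrence of each header is kept.
function parseHeaders(raw) {
  const headerBlock = raw.split(/\r?\n\r?\n/)[0];
  // Unfold: a line starting with a space or tab continues the previous header.
  const unfolded = headerBlock.replace(/\r?\n[ \t]+/g, " ");
  const headers = Object.create(null);
  for (const line of unfolded.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue;
    const name = line.slice(0, idx).trim().toLowerCase();
    if (!(name in headers)) headers[name] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Return a list of threading problems found in the reply relative to its parent.
function diagnoseReply(parentRaw, replyRaw) {
  const parent = parseHeaders(parentRaw);
  const reply = parseHeaders(replyRaw);
  const problems = [];
  const parentId = parent["message-id"];
  if (!parentId) problems.push("parent has no Message-ID");
  if (!reply["message-id"]) problems.push("reply has no Message-ID");
  if (reply["in-reply-to"] !== parentId) {
    problems.push("In-Reply-To does not match the parent's Message-ID");
  }
  const refs = (reply["references"] || "").split(/\s+/).filter(Boolean);
  if (refs[refs.length - 1] !== parentId) {
    problems.push("References does not end with the parent's Message-ID");
  }
  return problems;
}

const parentRaw = "Message-ID: <p1@example.com>\r\nSubject: Hello\r\n\r\nBody";
const replyRaw =
  "Message-ID: <r1@example.com>\r\nIn-Reply-To: p1@example.com\r\n" +
  "References:\r\n <p1@example.com>\r\nSubject: Re: Hello\r\n\r\nBody";
console.log(diagnoseReply(parentRaw, replyRaw));
// Reports the In-Reply-To mismatch: the angle brackets were stripped.
```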
Generating unique Message-IDs at scale
If you're sending thousands of emails per day, Message-ID uniqueness becomes a real engineering concern. The standard format is a local part, an @ symbol, and your sending domain. Common approaches:
- UUID v4 + domain: 550e8400-e29b-41d4-a716-446655440000@mail.yourapp.com
- Timestamp + random bytes + domain: 1712188800.a7f3b2c1d4@mail.yourapp.com
- Sequential counter + process ID + domain (works with a single sender process, risky otherwise)
If two messages accidentally share a Message-ID (from a bug, clock skew, or weak randomness), different clients handle it differently. Some merge the messages, displaying one and hiding the other. Some refuse to download the "duplicate." There's no recovery path once it happens. Prevention through strong random components is the only real strategy.
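As a sketch of the first two approaches, the helpers below build Message-IDs with Node's built-in `crypto` module. The function names and the `mail.yourapp.com` domain are placeholders; a cryptographically strong random component makes collisions extremely unlikely rather than strictly impossible.

```javascript
const { randomUUID, randomBytes } = require("node:crypto");

// UUID v4 + domain, wrapped in angle brackets.
function uuidMessageId(domain) {
  return `<${randomUUID()}@${domain}>`;
}

// Timestamp in seconds + 8 random bytes as hex + domain.
function timestampMessageId(domain, now = Date.now()) {
  const seconds = Math.floor(now / 1000);
  const random = randomBytes(8).toString("hex");
  return `<${seconds}.${random}@${domain}>`;
}

console.log(uuidMessageId("mail.yourapp.com"));
console.log(timestampMessageId("mail.yourapp.com"));

// Smoke test: 10,000 IDs generated back to back should all be distinct.
const seen = new Set();
for (let i = 0; i < 10000; i++) seen.add(timestampMessageId("mail.yourapp.com"));
console.log(seen.size === 10000);
```

Both variants avoid shared counters, so they stay safe when several sender processes run in parallel.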
Threading is one of those things that works invisibly when set up right and fails loudly when it's not. Store your Message-IDs, set In-Reply-To and References on every reply, keep subject lines consistent, and your threads will hold together across every major email client.
Frequently asked questions
What is the Message-ID header and who generates it?
Message-ID is a globally unique identifier assigned to every outgoing email by the sending mail server. It follows the format unique-string@sender-domain and is used by receiving clients to identify and reference individual messages within a thread.
What is the difference between In-Reply-To and References in practice?
In-Reply-To contains a single Message-ID pointing to the direct parent message. References contains a space-separated list of all Message-IDs in the thread's ancestry, from the first message through the most recent. Clients use In-Reply-To for parent-child links and References to rebuild the full tree.
Does every outgoing email need a unique Message-ID?
Yes. Every email must have a globally unique Message-ID. Duplicates cause unpredictable behavior: some clients merge messages, others silently hide the second one. Combine a timestamp, random bytes, and your sending domain so that collisions are practically impossible.
How does Gmail decide which emails belong in the same thread?
Gmail leans heavily on the subject line. Emails that share a subject after stripping Re: and Fwd: prefixes can be grouped together, and a changed subject splits a thread even when the headers are correct. Gmail's API documentation also expects In-Reply-To and References to be set, so send both. Outlook and Apple Mail rely on headers first.
Why does my reply appear as a new conversation in Outlook?
Outlook relies on In-Reply-To, References, and its proprietary Thread-Index header to group threads. If your sending system doesn't set these headers, Outlook can't associate the reply with the original thread. Check that In-Reply-To exactly matches the parent's Message-ID.
Is there a size limit on the References header?
RFC 5322 doesn't set a hard limit. Some mail servers truncate References if it exceeds a few kilobytes, but most threads never reach that length. If your conversations regularly run past 50 messages, you can trim older entries while keeping the first and last Message-IDs.
Can I set a custom Message-ID when sending via an email API?
Most transactional email APIs allow you to set a custom Message-ID. If you don't provide one, the API generates one automatically. If you set your own, ensure it's globally unique and use your actual sending domain in the domain portion.
What fallback do email clients use when threading headers are missing?
Most clients fall back to subject line matching when In-Reply-To and References are absent. Outlook also checks Thread-Topic and Thread-Index as intermediate signals. Apple Mail uses subject matching only as a last resort when no headers exist at all.
How do I thread AI agent emails back into a human conversation?
Store the Message-ID from each received email. When your agent replies, set In-Reply-To to the parent's Message-ID and build References by appending the parent's Message-ID to the parent's own References list. Also prepend Re: to the subject line, unless it already starts with Re:, for Gmail compatibility.
What happens if two emails have the same Message-ID?
Clients handle collisions inconsistently. Some merge the messages and only show one. Some skip the "duplicate" entirely during sync. There's no graceful recovery, so prevention through strong randomness in your ID generation is the only reliable strategy.
What is the Thread-Topic header and when is it used?
Thread-Topic is a proprietary header used by Microsoft Outlook and Exchange. It contains the conversation subject and helps Outlook group messages internally. Non-Exchange sending systems don't typically include it, which is one reason threads sometimes break between Outlook and other clients.
Why does a thread work in Gmail but break in Outlook?
Gmail leans heavily on the subject line, while Outlook threads by In-Reply-To, References, and Thread-Index. A message with a matching subject but missing In-Reply-To may still group in Gmail yet appear as a new conversation in Outlook. Set all headers for cross-client reliability.
What RFC standards govern email threading headers?
RFC 5322 (which superseded RFC 2822) defines Message-ID, In-Reply-To, and References. Both RFCs describe the same header format, uniqueness requirements, and intended threading behavior. RFC 5322 is the current standard.
How do I diagnose threading failures in a programmatic email pipeline?
Check five things in order: Message-ID presence on every message, In-Reply-To matching the parent's Message-ID exactly, References containing the full ancestor chain, subject line consistency across replies, and Thread-Index presence if Outlook compatibility matters.


