
# How to build a LangGraph stateful email agent workflow
Build a stateful email agent workflow in LangGraph with classification, conditional routing, human review, and real inbox infrastructure.
LangGraph gives you something LangChain alone can't: a stateful email agent workflow where each step knows what happened before it. Your agent reads an email, classifies it, drafts a reply, waits for human approval, and sends. State flows through the entire graph.
That's the theory, at least. Most LangGraph email tutorials stop at the orchestration layer. They show you how to wire up nodes and edges but hand-wave the actual email infrastructure. You end up with a clean graph that can't touch a real inbox without bolting on IMAP and SMTP plumbing yourself.
This guide covers both sides. We'll walk through building a stateful email workflow in LangGraph, then address the infrastructure gap that most tutorials ignore. If you'd rather skip the email plumbing, LobsterMail lets your agent provision its own inbox with a single function call, so you can focus on the graph logic instead.
## What is a stateful email agent in LangGraph?
LangGraph is a framework from the LangChain team for building AI agent workflows as directed graphs. Unlike a plain LangChain chain (which runs start to finish in a straight line), LangGraph lets you define nodes, edges, and conditional branching. The "stateful" part means each node can read and modify a shared state object that travels through the graph.
For email automation, this means your agent can receive a message, store it in state, classify the email type, route to different processing nodes based on that classification, draft a response using the full thread history, and pause for human review before sending. A StateGraph in LangGraph is the container that holds all of this logic together. You define the state shape, add nodes as functions, connect them with edges, compile the graph, and invoke it.
## How to build a LangGraph stateful email agent
Here's the process from zero to a working workflow:
- Define your `EmailState` using a `TypedDict` with fields for messages, classification, and draft
- Create processing nodes for classification, summarization, and response drafting
- Add conditional edges that route emails by type (support, sales, spam)
- Attach a human-review interrupt node for sensitive replies
- Wire up email infrastructure so the agent can actually receive and send
- Compile with `workflow.compile()` and invoke with your initial state
- Add checkpointing to persist state across sessions
Let me walk through the important pieces.
## Defining EmailState
Your state object is the backbone of the whole workflow. Every node reads from it and writes to it.
```python
from typing import TypedDict, List, Optional

class EmailState(TypedDict):
    sender: str
    subject: str
    body: str
    thread_history: List[dict]
    classification: Optional[str]
    draft_reply: Optional[str]
    human_approved: bool
    iteration: int
```
> **Tip:** LangGraph includes a built-in `MessagesState` for conversational workflows. For email, a custom `TypedDict` is usually better because you get explicit fields for classification, approval status, and thread history. That explicitness pays off when debugging.
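With the shape defined, the initial state is just a dict that matches it. A standalone sketch (repeating the `EmailState` definition so the snippet runs on its own; the field values are illustrative):

```python
from typing import TypedDict, List, Optional

class EmailState(TypedDict):
    sender: str
    subject: str
    body: str
    thread_history: List[dict]
    classification: Optional[str]
    draft_reply: Optional[str]
    human_approved: bool
    iteration: int

initial_state: EmailState = {
    "sender": "customer@example.com",
    "subject": "Password reset loop",
    "body": "I keep getting sent back to the login page.",
    "thread_history": [],
    "classification": None,  # filled in by the classify node
    "draft_reply": None,     # filled in by the drafting node
    "human_approved": False,
    "iteration": 0,
}
```

Fields that downstream nodes populate start as `None` or zero, which is why the `Optional` annotations matter.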
## Building processing nodes
Each node is a Python function that takes the current state and returns a partial state update. Here's a classification node and a reply-drafting node:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

def classify_email(state: EmailState) -> dict:
    prompt = f"Classify this email as 'support', 'sales', or 'spam':\n\n{state['body']}"
    result = llm.invoke(prompt)
    return {"classification": result.content.strip().lower()}

def draft_reply(state: EmailState) -> dict:
    context = "\n".join(
        [f"From: {m['from']}\n{m['body']}" for m in state["thread_history"]]
    )
    prompt = (
        f"Draft a reply to this {state['classification']} email.\n\n"
        f"Thread:\n{context}\n\nLatest message:\n{state['body']}"
    )
    result = llm.invoke(prompt)
    return {"draft_reply": result.content, "iteration": state["iteration"] + 1}
```
The draft_reply function pulls thread_history from state, which means it has context from every previous node. No magic, no hidden memory store. Whatever you put in the TypedDict is what every downstream node sees.
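Note that each node returns only the keys it changed, and LangGraph merges that partial update into the shared state. A stdlib-only sketch of the default shallow-merge behavior (LangGraph's actual reducer machinery is more involved; this just illustrates the semantics):

```python
state = {
    "body": "My invoice is wrong.",
    "classification": None,
    "iteration": 0,
}

# A node returns only the keys it changed...
node_update = {"classification": "support"}

# ...and the framework merges that partial update into the full state,
# leaving untouched keys intact.
state = {**state, **node_update}
```

This is why nodes can stay small: they never need to copy the whole state forward, only the fields they own.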
## Conditional routing
This is where LangGraph diverges from a linear LangChain chain. Instead of running every step in order, you define routing functions that inspect state and decide which node executes next.
```python
from langgraph.graph import StateGraph, END

def route_by_classification(state: EmailState) -> str:
    if state["classification"] == "spam":
        return "discard"
    if state["classification"] == "sales":
        return "sales_handler"
    return "support_handler"

workflow = StateGraph(EmailState)
workflow.add_node("classify", classify_email)
workflow.add_node("support_handler", draft_reply)
workflow.add_node("sales_handler", draft_reply)
workflow.add_node("discard", lambda s: {})

workflow.set_entry_point("classify")
workflow.add_conditional_edges("classify", route_by_classification)
workflow.add_edge("support_handler", END)
workflow.add_edge("sales_handler", END)
workflow.add_edge("discard", END)

graph = workflow.compile()
```
The add_conditional_edges call does the heavy lifting. After classification, the graph calls route_by_classification, inspects the return value, and sends execution down the matching branch. Spam gets discarded with no LLM call. Support and sales each follow their own handling path.
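Because the router is a plain function of state, you can sanity-check it without compiling the graph at all. A standalone check (duplicating the routing function so the snippet runs without LangGraph installed), including a label the classifier was never supposed to emit:

```python
def route_by_classification(state: dict) -> str:
    if state["classification"] == "spam":
        return "discard"
    if state["classification"] == "sales":
        return "sales_handler"
    return "support_handler"

# Cover every value the classifier can emit, plus an unexpected one.
routes = {
    label: route_by_classification({"classification": label})
    for label in ["spam", "sales", "support", "unexpected-label"]
}
```

The final `return` acts as a catch-all, so a misbehaving classifier falls through to the support path instead of crashing the graph. Whether that default is right for you is a design decision worth making explicitly.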
## Human-in-the-loop review
LangGraph supports interrupt nodes where execution pauses and waits for external input. For email, this is useful when you don't want the agent auto-sending replies to anything sensitive.
```python
from langgraph.checkpoint.memory import MemorySaver

def human_review(state: EmailState) -> dict:
    # Placeholder: the real approval decision is injected when the
    # graph resumes after the interrupt.
    return {"human_approved": True}

# Route support drafts through review instead of straight to END
# (this replaces the earlier support_handler -> END edge).
workflow.add_node("review", human_review)
workflow.add_edge("support_handler", "review")
workflow.add_edge("review", END)

checkpointer = MemorySaver()
graph = workflow.compile(checkpointer=checkpointer, interrupt_before=["review"])
```
With interrupt_before=["review"], the graph pauses before executing the review node. Your application displays the draft to a human, collects their decision, and resumes execution. The MemorySaver checkpointer persists state in memory during the pause. Swap it for a PostgreSQL-backed checkpointer when you go to production.
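Conceptually, the checkpointer is a keyed store of state snapshots: pause saves the state under a thread ID, resume loads it, applies the human's decision, and continues. A stdlib-only sketch of that pattern (this is not the real MemorySaver API, just an illustration of what gets persisted across the pause):

```python
# Toy checkpoint store mimicking what a LangGraph checkpointer
# persists while the graph waits on a human.
checkpoints: dict[str, dict] = {}

def pause_for_review(thread_id: str, state: dict) -> None:
    # Graph hits interrupt_before=["review"]: snapshot the state.
    checkpoints[thread_id] = dict(state)

def resume_after_review(thread_id: str, approved: bool) -> dict:
    # Human decided: load the snapshot, apply the decision, continue.
    state = checkpoints.pop(thread_id)
    state["human_approved"] = approved
    return state

state = {"draft_reply": "Thanks for reaching out...", "human_approved": False}
pause_for_review("thread-42", state)
resumed = resume_after_review("thread-42", approved=True)
```

The thread ID is what lets one process serve many paused email threads at once; in real LangGraph you pass it via the `configurable` section of the invoke config.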
## Where LangGraph stops and real email begins
You now have a graph that classifies, routes, drafts, and reviews. But it can't actually touch a real inbox.
To connect this to Gmail, you'd need OAuth 2.0 credentials, a consent screen, token refresh logic, and IMAP polling. For Outlook, it's the Microsoft Graph API with similar OAuth complexity. Either way, you're writing hundreds of lines of infrastructure code that has nothing to do with your agent's actual logic. We covered this same friction in our guide to LangChain email integration without the OAuth headache. The short version: building the graph is the fun part, and email plumbing is where the momentum dies.
LobsterMail fills this gap. Your agent pinches its own inbox with one SDK call, receives email through polling or webhooks, and sends replies through a clean API. No OAuth, no IMAP, no credential management. Connect LobsterMail as the entry and exit nodes in your graph, and the middle stays pure agent logic.
```python
from lobstermail import LobsterMail

async def receive_email_node(state: EmailState) -> dict:
    lm = await LobsterMail.create()
    inbox = await lm.create_smart_inbox(name="support-agent")
    emails = await inbox.receive(limit=1)
    if emails:
        email = emails[0]
        return {
            "sender": email.sender,
            "subject": email.subject,
            "body": email.text,
            "thread_history": [],
            "iteration": 0,
            "human_approved": False,
        }
    return state

async def send_reply_node(state: EmailState) -> dict:
    if state["human_approved"] and state["draft_reply"]:
        lm = await LobsterMail.create()
        inbox = lm.get_inbox("support-agent")
        await inbox.send(
            to=state["sender"],
            subject=f"Re: {state['subject']}",
            body=state["draft_reply"],
        )
    return state
```
Plug these in as the first and last nodes. The classification, routing, drafting, and review nodes in the middle don't change at all.
## Shipping to production
MemorySaver works during development but won't survive a restart. LangGraph supports SQLite and PostgreSQL checkpointers out of the box, so switch to one of those before deploying. Your agent needs to resume long-running email threads across sessions rather than losing context every time the process recycles.
If you're processing hundreds of emails per hour, you'll hit LLM rate limits before you hit email rate limits. Batch your classification calls, cache common spam patterns with a regex pre-filter, and use a cheaper model (GPT-4o-mini works well) for triage. Save the bigger models for reply drafting where quality actually matters.
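The regex pre-filter mentioned above can be a handful of compiled patterns checked before any LLM call. The patterns here are illustrative; tune them against your own spam corpus:

```python
import re

# Illustrative patterns: cheap checks that run before any LLM call.
SPAM_PATTERNS = [
    re.compile(r"\bunsubscribe\b", re.IGNORECASE),
    re.compile(r"\byou('ve| have) won\b", re.IGNORECASE),
    re.compile(r"\bcrypto (airdrop|giveaway)\b", re.IGNORECASE),
]

def prefilter_spam(body: str) -> bool:
    """Return True if the email is obvious spam; skip the LLM entirely."""
    return any(p.search(body) for p in SPAM_PATTERNS)
```

Wire this in as a guard at the top of the classify node (or as its own node that routes straight to discard), and the obvious junk never costs you a token.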
Turn on LangSmith tracing early. Stateful workflows have more failure modes than linear chains, and most bugs hide in unexpected state mutations between nodes. Being able to see the exact state at each step saves hours of print-statement debugging.
Watch your thread history size. If you're stuffing every message into thread_history, you'll blow past LLM context windows within a few exchanges. Summarize older messages and keep only the last three to five in full. Your agent's replies will be better for it because it's working from a clean signal rather than 40 pages of quoted replies.
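A minimal trimming helper, assuming `thread_history` entries are dicts with `from` and `body` fields as in the earlier examples. The summarization step is a stub where you'd call a cheap model:

```python
def trim_thread(history: list[dict], keep_last: int = 4) -> list[dict]:
    """Keep the newest messages in full; collapse older ones to a stub."""
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    # In production, summarize `older` with a cheap model here;
    # this stub just records how much was collapsed.
    summary = {"from": "system", "body": f"[{len(older)} earlier messages summarized]"}
    return [summary] + recent

trimmed = trim_thread([{"from": f"u{i}", "body": "..."} for i in range(10)])
```

Run this inside `draft_reply` (or a dedicated node) before building the prompt context, so the trim happens on every pass rather than only when someone notices the context window overflowing.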
Start with the infrastructure. Get your agent a working inbox first, then build the LangGraph workflow around it. Debugging classification logic is a lot more fun when you're not also debugging SMTP timeouts.
## Frequently asked questions
### What makes LangGraph better than a plain LangChain chain for email automation?
LangChain chains run in a straight line from start to finish. LangGraph adds loops, conditional branching, and persistent state, which means your agent can classify an email, route it to different handlers, revise a draft, and wait for human approval. Those patterns are impossible in a linear chain.
### What is the minimal node structure needed for a working LangGraph email agent?
Two nodes and an entry point. One node to classify the incoming email, one to draft a reply, connected by an edge. Add workflow.set_entry_point(), compile, and invoke. You can layer in routing, review, and sending nodes from there.
### What is MessagesState and when should I use it instead of a custom EmailState TypedDict?
MessagesState is a built-in LangGraph state class that tracks a list of conversation messages. It works well for chatbot-style workflows. For email agents, a custom TypedDict with explicit fields for classification, thread history, and approval status gives you more control and easier debugging.
### Can I pause a LangGraph workflow and wait for a human to approve a draft reply?
Yes. Use interrupt_before or interrupt_after when compiling your graph. Execution pauses at the specified node, your app surfaces the draft for review, and you resume the graph after the human decides. This requires a checkpointer to save state while waiting.
### How do I maintain email thread history across multiple LangGraph invocations?
Store thread history in your state TypedDict and use a persistent checkpointer like SQLite or PostgreSQL. Each invocation picks up where the last one left off. Summarize older messages to avoid exceeding LLM context windows on long threads.
### How do I connect a LangGraph email agent to a real inbox like Gmail or Outlook?
Gmail requires OAuth 2.0 setup, a consent screen, token refresh logic, and IMAP polling. Outlook needs the Microsoft Graph API with similar complexity. LobsterMail is a simpler path: your agent provisions its own inbox with one SDK call and receives mail via webhooks or polling.
### What are the most common bugs when building stateful email workflows in LangGraph?
Missing Optional annotations on state fields that start as None, forgetting to increment iteration counters, unbounded thread history that overflows context windows, and conditional edge functions that don't cover all possible classification values. LangSmith tracing catches most of these quickly.
### How does LangGraph email automation compare to n8n or Zapier for intelligent routing?
n8n and Zapier are great for rule-based email routing (if subject contains X, forward to Y). LangGraph gives you LLM-powered classification and conditional branching with full state persistence. Use n8n or Zapier for simple workflows, LangGraph when you need an agent that reasons about content.
### Can a LangGraph email agent handle attachments, forwarding, and CC/BCC?
Yes, but you need to add those fields to your state TypedDict and handle them in your processing nodes. Your email infrastructure also needs to support them. LobsterMail's SDK includes attachment support, CC/BCC fields, and forwarding through the API.
### Which LLM performs best for email response generation in LangGraph?
GPT-4o and Claude both produce strong email replies. For classification and spam detection, GPT-4o-mini or Claude Haiku are fast and cheap enough to run on every incoming message. Reserve the bigger models for reply drafting where tone and accuracy matter most.
### What does workflow.compile() actually do under the hood?
It validates your graph structure (checks for unreachable nodes, missing edges), connects the checkpointer if one is provided, and returns a runnable CompiledGraph object. You invoke this object with your initial state dictionary to execute the workflow.
### How do I use LangSmith to debug a stateful LangGraph email workflow?
Set your LANGCHAIN_API_KEY and LANGCHAIN_TRACING_V2=true environment variables. LangSmith logs every node invocation, the full state before and after each step, edge routing decisions, and individual LLM calls. Filter by run ID to trace a single email through the entire graph.
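The environment setup is just two variables; `LANGCHAIN_PROJECT` is an optional third that groups runs in the LangSmith UI:

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="<your-langsmith-key>"
export LANGCHAIN_PROJECT="email-agent"   # optional: groups runs in the UI
```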
### How do I scale a LangGraph email agent to hundreds of emails per hour?
Batch classification calls, use a fast model for triage, cache common patterns with regex pre-filters, and run multiple graph instances in parallel. The bottleneck is usually LLM rate limits, not the graph framework. Use LangGraph's async support and a database-backed checkpointer to handle concurrent executions.


