
How to build a LlamaIndex FunctionCallingAgent that sends real email

Move beyond mock send_message functions. Build a LlamaIndex FunctionAgent that sends real emails using proper infrastructure.

8 min read
Samuel Chenard, Co-founder

Every LlamaIndex email tutorial has the same problem. You define a send_message function, wire it up to a FunctionAgent, and the agent "sends" an email that goes absolutely nowhere. The function returns a hardcoded string like "Successfully sent mail to user@example.com" and you move on, pretending that counts as email.

It doesn't. Your agent needs an actual mailbox. It needs to receive replies, handle bounces, and not get blacklisted after 50 messages. The gap between a mock function and production email is enormous, and most tutorials skip it entirely.

This guide covers how to build a LlamaIndex FunctionAgent (formerly FunctionCallingAgent) that sends and receives real email, with actual infrastructure behind it.

How to build a LlamaIndex FunctionCallingAgent for email (step-by-step)#

A LlamaIndex FunctionAgent (previously called FunctionCallingAgent) is an agent class that uses an LLM's native tool-calling API to select and invoke Python functions. Here's how to build one that handles email:

  1. Install LlamaIndex and your email SDK (pip install llama-index-core llama-index-llms-openai).
  2. Define your email functions with type-annotated parameters.
  3. Wrap each function with FunctionTool.from_defaults().
  4. Instantiate FunctionAgent with the tool list and your LLM.
  5. Call await agent.run() with a natural-language prompt.
  6. Inspect the AgentOutput for the sent confirmation.
  7. Add error handling, rate limits, and logging for production use.

Let's walk through each piece.

Defining email tools that actually work#

The typical tutorial gives you something like this:

async def send_message(to: str, content: str) -> str:
    """Dummy function to simulate sending an email."""
    return f"Successfully sent mail to {to}"

That's a starting point, not a solution. A real email tool needs to connect to infrastructure that handles DNS authentication, bounce processing, and delivery tracking. Here's what a production-ready tool function looks like when backed by real email infrastructure:

import httpx

EMAIL_API_URL = "https://api.lobstermail.ai/v1"
EMAIL_API_TOKEN = "lm_sk_live_..."  # placeholder; load from an environment variable in production

async def send_email(to: str, subject: str, body: str) -> str:
    """Send a real email through the agent's provisioned inbox."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{EMAIL_API_URL}/inboxes/my-agent/send",
            headers={"Authorization": f"Bearer {EMAIL_API_TOKEN}"},
            json={"to": to, "subject": subject, "body": body},
        )
        resp.raise_for_status()
        return f"Email sent to {to}, message ID: {resp.json()['id']}"


async def check_inbox() -> str:
    """Check the agent's inbox for new emails."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{EMAIL_API_URL}/inboxes/my-agent/emails",
            headers={"Authorization": f"Bearer {EMAIL_API_TOKEN}"},
        )
        resp.raise_for_status()
        emails = resp.json()
        if not emails:
            return "No new emails."
        summaries = []
        for e in emails[:5]:
            summaries.append(f"From: {e['from']}, Subject: {e['subject']}")
        return "\n".join(summaries)

The difference is obvious. One talks to a mail server. The other returns a string.

Wiring tools into FunctionAgent#

LlamaIndex renamed FunctionCallingAgent to FunctionAgent in recent versions (the import path moved to llama_index.core.agent.workflow). The setup looks like this:

from llama_index.core.tools import FunctionTool
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

send_tool = FunctionTool.from_defaults(fn=send_email)
inbox_tool = FunctionTool.from_defaults(fn=check_inbox)

llm = OpenAI(model="gpt-4o-mini")

agent = FunctionAgent(
    tools=[send_tool, inbox_tool],
    llm=llm,
    system_prompt=(
        "You are an email assistant. You can send emails and check "
        "the inbox. Always confirm before sending."
    ),
)

Then run it:

response = await agent.run(
    "Check my inbox, then reply to the most recent email with a thank-you note."
)
print(response)

The agent reads the inbox, picks the most recent message, composes a reply, and sends it through real infrastructure. No mocks.

FunctionAgent vs ReActAgent: which one for email?#

LlamaIndex offers two main agent types. FunctionAgent uses the LLM's native tool-calling API (structured JSON function calls). ReActAgent uses a reasoning loop where the LLM generates thought/action/observation steps as text.

For email tasks, FunctionAgent is almost always the better choice. Email operations are discrete, well-defined actions: send this message, check this inbox, forward this thread. They don't require multi-step reasoning chains. FunctionAgent calls tools faster (one LLM call per action instead of the ReAct loop's multiple calls) and produces more predictable results because the LLM returns structured tool-call objects rather than free-text that gets parsed.

ReActAgent makes more sense when your agent needs to reason about ambiguous situations, like deciding whether to escalate a support ticket or classify an email into a custom taxonomy. But for sending and receiving? FunctionAgent wins on speed and reliability.

Which LLMs support function calling?#

Not every model works with FunctionAgent. You need an LLM that exposes a native function-calling API. The current options in LlamaIndex:

  • OpenAI: GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo. Best tool-calling accuracy, especially GPT-4o-mini for cost-sensitive workloads.
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku. Solid function calling through llama_index.llms.anthropic.
  • Mistral: Mistral Large, Mistral Small. Works via llama_index.llms.mistral.
  • Google Gemini: Gemini 1.5 Pro and Flash. Via llama_index.llms.gemini.

GPT-4o-mini hits a good balance for email tools. It's cheap ($0.15 per million input tokens), fast, and rarely hallucinates tool parameters. For complex email workflows where the agent needs to interpret ambiguous instructions, GPT-4o or Claude 3.5 Sonnet give better results.

Passing credentials securely#

Never hardcode API keys in your tool functions. Use environment variables or a secrets manager:

import os

EMAIL_API_TOKEN = os.environ["LOBSTERMAIL_API_TOKEN"]

If you're running multiple agents, each with its own inbox, pass the token as a parameter to the tool or use a factory pattern that binds credentials at initialization. The key principle: credentials should never appear in the LLM's context window. Keep them in the function's closure, not in the system prompt.
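One way to implement that factory pattern is a closure that captures the inbox ID and token at initialization. This is a sketch, not the LobsterMail SDK: `make_send_tool`, the inbox IDs, and the environment variable names are illustrative, and the endpoint shape mirrors the earlier example.

```python
def make_send_tool(inbox_id: str, token: str):
    """Build a send function with the credential bound in its closure.

    The token lives only in the closure, never in the system prompt
    or any text the LLM can see.
    """

    async def send_email(to: str, subject: str, body: str) -> str:
        """Send an email through this agent's provisioned inbox."""
        import httpx  # deferred import keeps the factory itself dependency-free

        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"https://api.lobstermail.ai/v1/inboxes/{inbox_id}/send",
                headers={"Authorization": f"Bearer {token}"},
                json={"to": to, "subject": subject, "body": body},
            )
            resp.raise_for_status()
            return f"Email sent to {to}, message ID: {resp.json()['id']}"

    return send_email

# Each agent binds its own inbox and token at initialization:
# import os
# agent_a_send = make_send_tool("agent-a", os.environ["AGENT_A_TOKEN"])
# agent_b_send = make_send_tool("agent-b", os.environ["AGENT_B_TOKEN"])
```

Each returned function can then be wrapped with FunctionTool.from_defaults() as usual; the agent sees only the tool's name, signature, and docstring.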

Production concerns tutorials skip#

Mock functions hide real problems. Once your agent sends actual email, you need to handle several things that tutorials never mention.

Idempotency. If the LLM retries a tool call (which happens with streaming or timeout recovery), you'll send duplicate emails. Generate a unique idempotency key per agent action and pass it in the request headers. Your email infrastructure should deduplicate on that key.
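One simple way to derive that key, assuming your provider deduplicates on a header like `Idempotency-Key`: hash the conversation turn together with the message fields, so a retry of the same action produces the same key while a new turn produces a fresh one. The function name and header are illustrative.

```python
import hashlib

def idempotency_key(to: str, subject: str, body: str, turn_id: str) -> str:
    """Derive a stable key so a retried tool call maps to the same send.

    Identical arguments always yield the same key (the provider can
    deduplicate); a new conversation turn yields a different key.
    """
    raw = f"{turn_id}|{to}|{subject}|{body}".encode()
    return hashlib.sha256(raw).hexdigest()

# Attach it to the send request, e.g.:
# headers = {"Idempotency-Key": idempotency_key(to, subject, body, turn_id)}
```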

Rate limiting. An agent in a loop can burn through hundreds of sends in minutes. Set explicit limits in your tool function, not just at the infrastructure level. A simple counter with a ceiling per hour prevents runaway sends.
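A minimal sketch of that in-function ceiling, using a rolling one-hour window (the class name is illustrative; in production you'd likely back this with shared storage rather than process memory):

```python
import time
from collections import deque

class HourlySendLimiter:
    """In-process ceiling on sends per rolling hour."""

    def __init__(self, max_per_hour: int = 20):
        self.max_per_hour = max_per_hour
        self._sent = deque()  # timestamps of recent sends

    def allow(self, now=None) -> bool:
        """Return True and record the send if under the ceiling."""
        if now is None:
            now = time.time()
        # Drop timestamps older than one hour, then check the ceiling.
        while self._sent and now - self._sent[0] > 3600:
            self._sent.popleft()
        if len(self._sent) >= self.max_per_hour:
            return False
        self._sent.append(now)
        return True
```

Call `limiter.allow()` at the top of your send tool and return a refusal message to the agent when it comes back False, so the LLM knows to stop rather than retry.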

Error handling. SMTP errors come in flavors. A 550 bounce means the address doesn't exist, and retrying is pointless. A 421 means the server is temporarily busy, and retrying in a few minutes might work. Your tool function should return different messages to the agent based on the error class so it can decide what to do next.

async def send_email(to: str, subject: str, body: str) -> str:
    """Send an email with error classification."""
    try:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                f"{EMAIL_API_URL}/inboxes/my-agent/send",
                headers={"Authorization": f"Bearer {EMAIL_API_TOKEN}"},
                json={"to": to, "subject": subject, "body": body},
            )
            resp.raise_for_status()
            return f"Sent successfully. ID: {resp.json()['id']}"
    except httpx.HTTPStatusError as e:
        if e.response.status_code == 422:
            return f"Invalid recipient address: {to}. Do not retry."
        if e.response.status_code == 429:
            return "Rate limit hit. Wait 60 seconds before sending again."
        return f"Send failed with status {e.response.status_code}. Retry may help."
    except httpx.RequestError as e:
        return f"Network error while sending: {e}. Retrying may help."

Logging. Every email sent by an agent should be logged with a timestamp, recipient, subject line, and the agent's reasoning for sending it. When something goes wrong (and it will), you need an audit trail. Write logs to a structured format, not just stdout.
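A sketch of one such structured record, emitted as a JSON line per send (the helper name and field set are illustrative; the reasoning string would come from the agent's chat history):

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("email_agent")

def log_send(to: str, subject: str, message_id: str, reasoning: str) -> str:
    """Emit one structured audit record per sent email and return it."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": "email_sent",
        "to": to,
        "subject": subject,
        "message_id": message_id,
        "reasoning": reasoning,
    }
    line = json.dumps(record)
    logger.info(line)  # route the logger to a file or database handler
    return line
```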

The inbox problem nobody talks about#

Most LlamaIndex email tutorials focus entirely on sending. But agents that send email also need to receive it. Confirmation codes, replies, bounce notifications. Without an inbox, your agent is shouting into a void with no way to hear back.

Setting up receiving infrastructure is harder than sending. You need DNS records pointing to an inbound mail server, a system to store and index incoming messages, and an API your agent can poll or subscribe to via webhooks. That's a lot of plumbing for what should be a simple capability.

This is where agent-first email infrastructure earns its keep. With LobsterMail, for example, your agent provisions its own inbox with a single SDK call, then polls for incoming mail without you configuring a single DNS record. The inbox handles receiving, stores messages with security metadata (including injection risk scoring), and lets the agent pull new emails whenever it needs them. If you want to skip the infrastructure work, give your LlamaIndex agent a real inbox and wire its API into your tool functions.

Evaluating your email agent#

How do you know your agent is calling the email tool correctly? The Ragas framework integrates with LlamaIndex for exactly this. You can create test scenarios ("send a follow-up to the last customer email", "check inbox and summarize unread messages") and evaluate whether the agent selected the right tool, passed the right parameters, and handled the response appropriately.

The key metrics to track: tool selection accuracy (did it pick send_email vs check_inbox correctly?), parameter correctness (did it put the right address in the to field?), and end-to-end task completion (did the email actually get delivered?).
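The first two metrics reduce to simple arithmetic once you've recorded the agent's actual tool calls against the expected ones for each scenario. This scoring helper is an illustrative sketch, not part of Ragas; the dict shape is an assumption.

```python
def score_run(results: list) -> dict:
    """Score recorded tool calls against expectations.

    Each result dict holds: expected_tool, actual_tool,
    expected_args, actual_args.
    """
    n = len(results)
    tool_hits = sum(r["actual_tool"] == r["expected_tool"] for r in results)
    # Parameters only count as correct when the right tool was chosen too.
    param_hits = sum(
        r["actual_tool"] == r["expected_tool"]
        and r["actual_args"] == r["expected_args"]
        for r in results
    )
    return {
        "tool_selection_accuracy": tool_hits / n,
        "parameter_correctness": param_hits / n,
    }
```

End-to-end task completion still needs a real check against your email infrastructure (did the message ID show up as delivered?), which is why the mock-function approach can't be evaluated meaningfully.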


Frequently asked questions

What is FunctionCallingAgent (now FunctionAgent) in LlamaIndex and when should I use it?

FunctionAgent is a LlamaIndex agent class that uses an LLM's native tool-calling API to invoke Python functions. Use it when your tasks are well-defined actions like sending email, querying databases, or calling APIs, rather than open-ended reasoning.

How do I define a Python function as an email tool for a LlamaIndex FunctionAgent?

Write a regular Python function with type-annotated parameters and a docstring, then wrap it with FunctionTool.from_defaults(fn=your_function). LlamaIndex extracts the schema from annotations and the description from the docstring.

What LLMs support native function calling with LlamaIndex's FunctionAgent?

OpenAI (GPT-4o, GPT-4o-mini, GPT-3.5-turbo), Anthropic (Claude 3.5 Sonnet, Claude 3 Opus/Haiku), Mistral (Large, Small), and Google Gemini (1.5 Pro, Flash) all support function calling through their respective LlamaIndex LLM integrations.

What is the difference between FunctionAgent and ReActAgent in LlamaIndex?

FunctionAgent uses structured tool-call JSON from the LLM's API, making it faster and more predictable. ReActAgent uses a thought/action/observation text loop, which is better for complex reasoning but slower and more prone to parsing errors.

How do I pass credentials securely into a LlamaIndex email tool?

Use environment variables or a secrets manager. Load the credential inside the tool function via os.environ. Never put API keys in the system prompt or anywhere the LLM can see them.

How can I make a LlamaIndex agent send real emails instead of using a mock function?

Replace the dummy function body with an HTTP call to a real email API. You need actual email infrastructure behind it, whether that's LobsterMail, SendGrid, or your own SMTP server. The tool function's signature stays the same.

How do I prevent a LlamaIndex agent from sending duplicate emails?

Generate a unique idempotency key for each send action (based on the conversation turn or a hash of recipient + content) and pass it in your API request headers. Your email provider should deduplicate on that key.

What rate limits should I enforce when a LlamaIndex agent controls email sending?

Start with 10-20 emails per hour for a new sending identity. Implement a counter in your tool function that rejects sends above the threshold. Increase gradually as your sender reputation builds over 2-4 weeks.

Can a LlamaIndex FunctionAgent send emails to multiple recipients in one run?

Yes. The agent can call the send tool multiple times in a single run. Each call is a separate tool invocation. Add rate limiting in your tool function to prevent the agent from blasting hundreds of messages in a loop.

How do I add memory or conversation history to a LlamaIndex email-sending agent?

Pass a ChatMemoryBuffer to the FunctionAgent constructor. This lets the agent remember previous emails sent and received within the same session, which is useful for multi-turn email workflows like drafting, revising, and sending.

How do I evaluate whether my LlamaIndex FunctionAgent calls the email tool correctly?

Use the Ragas framework with LlamaIndex's integration. Create test scenarios with expected tool calls and parameters, then measure tool selection accuracy and parameter correctness across your test set.

How do I log every email sent by a LlamaIndex FunctionCallingAgent?

Add structured logging inside your tool function that records the timestamp, recipient, subject, message ID, and the agent's reasoning (available from the agent's chat history). Write to a JSON log file or a database for audit trails.

Can I use LlamaIndex FunctionAgent with Anthropic Claude instead of OpenAI?

Yes. Install llama-index-llms-anthropic, instantiate Anthropic(model="claude-3-5-sonnet-20241022"), and pass it as the llm parameter. Claude's tool-calling API works the same way as OpenAI's from FunctionAgent's perspective.
