Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with AI text generation. Instead of relying solely on what a language model learned during training, RAG first searches a knowledge base for relevant documents, then feeds those documents into the model as context alongside the user's question. The model generates its response grounded in the retrieved information.
The RAG pipeline has two main stages. In the retrieval stage, the system takes the user's query, converts it into a numerical representation (an embedding), and searches a vector database for documents with similar embeddings. In the generation stage, the top-matching documents are inserted into the model's prompt as context, and the model produces an answer that draws on both its training knowledge and the retrieved material.
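The two stages can be sketched in a few lines of Python. This toy version uses a bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database, but the shape of the pipeline (embed the query, retrieve similar documents, assemble the prompt) is the same:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector. A real pipeline
    # would call a learned embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    # Retrieval stage: rank every document by similarity to the query.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context_docs):
    # Generation stage: retrieved documents become prompt context.
    context = "\n".join(f"- {d}" for d in context_docs)
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context above.")

docs = [
    "The Pro plan costs $29 per month, billed annually.",
    "Refunds are issued within 14 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
]
query = "How much does the Pro plan cost?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
```

In production, `embed` is an embedding-model call and `retrieve` is a vector-database query, but the prompt assembly step looks much like this.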
RAG solves a fundamental limitation of language models: they can only know what they were trained on, and their training data has a cutoff date. With RAG, you can give a model access to your company's internal documentation, the latest product specs, or today's email threads without retraining the model. The knowledge base can be updated at any time, and the model's responses will reflect those updates immediately.
RAG is essential for building AI agents that need to work with specific, up-to-date information. An agent responding to a customer email needs access to the latest pricing, policies, and order history. Without RAG, the agent would either hallucinate answers or be limited to generic responses.
In agent email workflows, RAG enables personalized and accurate communication. When an agent receives an email asking about a specific product feature, a RAG pipeline can retrieve the relevant documentation, past support tickets, and product changelog entries before the agent drafts its reply. This produces responses that are specific, accurate, and grounded in real data.
For platforms like LobsterMail that serve as email infrastructure for AI agents, RAG integration is a natural extension. An agent can process incoming emails, use RAG to pull relevant context from a knowledge base, and generate informed replies, all within an automated pipeline. The quality difference between a RAG-powered agent and one working from training data alone is immediately obvious to recipients.
RAG also reduces the risk of hallucination. When the model has relevant source material in its context window, it is far less likely to fabricate information. For agents handling business communications where accuracy matters, this reliability is not optional.
Frequently asked questions
How is RAG different from fine-tuning?
RAG retrieves external information at query time and includes it in the prompt. Fine-tuning permanently modifies the model's weights with new training data. RAG is better for knowledge that changes frequently, while fine-tuning is better for teaching the model new behaviors or styles. Many production systems use both together.
What kind of knowledge base works best with RAG?
RAG works well with any text-based content: documentation, FAQs, support tickets, email archives, product specs, and policy documents. The content should be chunked into manageable pieces (typically 200-500 tokens) and indexed in a vector database for fast similarity search.
Can AI agents use RAG to handle email?
Yes. An agent can use RAG to retrieve relevant context (customer history, product docs, past conversations) when processing incoming emails. This allows the agent to generate accurate, personalized replies instead of generic responses, significantly improving the quality of automated email communication.
What is a vector database and why does RAG need one?
A vector database stores documents as numerical embeddings (vectors) and enables fast similarity search. RAG needs one to quickly find documents that are semantically related to the user's query. Popular options include Pinecone, Weaviate, ChromaDB, and pgvector for PostgreSQL.
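The core operation a vector database provides is nearest-neighbor search over stored embeddings. A minimal in-memory sketch in pure Python (real databases add persistence and approximate-nearest-neighbor indexes so search stays fast at scale):

```python
import math

class VectorStore:
    """Tiny in-memory vector store: holds (id, vector) pairs and
    answers nearest-neighbor queries by cosine similarity."""

    def __init__(self):
        self.items = []  # list of (doc_id, unit-length vector)

    @staticmethod
    def _normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n else v

    def add(self, doc_id, vector):
        # Normalizing on insert makes cosine similarity a plain dot product.
        self.items.append((doc_id, self._normalize(vector)))

    def search(self, vector, k=3):
        q = self._normalize(vector)
        scored = [(sum(a * b for a, b in zip(q, v)), doc_id)
                  for doc_id, v in self.items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]

store = VectorStore()
store.add("pricing", [0.9, 0.1, 0.0])
store.add("refunds", [0.1, 0.9, 0.0])
store.add("mascot",  [0.0, 0.1, 0.9])
print(store.search([0.8, 0.2, 0.0], k=1))  # ['pricing']
```

Production stores like Pinecone or pgvector expose essentially this interface: add vectors with ids, query by vector, get back the closest matches.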
How does RAG reduce hallucination in AI agents?
RAG provides the model with source documents to reference when generating responses. When the model has relevant facts in its context window, it is far less likely to fabricate information. For email agents handling customer inquiries, this means responses are grounded in actual product data and policies rather than guesswork.
What is chunking in RAG and why does it matter?
Chunking is the process of splitting documents into smaller pieces for indexing. Chunk size affects retrieval quality — too large and irrelevant content gets included, too small and context is lost. For email knowledge bases, chunks of 200-500 tokens with overlap typically work well.
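A sliding-window chunker with overlap takes only a few lines. This sketch operates on an already-tokenized list; the defaults follow the 200-500 token guidance above and are starting points to tune, not fixed rules:

```python
def chunk(tokens, size=300, overlap=50):
    """Split a token list into chunks of `size` tokens, each sharing
    `overlap` tokens with the previous chunk so that sentences cut at
    a boundary survive intact in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = "the pro plan costs twenty nine dollars per month billed annually".split()
for c in chunk(tokens, size=5, overlap=2):
    print(c)
```

Each chunk is then embedded and indexed individually, so a query matches the most relevant slice of a document rather than the whole thing.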
Can RAG work with email archives as a knowledge source?
Yes. Past email conversations, support tickets, and thread histories make excellent RAG sources. An agent can search previous interactions with a customer before replying, ensuring consistency and avoiding asking for information already provided in earlier messages.
What is the difference between RAG and long context windows?
Long context windows let you pass more information to the model at once, but cost scales with prompt length, and models attend less reliably to information buried in the middle of very long prompts. RAG selectively retrieves only the most relevant documents, keeping prompts focused and cost-effective. RAG scales to millions of documents; context windows cannot.
How do you evaluate RAG quality for an email agent?
Measure retrieval precision (are the retrieved documents relevant?), answer accuracy (does the response match the source material?), and hallucination rate (does the agent fabricate information not in the sources?). Test with real email queries from your domain and compare RAG-augmented responses against ground truth answers.
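These metrics are straightforward to compute once you have labeled test queries. A sketch, where document ids and a naive word-overlap check stand in for real relevance labels and a judge model (both are hypothetical simplifications for illustration):

```python
def retrieval_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved documents that are actually relevant."""
    if not retrieved_ids:
        return 0.0
    return sum(1 for d in retrieved_ids if d in relevant_ids) / len(retrieved_ids)

def unsupported_sentences(answer_sentences, source_docs):
    """Naive hallucination screen: flag answer sentences none of whose
    longer content words appear in any source document. A production
    evaluation would use an LLM judge or entailment model instead."""
    source_text = " ".join(source_docs).lower()
    flagged = []
    for sent in answer_sentences:
        words = [w.strip(".,") for w in sent.lower().split() if len(w) > 4]
        if words and not any(w in source_text for w in words):
            flagged.append(sent)
    return flagged

precision = retrieval_precision(["doc1", "doc2", "doc3"], {"doc1", "doc3"})
flags = unsupported_sentences(
    ["The Pro plan costs $29 per month.", "Shipping is free worldwide."],
    ["The Pro plan costs $29 per month, billed annually."],
)
```

Run checks like these over a held-out set of real email queries, and track the numbers over time as the knowledge base and prompts change.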
Does RAG add latency to agent responses?
Yes. Retrieval adds latency: typically 50-200ms to embed the query and run the vector search. For email agents, this is usually negligible, since email response-time expectations are measured in minutes, not milliseconds. The accuracy improvement from RAG far outweighs the small latency cost for most email workflows.