
Fine-Tuning

The process of further training a pre-trained AI model on a specific dataset to specialize its behavior for a particular task.


What is Fine-Tuning?

Fine-tuning is a machine learning technique where a pre-trained language model is trained further on a smaller, task-specific dataset to adapt its behavior. The base model already understands language from its initial training on broad internet data. Fine-tuning adjusts the model's internal weights so it performs better on a particular domain, follows a specific output format, or adopts a certain tone and style.

The process works by feeding the model examples of desired input-output pairs and updating its parameters through additional training rounds (epochs). For instance, you might fine-tune a model on thousands of customer support conversations to make it better at handling support tickets, or on legal documents to improve its accuracy with legal terminology.
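As an illustration, supervised fine-tuning data is usually prepared as JSON Lines, with one input-output pair per line. The chat-style `messages` schema below follows the convention used by OpenAI's fine-tuning API; if you use a different toolkit, the field names will differ, and the support-ticket content here is purely hypothetical:

```python
import json

# Hypothetical support-ticket examples. In practice you would export
# thousands of real conversations and review each one for quality.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a support agent for Acme."},
            {"role": "user", "content": "My invoice shows a double charge."},
            {
                "role": "assistant",
                "content": "Sorry about that! I've flagged the duplicate "
                           "charge for a refund within 5 business days.",
            },
        ]
    },
    # ... more input-output pairs ...
]

# Serialize to JSONL: one training example per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity-check that every line parses and ends with an assistant turn,
# i.e. each example actually demonstrates a desired output.
with open("train.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        assert ex["messages"][-1]["role"] == "assistant"
```

The final assistant message in each example is what the model is trained to reproduce; the system and user turns are the conditioning context.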

Fine-tuning sits between two extremes. On one end, prompt engineering changes behavior without modifying the model at all. On the other end, training a model from scratch requires enormous compute and data. Fine-tuning offers a middle path: meaningful behavioral changes with relatively modest data and compute requirements. Techniques like LoRA (Low-Rank Adaptation) have made fine-tuning even more efficient by updating only a small fraction of the model's parameters.
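The parameter savings from LoRA follow from simple arithmetic: the full weight matrix stays frozen, and only two small low-rank factors are trained. A back-of-the-envelope sketch, assuming a 4096-wide layer and rank 8 (typical but purely illustrative numbers):

```python
# Full fine-tuning updates every entry of a weight matrix W (d x d).
d = 4096                     # hidden dimension of one layer (illustrative)
full_params = d * d          # 16777216 trainable parameters for this matrix

# LoRA keeps W frozen and learns W + B @ A, where B is (d x r)
# and A is (r x d) for a small rank r.
r = 8
lora_params = d * r + r * d  # 65536 parameters across the two factors

print(full_params)               # 16777216
print(lora_params)               # 65536
print(lora_params / full_params) # 0.00390625, i.e. under 0.4% of the original
```

The same ratio applies per adapted matrix across the model, which is why LoRA checkpoints are megabytes instead of gigabytes.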

Why It Matters for AI Agents

Fine-tuning is relevant for AI agents that need to consistently follow specific behaviors that are difficult to achieve through prompting alone. An email agent that must always respond in a particular brand voice, follow strict formatting rules, or handle domain-specific terminology can benefit from a fine-tuned model that reliably produces the right output without lengthy system prompts.

For agent email workflows, fine-tuning can improve both quality and cost. A fine-tuned model that inherently knows your company's tone and policies needs less instructional context in every request. This means shorter prompts, lower per-request token costs, and faster inference times. For high-volume agents processing thousands of emails per day, these savings add up.
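A rough cost model makes the effect concrete. Suppose fine-tuning lets you replace a long instructional system prompt with a short one; all of the numbers below are illustrative assumptions, not real pricing:

```python
# Illustrative assumptions -- substitute your model's actual pricing.
price_per_million_input_tokens = 3.00   # USD, hypothetical
emails_per_day = 10_000

base_prompt_tokens = 1_500   # long prompt spelling out tone and policies
tuned_prompt_tokens = 200    # fine-tuned model needs only a brief instruction

def daily_prompt_cost(prompt_tokens: int) -> float:
    """Daily spend on prompt (input) tokens alone."""
    tokens_per_day = prompt_tokens * emails_per_day
    return tokens_per_day / 1_000_000 * price_per_million_input_tokens

savings = daily_prompt_cost(base_prompt_tokens) - daily_prompt_cost(tuned_prompt_tokens)
print(round(savings, 2))  # 39.0 USD/day saved on prompt tokens alone
```

Output-token costs and latency improvements come on top of this, since shorter contexts also process faster.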

However, fine-tuning has trade-offs. It requires curating a quality training dataset, which takes time and expertise. The fine-tuned model is frozen at the point of training, so it will not automatically reflect policy changes or new product information the way a RAG-based system would. Many production agent systems combine both: fine-tuning for consistent style and behavior, with RAG for up-to-date factual content.

Platforms like LobsterMail that provide email infrastructure for agents benefit when builders use fine-tuned models because the output quality tends to be higher and more consistent, which leads to better email deliverability and recipient engagement.

Frequently asked questions

When should I fine-tune instead of using prompt engineering?
Fine-tune when you need consistent behavior that is hard to achieve with prompts alone, when you want to reduce prompt length and cost at high volumes, or when the model needs deep familiarity with domain-specific language. For most use cases, start with prompt engineering and only fine-tune if you hit quality or cost limits.
How much data do I need to fine-tune a model?
It depends on the task, but meaningful improvements can be achieved with as few as 100 to 500 high-quality examples. More complex behavioral changes may require thousands of examples. Quality matters far more than quantity. A small dataset of carefully curated examples outperforms a large noisy one.
Can I fine-tune a model for email-specific tasks?
Yes. Common fine-tuning targets for email agents include tone and style consistency, format compliance (like always including specific sections), domain terminology, and classification tasks (routing emails to the right handler). Combine fine-tuning with RAG so the model has its style baked in but always references current information.
What is the difference between fine-tuning and RAG?
Fine-tuning modifies the model's weights to change its default behavior and style. RAG provides external information at inference time without changing the model. Fine-tuning is best for consistent tone, format, and style. RAG is best for up-to-date factual content. Many production systems use both together.
What is LoRA and how does it make fine-tuning cheaper?
LoRA (Low-Rank Adaptation) is a technique that freezes most of the model's parameters and only trains small adapter matrices. This dramatically reduces compute and memory requirements, making fine-tuning feasible on consumer hardware and at a fraction of the cost of full fine-tuning.
Does fine-tuning affect the model's general capabilities?
Yes, fine-tuning can cause catastrophic forgetting, where the model loses some general knowledge by overfitting to the fine-tuning dataset. Using techniques like LoRA, keeping the dataset diverse, and training for fewer epochs helps preserve the model's broad capabilities while adding specialized behavior.
How do I evaluate whether fine-tuning improved my model?
Hold out a test set that the model never sees during training and compare its performance before and after fine-tuning on that set. Measure task-specific metrics like classification accuracy, format compliance rate, or human preference scores rather than relying on generic benchmarks.
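For example, a format-compliance check can be as simple as asserting that each held-out generation contains the sections your output spec requires, then comparing the rate before and after fine-tuning. A sketch with hypothetical section names and toy outputs:

```python
# Hypothetical requirement: every reply must contain these sections.
REQUIRED_SECTIONS = ["Summary:", "Next steps:"]

def is_compliant(reply: str) -> bool:
    """True if the reply contains every required section marker."""
    return all(section in reply for section in REQUIRED_SECTIONS)

# Toy stand-ins for generations on the held-out test set.
held_out_outputs = [
    "Summary: duplicate charge confirmed.\nNext steps: refund issued.",
    "Refund issued, let us know if anything else comes up!",  # missing both
    "Summary: password reset.\nNext steps: check your inbox.",
]

compliance_rate = sum(is_compliant(o) for o in held_out_outputs) / len(held_out_outputs)
print(compliance_rate)  # 0.6666666666666666 -- compare before vs. after tuning
```

Classification accuracy and human preference scores can be computed the same way: score the identical held-out inputs with both the base and fine-tuned model, and compare.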
Can I fine-tune open-source models for my AI agent?
Yes. Open-source models like Llama, Mistral, and Qwen can be fine-tuned locally or on cloud GPU instances. This gives you full control over the model, avoids per-call API costs at inference time, and lets you keep sensitive training data private.
How often should I re-fine-tune my model?
Re-fine-tune when your agent's performance degrades, when your product or policies change significantly, or when you accumulate enough new high-quality training examples to justify a training run. For most teams, quarterly or semi-annual re-tuning strikes a good balance between freshness and effort.
What are the risks of fine-tuning with low-quality data?
Low-quality training data can teach the model bad habits: incorrect information, inconsistent formatting, inappropriate tone, or biased outputs. The model learns whatever patterns exist in the data, so errors and inconsistencies in training examples get amplified in production behavior.
