LoRA
Low-Rank Adaptation — a fine-tuning technique that trains a small set of additional parameters instead of modifying the entire model, making customization fast and memory-efficient.
What is LoRA?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that lets you customize a large language model without modifying its original weights. Instead of retraining all the model's billions of parameters, LoRA freezes the original model and adds small trainable matrices — called adapters — to specific layers.
The core idea is based on a mathematical insight: the weight changes needed to adapt a model to a new task tend to have low rank. In practice, this means you can represent those changes with two small matrices instead of one large one. For a weight matrix with dimensions 4096 × 4096 (about 16.8 million parameters), a rank-16 LoRA adapter needs only 2 × 4096 × 16 = 131,072 parameters — less than 1% of the original.
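That arithmetic is easy to verify directly. Here is a toy NumPy sketch of a single LoRA-adapted layer (illustrative only, not tied to any framework): the frozen weight W is left untouched, and the low-rank update is expressed as two thin matrices B and A.

```python
import numpy as np

d, r = 4096, 16

# Frozen pretrained weight: d x d parameters, never updated.
W = np.random.randn(d, d).astype(np.float32)

# Trainable LoRA factors. B starts at zero, so the adapter
# initially leaves the model's behavior unchanged.
A = np.random.randn(r, d).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)

def lora_forward(x):
    # Adapted layer: y = W x + B (A x); only A and B are trained.
    return W @ x + B @ (A @ x)

full_params = W.size              # 16,777,216
adapter_params = A.size + B.size  # 2 * 4096 * 16 = 131,072
print(f"{adapter_params / full_params:.2%} of the full layer")
```

Because B is initialized to zero, training starts from exactly the base model's behavior and only gradually layers the adaptation on top — one reason LoRA training is stable.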
The benefits are significant:
- Memory efficiency — LoRA adapters are typically 1-10 MB, compared to full model weights of 10-100+ GB
- Training speed — fine-tuning takes hours instead of days or weeks
- Composability — you can train multiple LoRA adapters for different tasks and swap them at inference time
- Base model preservation — the original model is untouched, so you can always revert or combine adapters
QLoRA goes further by applying LoRA to a quantized model, enabling fine-tuning of 70B+ parameter models on a single consumer GPU.
Why it matters for AI agents
LoRA makes it practical to build specialized AI agents without the enormous cost of full fine-tuning. For email agents, this opens up domain-specific customization that general-purpose models can't match.
Consider an email agent for a legal firm. General LLMs handle legal language adequately, but a LoRA adapter trained on the firm's past correspondence, terminology, and communication style produces significantly better drafts. The adapter teaches the model the specific patterns of that domain — standard clauses, preferred phrasing, regulatory references — without altering the model's general capabilities.
The composability of LoRA adapters is particularly useful for multi-tenant email platforms. You can train a separate adapter for each organization's communication style and swap adapters at runtime based on which account the email belongs to. One base model serves every customer, with small, cheap adapters providing per-customer personalization.
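The swapping pattern can be pictured with a simplified NumPy sketch: one shared frozen weight, plus a small (B, A) adapter pair per tenant, selected at request time. Tenant names and dimensions here are made up for illustration; real serving stacks (e.g. PEFT) do this across all adapted layers of a transformer.

```python
import numpy as np

d, r = 64, 4
rng = np.random.default_rng(0)

# One shared, frozen base weight serves every customer.
W_base = rng.standard_normal((d, d)).astype(np.float32)

# A tiny (B, A) adapter pair per tenant -- kilobytes each,
# versus the full base weights. Tenant names are hypothetical.
adapters = {
    tenant: (rng.standard_normal((d, r)).astype(np.float32),
             rng.standard_normal((r, d)).astype(np.float32))
    for tenant in ("acme-legal", "globex-support")
}

def forward(x, tenant=None):
    y = W_base @ x
    if tenant is not None:
        B, A = adapters[tenant]
        y = y + B @ (A @ x)  # tenant-specific low-rank correction
    return y

x = rng.standard_normal(d).astype(np.float32)
y_acme = forward(x, "acme-legal")      # same base model,
y_globex = forward(x, "globex-support")  # different per-account behavior
```

The key property is that switching tenants only swaps the small adapter lookup; the expensive base weights stay resident in memory the whole time.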
For agent developers, LoRA also enables iterative improvement. When an email agent makes mistakes, you collect the corrections, train a small adapter on the feedback data, and deploy the improved agent — all without retraining the base model. This feedback loop lets agents get better at their specific job over time, using real-world data from the emails they process.
The barrier to entry is low. Tools like Hugging Face PEFT, Unsloth, and Axolotl make LoRA fine-tuning accessible with just a GPU and a training dataset. For email-specific tasks, a few hundred high-quality examples are often enough to produce a meaningful improvement.
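With Hugging Face PEFT, the setup is only a few lines. A hedged sketch follows — the model name, target modules, and hyperparameters are illustrative choices, not requirements, and it assumes `transformers` and `peft` are installed with a Llama-style model whose attention projections are named `q_proj`/`v_proj`:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative

config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # which layers get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # a fraction of a percent of the base model
```

From here, training proceeds with any standard loop or trainer; only the adapter parameters receive gradients.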
Frequently asked questions
How much training data does LoRA need?
For domain adaptation (teaching the model a new style or vocabulary), a few hundred to a few thousand examples often suffice. For teaching new tasks or behaviors, you may need 1,000-10,000 examples. Quality matters more than quantity — well-curated examples produce better adapters than large noisy datasets. For email agents, 500 examples of correctly handled emails is a reasonable starting point.
Can I use LoRA with closed-source models like GPT-4 or Claude?
No. LoRA requires access to model weights, which closed-source providers don't expose. For API-based models, you use prompt engineering and few-shot examples instead of LoRA. LoRA is primarily used with open-source models like Llama, Mistral, or Qwen that you can download and run locally.
What is the difference between LoRA and full fine-tuning?
Full fine-tuning updates all of the model's parameters, requiring massive GPU resources and risking catastrophic forgetting (where the model loses general capabilities). LoRA freezes the original weights and trains only small adapter matrices, using a fraction of the memory and time. The quality difference is usually small for focused tasks, making LoRA the default choice for most fine-tuning use cases.
What is QLoRA?
QLoRA combines quantization with LoRA, applying LoRA adapters to a quantized (4-bit) base model. This lets you fine-tune models with 70B+ parameters on a single consumer GPU with 24 GB of VRAM, making large model customization accessible without enterprise hardware.
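In practice this is usually set up by loading the base model 4-bit quantized and attaching LoRA adapters on top. A hedged configuration sketch, assuming `transformers`, `peft`, and `bitsandbytes` are installed (the model name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute happens in higher precision
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",  # illustrative model
    quantization_config=bnb,
)

# The LoRA adapters themselves train in higher precision on the 4-bit base.
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
```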
Can you stack multiple LoRA adapters on one model?
Yes. Multiple LoRA adapters can be loaded and swapped at inference time without reloading the base model. This is useful for multi-tenant email platforms where each customer gets a personalized adapter for tone and style, all running on the same base model.
How long does LoRA fine-tuning take?
LoRA fine-tuning typically takes 30 minutes to a few hours on a single GPU, depending on dataset size and model size. This is dramatically faster than full fine-tuning, which can take days or weeks. For email-specific tasks, a 7B model with 1,000 training examples usually completes in under an hour.

What is the rank parameter in LoRA?
The rank (r) controls the size of the adapter matrices. Higher rank means more trainable parameters and more capacity to learn, but also more memory and compute. Typical values are 8-64. For most email agent tasks, rank 16-32 provides a good balance between quality and efficiency.
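The trade-off is easy to quantify. A small helper (illustrative, for a single linear layer of shape d_in × d_out, following the same 2-matrix construction described above):

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters one LoRA adapter adds to a linear layer:
    A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

full = 4096 * 4096  # parameters in the frozen layer itself
for r in (8, 16, 32, 64):
    added = lora_param_count(4096, 4096, r)
    print(f"rank {r:2d}: {added:,} params ({added / full:.2%} of the layer)")
```

Parameter count grows linearly with rank, so doubling r doubles adapter size — cheap enough that experimenting with a few rank values is usually the fastest way to tune it.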
How can LoRA improve email agents specifically?
LoRA can teach an email agent domain-specific vocabulary, preferred response formats, company tone of voice, and task-specific behaviors like extracting order numbers or routing support tickets. A LoRA-adapted model produces more consistent, on-brand email responses than prompt engineering alone.
Does LoRA work with quantized models?
Yes, and this combination (QLoRA) is one of the most popular approaches. You quantize the base model to 4-bit to save memory, then train LoRA adapters in higher precision on top. The result is a fine-tuned model that fits on consumer hardware with minimal quality loss.
What tools are commonly used for LoRA fine-tuning?
Popular tools include Hugging Face PEFT (the reference implementation), Unsloth (optimized for speed), Axolotl (simplified configuration), and LLaMA-Factory. Most support QLoRA out of the box and integrate with common training datasets and evaluation frameworks.