
Embeddings

Numerical vector representations of text, images, or other data that capture semantic meaning, enabling similarity search and machine learning tasks.


What are embeddings?

Embeddings are dense numerical vectors that represent the meaning of text, images, or other data. When you pass a sentence through an embedding model, you get back a list of numbers (typically 768 to 3,072 floats) that encode the semantic content of that sentence. Similar meanings produce similar vectors. Different meanings produce distant vectors.

For example, the sentences "How do I reset my password?" and "I need to change my login credentials" would produce embeddings that are very close together in vector space, even though they share almost no words. That's the power of embeddings — they capture meaning, not just surface-level word overlap.
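As a toy illustration, here are three hand-made 3-dimensional vectors standing in for real embeddings (which have hundreds to thousands of dimensions). The two password-related "sentences" score far closer to each other than to an unrelated one:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dim "embeddings"; real models output hundreds to thousands of floats.
reset_password     = [0.90, 0.10, 0.00]   # "How do I reset my password?"
change_credentials = [0.85, 0.15, 0.05]   # "I need to change my login credentials"
pizza_recipe       = [0.00, 0.20, 0.95]   # "What's a good pizza dough recipe?"

print(cosine(reset_password, change_credentials))  # ≈ 0.996 (very similar)
print(cosine(reset_password, pizza_recipe))        # ≈ 0.023 (unrelated)
```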

Embedding models are different from generative LLMs. They don't produce text. They take text in and output a fixed-size vector. Popular embedding models include OpenAI's text-embedding-3, Cohere's embed-v3, and open-source options like E5 and BGE.

Common uses for embeddings include:

  • Semantic search — finding documents by meaning rather than keywords
  • Clustering — grouping similar items together
  • Classification — assigning categories based on content
  • RAG — retrieving relevant context to feed into an LLM

Why it matters for AI agents

Embeddings are the foundation of how AI agents find and retrieve relevant information. When an agent needs to answer a question, it doesn't search through documents with keyword matching. It embeds the question, compares that vector against pre-embedded documents, and retrieves the closest matches. This is how RAG works, and it's how agents maintain knowledge beyond their training data.

For email agents, embeddings enable powerful capabilities. An agent can embed every incoming email and compare it against a library of past support tickets to find the most relevant precedent. It can cluster similar complaints to identify emerging issues. It can match incoming questions against a knowledge base to draft accurate responses.

The practical workflow looks like this: embed your knowledge base once and store the vectors in a vector database. When an email arrives, embed it, search for the most similar stored vectors, pull those documents into the agent's context, and generate a response grounded in real information. This approach dramatically reduces hallucination because the agent is working from retrieved facts rather than generating from memory.
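That workflow can be sketched end to end. The `embed` function below is a toy bag-of-words stand-in so the example is self-contained; a real system would call an embedding model (e.g. OpenAI's text-embedding-3-small) instead:

```python
import math

VOCAB = ["password", "reset", "invoice", "refund", "login"]

def embed(text: str) -> list[float]:
    # Toy embedding: keyword counts over a tiny vocabulary.
    # Swap in a real embedding model call in production.
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Step 1: embed the knowledge base once and keep the vectors alongside the docs.
knowledge_base = [
    "reset your password in settings",
    "refund requests go to billing",
    "invoice copies are in the portal",
]
store = [(doc, embed(doc)) for doc in knowledge_base]

# Step 2: when an email arrives, embed it and pull the closest documents
# into the agent's context before generating a response.
def retrieve(query: str, store, k: int = 2) -> list[str]:
    qv = embed(query)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("how do I reset my login password", store, k=1))
# ['reset your password in settings']
```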

Embedding quality directly impacts agent accuracy. Better embeddings mean better retrieval, which means more relevant context, which means better responses. Choosing the right embedding model for your domain is one of the highest-leverage decisions in building an agent system.

Frequently asked questions

What is the difference between embeddings and tokens?

Tokens are how LLMs break text into processable units — subword pieces that the model reads sequentially. Embeddings are numerical vectors that represent the overall meaning of text. Tokenization is a preprocessing step. Embedding is a transformation that captures semantic content in a fixed-size vector for comparison and retrieval.

How do I choose an embedding model?

Consider your domain, required accuracy, vector dimensions (larger means more storage but potentially better quality), and cost. For most agent applications, a general-purpose model like OpenAI's text-embedding-3-small offers a good balance. For specialized domains (legal, medical), fine-tuned or domain-specific embedding models may perform better.

Can I embed emails for search?

Yes. Embedding emails is one of the most practical applications. Embed the subject and body of each email, store the vectors, and you can perform semantic search across your entire inbox history. This lets agents find relevant past conversations even when the user describes them in different words than the original message used.

What is a vector database and how does it relate to embeddings?

A vector database stores embeddings and enables fast similarity search over them. When you embed your data, you store those vectors in a vector database like Pinecone, Weaviate, or Qdrant. The database uses approximate nearest neighbor algorithms to quickly find the most similar vectors to a query.
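Conceptually, the search a vector database performs looks like the brute-force scan below; the point of ANN indexes (HNSW, IVF, and similar) is to get roughly the same answer without scoring every stored vector. This sketch uses random synthetic vectors, not real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
stored = rng.normal(size=(10_000, 64))                    # 10k vectors, 64 dims
stored /= np.linalg.norm(stored, axis=1, keepdims=True)   # normalize once at index time

# Build a query that is a near-duplicate of stored row 42.
query = stored[42] + rng.normal(scale=0.01, size=64)
query /= np.linalg.norm(query)

# For unit vectors, cosine similarity is just a dot product,
# so one matrix-vector multiply scores the whole collection.
scores = stored @ query
best = int(np.argmax(scores))
print(best)  # 42
```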

How do embeddings power RAG for AI agents?

In RAG, an agent embeds an incoming query, searches a vector database for the most similar stored embeddings, retrieves the associated documents, and includes them in the LLM prompt. This grounds the agent's response in real information rather than relying on the model's training data alone.

What are embedding dimensions and why do they matter?

Embedding dimensions are the number of floats in the output vector, typically ranging from 384 to 3,072. Higher dimensions can capture more nuance but require more storage and compute for similarity search. Lower dimensions are faster and cheaper but may lose subtle distinctions between similar texts.
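The storage cost of that choice is easy to estimate: at 4 bytes per float32, a million 1,536-dimensional vectors take roughly four times the space of a million 384-dimensional ones.

```python
def storage_gb(num_vectors: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw vector storage in GB, assuming float32 (4 bytes per dimension)."""
    return num_vectors * dims * bytes_per_float / 1e9

print(storage_gb(1_000_000, 1536))  # 6.144 GB
print(storage_gb(1_000_000, 384))   # 1.536 GB
```

Index overhead and metadata add to this, so treat it as a lower bound.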

Do embeddings work across languages?

Many modern embedding models are multilingual and can map similar meanings across languages to nearby vectors. This means a question asked in English can match a document written in Spanish if the embedding model supports both languages. Check your model's documentation for supported language coverage.

How often should I re-embed my data?

Re-embed when you switch to a different embedding model or when your source data changes significantly. Embeddings from different models are not compatible with each other. If your knowledge base is updated frequently, set up a pipeline that embeds new or changed documents automatically.
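One common way to embed only new or changed documents is to store a content hash next to each vector. This minimal sketch (document IDs and bodies are hypothetical) re-embeds a document only when its hash no longer matches:

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hashes recorded when each document was last embedded.
seen = {
    "doc1": content_hash("old body"),
    "doc2": content_hash("unchanged body"),
}

# Current state of the source data, including one new document.
incoming = {
    "doc1": "new body",
    "doc2": "unchanged body",
    "doc3": "brand-new doc",
}

# Only documents whose content changed (or is new) need re-embedding.
to_embed = [doc_id for doc_id, text in incoming.items()
            if seen.get(doc_id) != content_hash(text)]
print(to_embed)  # ['doc1', 'doc3']
```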

What is cosine similarity in the context of embeddings?

Cosine similarity measures the angle between two embedding vectors, producing a score between -1 and 1. A score close to 1 means the texts are semantically similar. It is the most common metric for comparing embeddings because it works well regardless of vector magnitude.
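That magnitude-invariance is easy to verify: scaling a vector changes its length but not its direction, so the cosine score is unchanged.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]

print(round(cosine(a, b), 4))                     # 0.7143
print(round(cosine(a, [x * 100 for x in b]), 4))  # 0.7143 — same score after scaling b
```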

Can embeddings be used to detect duplicate emails?

Yes. By embedding incoming emails and comparing them against recent messages, an agent can identify near-duplicate content even when the wording differs. This is useful for deduplicating support tickets, detecting repeated complaints, or flagging spam that uses slight text variations.
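A duplicate check reduces to a similarity threshold. In this sketch the vectors are toy stand-ins for real email embeddings, and the 0.9 threshold is an assumption you would tune per embedding model and domain:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

DUPLICATE_THRESHOLD = 0.9  # assumption: tune per model and domain

def is_near_duplicate(new_vec, recent_vecs, threshold=DUPLICATE_THRESHOLD):
    """Flag the new email if it is very close to any recent message."""
    return any(cosine(new_vec, v) >= threshold for v in recent_vecs)

# Toy vectors standing in for embeddings of recent emails.
recent = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2]]

print(is_near_duplicate([0.88, 0.12, 0.01], recent))  # True  (near-copy of the first)
print(is_near_duplicate([0.00, 0.10, 0.95], recent))  # False (genuinely new content)
```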
