Question 1

What is the difference between guardrails and prompt engineering?

Accepted Answer

Prompt engineering shapes how the model behaves through instructions. Guardrails are external checks that validate behavior regardless of what the prompt says. A well-engineered prompt tells the agent to be helpful. Guardrails catch it when it tries to do something it shouldn't, even if the prompt was manipulated or the model hallucinated.

Question 2

Can guardrails prevent all AI failures?

Accepted Answer

No. Guardrails reduce risk but can't eliminate it entirely. A sufficiently creative prompt injection might bypass input filters. A subtle hallucination might pass output validation. Guardrails are one layer in a defense-in-depth strategy that also includes monitoring, logging, human review, and incident response plans.

Question 3

How do guardrails affect agent performance?

Accepted Answer

Guardrails add latency and cost. Input validation runs before inference, output scanning runs after, and action restrictions require additional checks during tool use. For email agents processing high volumes, guardrails need to be fast. Rule-based checks (regex, allowlists) add milliseconds. Model-based checks (running a second LLM) add seconds and double inference costs for checked messages.

Question 4

What guardrails should an email-sending AI agent have?

Accepted Answer

At minimum: domain allowlists restricting who the agent can email, rate limits on sends per hour, content scanning for PII leakage, template enforcement for outbound messages, and human approval gates for emails to new contacts or sensitive threads. The specific set depends on your risk tolerance.

Question 5

What are input guardrails vs output guardrails?

Accepted Answer

Input guardrails filter what goes into the model, blocking prompt injections, stripping sensitive data, and validating request formats. Output guardrails check what comes out, scanning for hallucinations, PII, policy violations, and toxic content before the response reaches users. Both layers are needed for defense in depth.

Question 6

How do you implement guardrails without slowing down the agent?

Accepted Answer

Use fast rule-based checks (regex, allowlists, format validation) for the majority of validations. Reserve expensive model-based checks for high-risk actions like sending external emails or accessing sensitive data. Async logging and monitoring can run in the background without blocking the agent's main workflow.

Question 7

What is a model-based guardrail?

Accepted Answer

A model-based guardrail uses a second LLM to evaluate the primary agent's output. For example, a classifier model might check whether a response contains hallucinated information, violates content policies, or leaks confidential data. These are more flexible than rule-based checks but add latency and cost.

Question 8

How do guardrails protect against prompt injection?

Accepted Answer

Input guardrails scan incoming content for patterns that look like prompt injection attempts, such as instructions embedded in email bodies or user messages. They can strip suspicious content, flag it for review, or reject the request entirely. No single technique catches all injections, so layered detection is important.

Question 9

Should guardrails be configurable per agent or global?

Accepted Answer

Both. Global guardrails enforce organization-wide policies like PII protection and rate limits. Per-agent guardrails let you tune restrictions based on each agent's role and risk level. A customer support agent might have stricter content rules than an internal summarization agent.

Question 10

How do you test whether your guardrails are working?

Accepted Answer

Run adversarial testing with known-bad inputs: prompt injection attempts, PII-containing content, out-of-scope requests, and edge cases. Track how often each guardrail fires in production to identify gaps. Red-team your agent regularly to find bypasses before they are exploited.

Guardrails

What is a guardrail?#

Why it matters for AI agents#

Frequently asked questions

Related terms

Prompt Injection

Hallucination

Human-in-the-Loop (HITL)