Resources

Tools, videos, and articles for successful AI guardrails and governance implementation

More Resources

PlaceboBench: An LLM Hallucination Benchmark for Pharma
Article·2026-04-15·15 min read

We created PlaceboBench, a challenging pharmaceutical RAG benchmark based on real clinical questions and official EMA documents. Twelve state-of-the-art LLMs show hallucination rates between 24% and 64%.

Four ways to reduce hallucinations in your AI agents
Video·2026-03-25·3:13

Four approaches to reducing hallucinations, each backed by quantitative evidence.

PlaceboBench findings: Which LLM hallucinates most on Pharma data?
Video·2026-03-24·1:59

We ran 12 of the latest LLMs against a challenging use case in Pharma. Watch to find out how often models hallucinated, why the model with the lowest hallucination rate might not be the best option, and how latency and cost should inform your model selection.

Building Hallucination Resistant GenAI Applications in Pharma
Video·2026-03-19·29:23

We tested 12 state-of-the-art LLMs, such as GPT-5.4 and Gemini 3.1 Pro, on a challenging use case in pharma. The result is PlaceboBench, a benchmark of hallucination rates in pharma. This on-demand session walks through the detailed results and strategies to mitigate hallucinations.

What does it cost to do OCR with Large Language Models?
Article·2025-12-09·5 min read

Our LLM OCR Cost Calculator lets AI builders compare PDF parsing costs across LLM providers and models.

OCR Cost Calculator
Tool·Updated December 2025

Compare OCR processing costs across major LLM providers. Find the most cost-effective solution for your PDF parsing job.

How to create your own hallucination detection benchmarks: Making of the RAGTruth++ dataset
Video·2025-11-21·7:00

We walk you through the process of creating the RAGTruth++ hallucination benchmark. Useful for anyone who wants to get a deeper understanding of how benchmarks are made.

RAGTruth++: Enhanced Hallucination Detection Benchmark
Article·2025-11-20·12 min read

We enhanced the RAGTruth benchmark by finding 10x more hallucinations through automated detection and human review.

Why Hallucination Benchmarks Miss the Mark
Article·2025-11-12·8 min read

The gap between hallucination benchmarks and production reality, and what bears have to do with it.

AI Hallucinations - Reverse engineering Azure Groundedness and building your own hallucination detector
Video·2025-11-12·15:00

Learn how to reverse engineer Azure's Groundedness detection system and build your own custom hallucination detector for AI applications.

Create reliable AI agents

We are AI quality specialists who help engineering teams improve the reliability and accuracy of their AI applications through systematic hallucination detection and mitigation.

Copyright © 2026 Blue Guardrails