
How Primer prevents hallucinations

July 17, 2024
This post is archived.
References & links may be out of date

“How does Primer prevent hallucinations?” a government customer asked me. It’s the question that every customer should ask you if you’re building an AI system with retrieval augmented generation (RAG). I’m writing the answer here so everyone can benefit.

RAG is crucial for AI applications because it keeps the system grounded in customer data. For government analysts, that could be top secret data relevant to the mission. AI hallucinations, where the system generates information out of thin air, are a deal-breaker.

So how bad is the problem? Off-the-shelf GPT-4, the current state of the art large language model (LLM), hallucinates between 5% and 10% of the time, depending on the type of data and queries. And open source LLMs are even worse. We know because we built an automated fact-checking pipeline. You can’t prevent or fix what you can’t detect.

Fact-checking generative AI is very similar to fact-checking in traditional journalism. Every claim made in the text must be checked against the available sources. You may be surprised to learn that we use LLMs to detect errors in their own output. But it turns out that fact-checking is well within their capability, as long as the task is carefully framed and broken down into a series of steps. Our RAG-V pipeline catches the vast majority of errors before any generated text reaches the end user.
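The claim-by-claim framing described above can be sketched in a few lines. This is only an illustration of the approach, not Primer's actual RAG-V pipeline, which is not public; the `llm` callable and the prompts are hypothetical stand-ins for whatever model and prompt templates you use.

```python
# Sketch of claim-level fact-checking: break the answer into atomic
# claims, then verify each claim against the retrieved sources.
# `llm` is a hypothetical callable: prompt string in, completion out.

def extract_claims(answer: str, llm) -> list[str]:
    """Break the generated answer into atomic, checkable claims."""
    prompt = f"List each factual claim in the text, one per line:\n{answer}"
    return [c.strip() for c in llm(prompt).splitlines() if c.strip()]

def check_claim(claim: str, sources: list[str], llm) -> bool:
    """Ask the model whether the sources support a single claim."""
    prompt = (
        "Sources:\n" + "\n---\n".join(sources)
        + "\n\nIs this claim fully supported by the sources? "
        + f"Answer SUPPORTED or UNSUPPORTED.\nClaim: {claim}"
    )
    return llm(prompt).strip().startswith("SUPPORTED")

def fact_check(answer: str, sources: list[str], llm) -> list[str]:
    """Return the claims the sources do not support."""
    return [c for c in extract_claims(answer, llm)
            if not check_claim(c, sources, llm)]
```

Decomposing the answer first is what makes the task tractable: the model is never asked "is this whole answer true?", only a series of narrow, source-grounded yes/no questions.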

By the way, the far greater challenge was making our fact-checking pipeline fast enough to be usable. Our first prototype took more than a minute to run! We’ve got that down to 4 seconds and see a path to 1 second. The trick? Parallel computing and inference triage.
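Those two tricks compose naturally: since each claim is checked independently, the checks can run concurrently, and each check can be triaged through a cheap, fast model before escalating to a stronger one. A minimal sketch, where `fast_check` and `strong_check` are hypothetical stand-ins for small and large model calls:

```python
# Sketch of parallel checking with inference triage: a fast model
# handles the easy calls, and only low-confidence cases escalate to
# a slower, stronger model. Both checkers are hypothetical stand-ins.

from concurrent.futures import ThreadPoolExecutor

def triaged_check(claim, sources, fast_check, strong_check):
    verdict, confidence = fast_check(claim, sources)
    if confidence >= 0.9:   # fast model is sure: accept its verdict
        return verdict
    return strong_check(claim, sources)  # escalate uncertain cases

def check_all(claims, sources, fast_check, strong_check):
    # check every claim concurrently; results come back in order
    with ThreadPoolExecutor() as pool:
        return list(pool.map(
            lambda c: triaged_check(c, sources, fast_check, strong_check),
            claims))
```

The win comes from the fact that most claims are easy: if the fast model confidently settles, say, 80% of them, the expensive model only ever sees the hard 20%, and the parallel fan-out means wall-clock latency tracks the slowest single check rather than the sum.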

Great, so we can catch hallucinations as they’re occurring, but then what? The next step is to correct them. Just like a journalist facing a fact-checker, the LLM author gets another shot on goal: it receives the user’s query, the retrieved source data, its generated answer, and a report on each detected error. The LLM rewrites the answer to correct those errors, and the rewrite goes back through the fact-checking pipeline. If it passes all checks, the generated text goes to the human user. If it doesn’t, we show the user the error. So our RAG system never exposes the user to an unflagged hallucination.
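The correct-and-recheck loop described above is simple enough to sketch. Here `fact_check` returns a list of error reports and `rewrite` asks the model for a corrected answer; both are hypothetical stand-ins, not Primer's actual API.

```python
# Sketch of the correct-and-recheck loop: rewrite until clean, and if
# errors remain after the attempt budget, surface them to the user
# rather than hiding them.

def verified_answer(query, sources, draft, fact_check, rewrite,
                    max_attempts=2):
    """Return (answer, errors); errors is empty if all checks passed."""
    answer = draft
    for _ in range(max_attempts):
        errors = fact_check(answer, sources)
        if not errors:
            return answer, []  # clean: safe to show the user
        # give the model another shot, with the error report attached
        answer = rewrite(query, sources, answer, errors)
    # still failing after the budget: flag remaining errors to the user
    return answer, fact_check(answer, sources)
```

The key design point is the fallback: the loop never silently gives up. Either the answer passes every check, or it reaches the user with its remaining errors flagged.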

Is this a completely solved problem? Not yet. Today, we correct around 95% of inaccuracies on the first attempt, with most completions verified in just a few seconds.

Best of all would be for the LLM to make no hallucinations in the first place. Better prompts and more powerful LLMs are two levers we’re pulling toward that end.

So for all the customers of RAG systems out there, this is the kind of answer you should expect! If a RAG system isn’t reliably flagging, if not outright correcting, all hallucination errors, you should keep shopping. Especially if accurate generative AI is mission critical, as it is for our government customers.

Primer Enterprise

Informed, defensible analysis

Primer Enterprise is a secure AI platform that helps analysts and mission teams across the Intelligence Community, Defense, and Civilian agencies analyze massive volumes of unstructured data. It transforms fragmented reports, proprietary data, and open-source information into structured, traceable insight that supports faster, defensible decision-making.

Learn about Primer Enterprise

Primer Command

Real-time operational clarity

Primer Command is an AI-powered monitoring platform that helps mission teams monitor narratives, track evolving topics, and detect emerging threats across global news and social media. It provides real-time visibility into the information environment so leaders can understand events as they unfold.

Learn about Primer Command

Learn about AI solutions for better, faster decisions

Book a demo