Blog

RAG-V: divide and conquer with factual claims

September 19, 2024
This post is archived. References and links may be out of date.

In our last post, we looked under the hood at how we define factual errors in RAG-Verification. Now, let’s dive into how we detect and correct those errors.

Detecting factual inconsistencies in text is not a new task. Professional journalists call it fact-checking, part of a tradecraft that goes back more than a century. RAG-Verification works in a similar way. The first step is to break the text into individual factual claims. Consider this RAG example:

question:

Are UK special forces operating in Ukraine?

retrieved sources:

[1] Leaked US military documents indicate that the UK has deployed as many as 50 special forces to Ukraine. The documents suggest that more than half of the western special forces personnel present in Ukraine between February and March this year may have been British. It is unclear what activities the special forces may have been engaged in or whether the numbers of personnel have been maintained at this level. (11 April 2023)

[2] The FSB alleges it thwarted an attempt by British special forces to facilitate a landing of Ukrainian sabotage troops on the Tendrov Split, a strategic sandbar located in the Black Sea. The agency claims to have apprehended a senior figure within the Ukrainian naval special forces, providing his name and date of birth as proof of the operation’s existence and the SBS’s direct involvement. Britain’s Defence Ministry has yet to respond to these allegations. (12 April 2024)

generated answer:

Leaked US military documents indicated the possible presence of up to 50 UK special forces personnel in Ukraine in 2023.[1] The FSB alleges to have apprehended British special forces during a failed operation, however Britain’s Defence Ministry has not responded.[2]

RAG-Verification begins by converting the generated text into factual claims:

Factual claims citing source [1]:

  • US military documents were leaked with information about UK special forces in Ukraine.
  • The leaked US military documents indicate the possible presence of up to 50 UK special forces personnel in Ukraine in 2023.

Factual claims citing source [2]:

  • The FSB alleges to have apprehended British special forces during a failed operation.
  • Britain’s Defence Ministry has not responded to allegations by the FSB.
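The decomposition step above can be sketched in a few lines of Python. This is a minimal illustration, not RAG-Verification's actual implementation: the prompt wording and the `complete(prompt) -> str` chat-completion helper are assumptions.

```python
# Hypothetical prompt for splitting a generated answer into atomic claims.
# Wording is illustrative; RAG-Verification's real prompt may differ.
DECOMPOSE_PROMPT = """Break the answer below into individual factual claims.
Each claim must be a single, self-contained statement that keeps its
citation marker (e.g. [1]).

Answer:
{answer}

Claims (one per line):"""

def decompose_into_claims(answer: str, complete) -> list[str]:
    """Ask an LLM to split `answer` into atomic, citation-tagged claims.

    `complete` is an assumed helper that sends a prompt to the model
    and returns its text reply.
    """
    raw = complete(DECOMPOSE_PROMPT.format(answer=answer))
    # Strip bullet markers and drop blank lines from the model's reply.
    return [line.lstrip("•- ").strip() for line in raw.splitlines() if line.strip()]
```

Keeping the citation marker on each claim is what lets the next phase pair every claim with exactly the sources it cites.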

For each claim, we construct a fact-checking prompt for a large language model (LLM). The prompt includes the original question, the generated answer, the specific claim, and the cited sources. The model is then tasked with verifying the factual consistency of the claim based on the provided sources, producing a verdict (yes/no) along with an explanation for its decision.
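A per-claim verification call might look like the sketch below. Again, the prompt text, the `complete` helper, and the yes/no-on-the-first-line convention are assumptions for illustration, not the production prompt.

```python
# Hypothetical fact-checking prompt: question, generated answer, one claim,
# and the sources that claim cites, with a yes/no verdict plus explanation.
VERIFY_PROMPT = """Question: {question}

Generated answer: {answer}

Claim to verify: {claim}

Cited sources:
{sources}

Is the claim factually consistent with the cited sources?
Answer "yes" or "no" on the first line, then explain your decision."""

def verify_claim(question: str, answer: str, claim: str,
                 sources: list[str], complete) -> tuple[bool, str]:
    """Return (verdict, explanation) for a single factual claim.

    `complete` is an assumed helper that sends a prompt to the model
    and returns its text reply.
    """
    reply = complete(VERIFY_PROMPT.format(
        question=question, answer=answer, claim=claim,
        sources="\n\n".join(sources)))
    # First line carries the verdict; the rest is the model's explanation.
    verdict_line, _, explanation = reply.partition("\n")
    return verdict_line.strip().lower().startswith("yes"), explanation.strip()
```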

This evaluation phase, which we call RAG-Eval, is inspired by work at the Allen Institute for AI, which pioneered claim generation last year. Since then, LLMs have made significant advances in speed and accuracy.

From a human fact-checker’s point of view, RAG-Verification may seem inefficient. What starts as a single fact-check on a small piece of generated text expands into four separate tasks—and this is before even reaching the error-correction phase.

This decomposition, however, plays to the strengths of large language models. A human fact-checker would verify the entire response at once, but LLMs perform better when a task is broken into smaller, simpler sub-tasks. Akin to chain-of-thought prompting, this step-by-step approach helps models reduce errors and improve accuracy.

But there’s another benefit to breaking tasks down in advance: all the individual claims can be verified simultaneously. Instead of waiting 16 seconds for the model to process four claims sequentially at roughly 4 seconds each, we can verify them all at once in about 4 seconds. The larger the text to be verified, the greater the time savings.
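The parallel step is straightforward to sketch with `asyncio`. Here `averify` stands in for an assumed async wrapper around one LLM fact-check call per claim; it is not part of RAG-Verification's public API.

```python
import asyncio

async def verify_all(claims, averify):
    """Fire one verification call per claim and await them all together.

    Because the calls run concurrently, total wall-clock time is roughly
    one call's latency rather than the sum over all claims.
    """
    return await asyncio.gather(*(averify(claim) for claim in claims))
```

`asyncio.gather` preserves input order, so each verdict lines up with the claim it was checked against.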

As LLM technology continues to scale, we expect RAG-Verification to operate in under a second, making automated fact-checking invisible and ubiquitous.
