RAG-V: divide and conquer with factual claims

In our last post, we looked under the hood at how we define factual errors in RAG-Verification. Now, let’s dive into how we detect and correct those errors.

Detecting factual inconsistencies in text is not a new task. Professional journalists call it fact-checking, a tradecraft that goes back more than a century. RAG-Verification works in a similar way. The first step is to break the text into individual factual claims. Consider this RAG example:

question:

Are UK special forces operating in Ukraine?

retrieved sources:

[1] Leaked US military documents indicate that the UK has deployed as many as 50 special forces to Ukraine. The documents suggest that more than half of the western special forces personnel present in Ukraine between February and March this year may have been British. It is unclear what activities the special forces may have been engaged in or whether the numbers of personnel have been maintained at this level. (11 April 2023)

[2] The FSB alleges it thwarted an attempt by British special forces to facilitate a landing of Ukrainian sabotage troops on the Tendrov Split, a strategic sandbar located in the Black Sea. The agency claims to have apprehended a senior figure within the Ukrainian naval special forces, providing his name and date of birth as proof of the operation’s existence and the SBS’s direct involvement. Britain’s Defence Ministry has yet to respond to these allegations. (12 April 2024)

generated answer:

Leaked US military documents indicated the possible presence of up to 50 UK special forces personnel in Ukraine in 2023.[1] The FSB alleges to have apprehended British special forces during a failed operation, however Britain’s Defence Ministry has not responded.[2]

RAG-Verification begins by converting the generated text into factual claims:

Factual claims citing source [1]:

  • US military documents were leaked with information about UK special forces in Ukraine.
  • The leaked US military documents indicate the possible presence of up to 50 UK special forces personnel in Ukraine in 2023.

Factual claims citing source [2]:

  • The FSB alleges to have apprehended British special forces during a failed operation.
  • Britain’s Defence Ministry has not responded to allegations by the FSB.


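To make this concrete, here is one way the claim-generation step could be implemented. This is only a sketch: the prompt wording, the JSON output format, and the complete() helper (a stand-in for whatever LLM API you use) are assumptions for illustration, not the exact prompt used in RAG-Verification.

```python
import json

CLAIM_EXTRACTION_PROMPT = """Break the answer below into individual factual claims.
Each claim must be a single, self-contained statement and must keep the
citation marker (e.g. [1]) of the source it relies on.

Question: {question}

Answer: {answer}

Return a JSON list of objects with keys "claim" and "source"."""


def extract_claims(question: str, answer: str, complete) -> list[dict]:
    """Decompose a generated answer into cited factual claims.

    `complete` is assumed to be a function that sends a prompt string to
    an LLM and returns its text response.
    """
    prompt = CLAIM_EXTRACTION_PROMPT.format(question=question, answer=answer)
    response = complete(prompt)
    # The model is instructed to return JSON; a production system would
    # validate the output and retry if it is malformed.
    return json.loads(response)
```

Applied to the example above, this step yields the four claims listed, each paired with the citation ([1] or [2]) it depends on.
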
For each claim, we construct a fact-checking prompt for a large language model (LLM). The prompt includes the original question, the generated answer, the specific claim, and the cited sources. The model is then tasked with verifying the factual consistency of the claim based on the provided sources, producing a verdict (yes/no) along with an explanation for its decision.
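
A minimal sketch of that verification call might look like the following. The prompt template, the Verdict structure, and its field names are illustrative assumptions rather than the exact implementation, built around the same hypothetical complete() helper as above.

```python
from dataclasses import dataclass

FACT_CHECK_PROMPT = """You are a fact-checker.

Question: {question}

Generated answer: {answer}

Claim to verify: {claim}

Cited source(s):
{sources}

Is the claim factually consistent with the cited source(s)?
Answer "yes" or "no" on the first line, then explain your decision."""


@dataclass
class Verdict:
    claim: str
    supported: bool   # the yes/no verdict
    explanation: str  # the model's reasoning


def verify_claim(question: str, answer: str, claim: str,
                 sources: str, complete) -> Verdict:
    """Check one claim against its cited sources and return a verdict."""
    prompt = FACT_CHECK_PROMPT.format(
        question=question, answer=answer, claim=claim, sources=sources)
    response = complete(prompt)
    first_line, _, explanation = response.partition("\n")
    return Verdict(claim=claim,
                   supported=first_line.strip().lower().startswith("yes"),
                   explanation=explanation.strip())
```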

This evaluation phase, which we call RAG-Eval, is inspired by work at the Allen Institute for AI, which pioneered claim generation last year. Since then, LLMs have made significant advances in speed and accuracy.

From a human fact-checker’s point of view, RAG-Verification may seem inefficient. What starts as a single fact-check on a small piece of generated text expands into four separate tasks—and this is before even reaching the error-correction phase.

This decomposition, however, plays to the strengths of large language models. A human would verify the entire response at once, but LLMs perform better when a task is broken into smaller, simpler sub-tasks. The approach mirrors chain-of-thought prompting, which helps models reduce errors and improve accuracy by reasoning step by step.

But there’s another benefit to breaking tasks down in advance: all the individual claims can be verified simultaneously. Instead of waiting 40 seconds for the model to work through four claims one after another, we can run the checks in parallel and finish in roughly the 10 seconds a single check takes. As the size of the text to be verified grows, so do the time savings.
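
Concretely, this is just a matter of issuing the per-claim requests concurrently rather than sequentially. Below is a minimal sketch using Python's asyncio, assuming an async variant of the fact-checking call (here a hypothetical verify_claim_async) and the claim and source structures from the sketches above.

```python
import asyncio


async def verify_all(claims: list[dict], question: str, answer: str,
                     sources: dict[str, str], verify_claim_async) -> list:
    """Fact-check every claim concurrently and gather the verdicts.

    Wall-clock time is roughly that of the slowest single check,
    rather than the sum of all checks.
    """
    tasks = [
        verify_claim_async(question, answer, c["claim"], sources[c["source"]])
        for c in claims
    ]
    return await asyncio.gather(*tasks)


# e.g. verdicts = asyncio.run(verify_all(claims, question, answer, sources,
#                                        verify_claim_async))
```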

As LLM technology continues to scale, we expect RAG-Verification to operate at sub-second latency, making automated fact-checking invisible and ubiquitous.