Previously we explored how RAG errors are defined, distinguishing between hard and soft errors. We also outlined how RAG text is divided into factual claims for parallel verification. Now let’s examine the final part of the RAG-Verification process: correcting detected factual inconsistencies.
For human writers, correcting errors can seem straightforward. Some require minor adjustments to a single sentence, while others may demand a full paragraph rewrite. Humans make these decisions intuitively. But how do we train a machine writer to correct errors identified by a machine fact-checker?
Consider the following example of an error identified by RAG-V:
question:
Are UK special forces operating in Ukraine?
retrieved sources:
[1] Leaked US military documents indicate that the UK has deployed as many as 50 special forces to Ukraine. The documents suggest that more than half of the western special forces personnel present in Ukraine between February and March this year may have been British. It is unclear what activities the special forces may have been engaged in or whether the numbers of personnel have been maintained at this level. (11 April 2023)
[2] The FSB alleges it thwarted an attempt by British special forces to facilitate a landing of Ukrainian sabotage troops on the Tendrov Split, a strategic sandbar located in the Black Sea. The agency claims to have apprehended a senior figure within the Ukrainian naval special forces, providing his name and date of birth as proof of the operation’s existence and the SBS’s direct involvement. Britain’s Defence Ministry has yet to respond to these allegations. (12 April 2024)
generated answer:
Leaked US military documents indicated the possible presence of up to 50 UK special forces personnel in Ukraine between February and March of 2023.[1] The FSB alleged that Ukrainian special forces assisted a failed operation by British special forces, however Britain’s Defence Ministry has not responded.[2]
error:
The following claim is not consistent with its cited source:
Claim: The FSB alleges that Ukrainian special forces assisted a failed operation by British special forces. [2]
Explanation: The cited source describes British special forces facilitating an operation involving Ukrainian forces, not Ukrainian special forces assisting a British operation.
This RAG-V error report includes all of the information needed to correct the generated answer. But what exactly do you do with it?
One option is to simply delete the inaccurate sentence. (The RAG-V pipeline maps generated sentences to factual claims, making this possible.) While this is both efficient and fast, it has two significant drawbacks. First, it degrades the generated text, often leaving an incomplete answer. Second, and worse, removing a sentence can change the overall meaning of a paragraph.
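The deletion option can be sketched in a few lines. This is an illustrative example, not RAG-V’s actual code: `delete_flagged_sentences` and the sentence indices are hypothetical, standing in for the pipeline’s sentence-to-claim mapping.

```python
def delete_flagged_sentences(sentences, flagged_indices):
    """Drop every sentence whose factual claim failed verification."""
    flagged = set(flagged_indices)
    return [s for i, s in enumerate(sentences) if i not in flagged]

answer = [
    "Leaked US military documents indicated up to 50 UK special forces in Ukraine.[1]",
    "The FSB alleged that Ukrainian special forces assisted a failed British operation.[2]",
]

# The second sentence (index 1) carries the inconsistent claim, so it is dropped,
# leaving an answer that no longer addresses the FSB allegation at all.
corrected = delete_flagged_sentences(answer, [1])
```

The usage example shows the drawback directly: the surviving text is accurate but incomplete.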
Another option is to fully rewrite the generated answer. This generally works, but it is inefficient: in our analysis of RAG errors, we find that the vast majority require only a correction to the wording of a single sentence.
We’ve settled on a compromise solution. When RAG-V detects one or more errors, it rewrites only those sentences. Then we send the full text back through the pipeline. If any error is detected, the process repeats, giving the system another shot on goal.
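The verify-rewrite loop above can be sketched as follows. This is a minimal illustration, assuming hypothetical `verify` and `rewrite_sentences` callables that stand in for the RAG-V fact-checker and the LLM rewriter; neither name is Primer’s actual API.

```python
def correct_answer(answer, verify, rewrite_sentences, max_retries=6):
    """Repeatedly verify an answer, rewriting only the flagged sentences.

    verify(answer) returns a list of error reports (empty when the text passes).
    rewrite_sentences(answer, errors) rewrites only the sentences named in the
    error reports, leaving the rest of the text untouched.
    """
    for _ in range(max_retries):
        errors = verify(answer)
        if not errors:
            return answer            # passed verification
        answer = rewrite_sentences(answer, errors)
    return answer                    # give up after max_retries passes
```

Each iteration is one “shot on goal”: only flagged sentences are rewritten, then the full text is re-verified, so an accidental new error introduced by a rewrite is caught on the next pass.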
How many times does RAG-V have to rewrite RAG text before it passes? If the distribution had a fat tail, with many answers requiring several retries, this approach wouldn’t work. Luckily, multiple retries are rare: the vast majority of detected errors are successfully corrected after a single pass. The most we have ever seen is six retries. So there is a long tail, and minimizing those outliers is our focus now.
RAG is a core AI component, powering critical functions such as search, question-answering, and automatic report generation. To ensure its reliability for mission-critical applications, RAG cannot remain a ‘black box.’ The feedback we’ve received so far from our defense and intelligence customers confirms that RAG-V is the way forward for earning that trust.
See Primer in action
The Primer team will demonstrate RAG-V live in their booth at AUSA in Washington, DC (Booth #348) October 14-16, 2024 and DoDIIS in Omaha, NE (Booth #1717) October 27-30, 2024.