
RAG-V: the future of trustworthy AI

August 21, 2024
This post is archived.
References & links may be out of date

Envision tapping into your data’s full potential with AI so advanced, it transforms how you operate. Deploying LLMs alongside Retrieval-Augmented Generation (RAG) isn’t just a step forward—it’s a game-changer. At Primer, we’ve experienced just how transformative Generative AI can be when paired with your proprietary, internal data.

We see this in our own day-to-day work. Beyond using Copilot to jumpstart code, we’ve integrated LLMs into Slack channels and Confluence documentation so we can more quickly find, understand, and inform one another about what’s most critical.

But here’s the catch: while the initial results can seem astonishing, there’s an underlying complexity that most overlook—model drift, variance between different models, and the challenges of ensuring accuracy over time. Our encounters with these issues in our data have driven us to pioneer solutions that combine the power of probabilistic AI with the precision of traditional engineering.

The LLM as a modern VM: a technical analogy

Our libraries let us switch between commercial model APIs with little friction, and the qualitative differences we see between parameter sizes and vendors are significant.

To draw a simple analogy, an LLM functions much like a modern virtual machine (VM). But instead of Java running on a JVM, think of English being processed by GPT-4. While switching JVM versions is somewhat reliable, it’s not without risks. The key difference is that today’s LLM ‘VM’ is built from data, with decision-making paths shaped by the inputs and outputs it’s trained on. These models are subject to data shifts over time, effectively creating a new ‘VM’ with each release.

The U.S. Army has confronted this challenge head-on, weighing deterministic knowledge graphs against potentially compromised datasets. Rather than relying solely on Machine Learning to create a trustworthy “VM,” there’s a need for verified and curated knowledge graphs, with Overton windows closely monitored for shifts.
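One way to make the "new VM with each release" point concrete is to replay a fixed set of prompts through two model versions and flag the answers that changed. The sketch below is illustrative only: `drift_report`, the `min_similarity` threshold, and the stand-in model callables are assumptions made for this example, not part of Primer's libraries, and plain string similarity is a crude proxy for a change in meaning.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# Hypothetical interface: a "model" is any callable that maps a prompt to an answer.
Model = Callable[[str], str]


def drift_report(
    prompts: List[str],
    old_model: Model,
    new_model: Model,
    min_similarity: float = 0.9,  # assumed cutoff; tune it on your own data
) -> List[Dict[str, object]]:
    """Return the prompts whose answers differ noticeably between two versions."""
    flagged = []
    for prompt in prompts:
        old_answer = old_model(prompt)
        new_answer = new_model(prompt)
        # Character-level similarity in [0, 1]; 1.0 means identical strings.
        similarity = SequenceMatcher(None, old_answer, new_answer).ratio()
        if similarity < min_similarity:
            flagged.append(
                {
                    "prompt": prompt,
                    "old": old_answer,
                    "new": new_answer,
                    "similarity": similarity,
                }
            )
    return flagged


if __name__ == "__main__":
    # Canned answers stand in for two pinned model releases.
    release_a = {"capital?": "Paris is the capital of France."}
    release_b = {"capital?": "The capital of France is Paris, on the Seine."}
    for row in drift_report(["capital?"], release_a.__getitem__, release_b.__getitem__):
        print(row["prompt"], round(row["similarity"], 2))
```

In practice the two callables would wrap pinned versions of a vendor API, and a semantic comparison (for example, embedding distance or a fact check) would likely replace the string ratio.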

Merging engineering rigor with AI power

We believe there’s a middle ground—one that combines the rigor of traditional engineering with the power of probabilistic techniques. By investing in robust evaluation, validation, and correction algorithms, you can ensure your LLM and RAG are deployment-ready, with upstream versions locked down to prevent drift.

This goes beyond standard LLM evaluation techniques like MMLU, Hugging Face’s Evaluation Harness, or Adversarial Chatbot Arenas. These tests must run on your RAG and proprietary data, and against known truths. And yes, having a verified, curated knowledge graph is crucial here.
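A minimal sketch of what testing "against known truths" could look like follows. Everything named here is hypothetical: `GoldenCase`, `fact_recall`, `evaluate`, and the substring-matching metric are assumptions chosen for brevity, not the RAG-V scoring system, and `pipeline` stands in for whatever callable wraps your own RAG and version-pinned model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class GoldenCase:
    """A question paired with facts a correct answer is expected to contain."""
    question: str
    expected_facts: Tuple[str, ...]


def fact_recall(answer: str, expected_facts: Tuple[str, ...]) -> float:
    """Fraction of expected facts found in the answer (case-insensitive substring match)."""
    if not expected_facts:
        return 1.0
    text = answer.casefold()
    hits = sum(1 for fact in expected_facts if fact.casefold() in text)
    return hits / len(expected_facts)


def evaluate(
    pipeline: Callable[[str], str],
    cases: List[GoldenCase],
    pass_threshold: float = 1.0,  # assumed: every expected fact must be present
) -> Dict[str, object]:
    """Run each golden case through the pipeline and summarize the scores."""
    scored = [
        (case, fact_recall(pipeline(case.question), case.expected_facts))
        for case in cases
    ]
    failures = [case.question for case, score in scored if score < pass_threshold]
    mean = sum(score for _, score in scored) / len(scored) if scored else 0.0
    return {"mean_recall": mean, "failures": failures}


if __name__ == "__main__":
    # A canned pipeline stands in for a real RAG + LLM stack.
    canned = {"Who approved the budget?": "The board approved it in March."}
    cases = [GoldenCase("Who approved the budget?", ("board", "March"))]
    print(evaluate(canned.__getitem__, cases))
```

A check like this can run whenever the model version, the prompt, or the retrieval index changes, so a drop in the score blocks deployment instead of surfacing in production.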

Introducing RAG-V: the future of trustworthy AI

We’re calling this concept RAG-V. We all know that we can’t improve what we can’t measure. Thus, we’ve engineered a new scoring system for RAG+LLM accuracy and rigorously tested our software against it. We’ve also explored methods to correct the inaccuracies that scoring surfaces, resulting in reliable insights, complete with references and explanations you can trust.

Stay tuned for our presentation at the INSA Summit and our upcoming arxiv.org paper. We’ll also be unveiling new functionality in Q4 that balances deterministic and probabilistic approaches, delivering trusted and reliable AI.
