Last month the US Army CIO sent out a memorandum with “Guidance on Generative Artificial Intelligence and Large Language Models.” One bullet point in particular has stuck with me.
Gen AI tools are fallible and can produce “hallucinations” and biased results. Hallucinations are a phenomenon whereby responses include or incorporate fabricated data that appears authentic. Therefore, these tools must be accompanied by a robust review process which may include the critical thinking skills of human expertise.
The CIO is referring to human-in-the-loop review, where a human operator is required to engage with the output of a Gen AI system before it leads to a significant decision or action. I strongly agree with that guidance.
But it stuck with me because there is another interpretation: Gen AI systems should also take advantage of critical thinking skills inspired by human expertise.
You may have heard the term “chain-of-thought”, sometimes shortened to CoT. The idea, first described in a 2022 Google Research paper, is both simple and powerful: If you want a large language model to solve a complicated problem, don’t just ask for the answer. Instead, ask it to first break the problem down into discrete steps, literally thinking out loud before generating its final answer. A few months later, a team of researchers in Japan discovered a spell-like phrase that unlocks this power: “Let’s think step by step.”
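To make that concrete, here is a minimal sketch of the difference between direct prompting and zero-shot CoT. The `ask_model` function is a placeholder for whatever model client you actually use, and the runway question is an invented example; nothing here is tied to a particular API.

```python
# A minimal sketch of the idea. ask_model is a placeholder for whatever
# LLM client you actually use; it is not tied to any particular API.

def ask_model(prompt: str) -> str:
    """Stand-in for a real model call; swap in your own client here."""
    return "(model response would appear here)"

question = (
    "A cargo flight needs at least 7,500 ft of runway. Airfield A has 9,000 ft "
    "but its runway is closed for repaving; Airfield B has 8,000 ft and is open. "
    "Where should the aircraft land?"
)

# Direct prompting: ask for the answer and nothing else.
direct_answer = ask_model(question)

# Zero-shot chain-of-thought: append the "spell" so the model walks through
# the constraints before committing to an answer.
cot_answer = ask_model(question + "\n\nLet's think step by step.")
```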
Here’s a very real example of CoT in action at Primer. Consider the Gen AI use case of helping a US Air Force operator quickly decide where to land a plane carrying heavy equipment. Besides the obvious inputs to this decision (which airports have appropriate runways, and which are close enough to the equipment’s ultimate destination), the operator must read through any relevant NOTAMs. These short, jargon-heavy notifications describe potential hazards ranging from weather, construction, and power outages to flocks of birds. This can quickly add up to an overwhelming amount of information.
You could show all this information to a Gen AI model and ask it to generate a list of the best options for landing the airplane. It will definitely give you an answer. But would you trust it?
The better approach is to use CoT, forcing the model to think it through, step by step. This has two advantages. First, CoT helps the model reduce hallucination errors, as shown convincingly in a paper last year. So the final list of recommendations is far more likely to be correct.
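For illustration only, here is one way that step-by-step prompt might be assembled. It reuses the `ask_model` placeholder from the earlier sketch, and the airfields, NOTAM text, and step wording are all invented; this is not Primer’s actual prompt, data, or product behavior.

```python
# Continuing the earlier sketch (ask_model is the same placeholder).
# The airfields, NOTAM text, and step wording below are invented for
# illustration; they are not Primer's actual prompts or data.

airfields = {
    "Airfield A": {"runway_ft": 9000, "distance_nm": 120},
    "Airfield B": {"runway_ft": 8000, "distance_nm": 95},
}

notams = [
    "Airfield A: RWY 09/27 CLSD FOR REPAVING.",
    "Airfield B: BIRD ACTIVITY IN VICINITY OF AD, CAUTION ADVISED.",
]

prompt_lines = [
    "You are helping select a landing site for a heavy-equipment airlift.",
    "The aircraft needs at least 7,500 ft of runway.",
    "",
    "Candidate airfields:",
    *[f"- {name}: {info['runway_ft']} ft runway, {info['distance_nm']} nm from destination"
      for name, info in airfields.items()],
    "",
    "Relevant NOTAMs:",
    *[f"- {n}" for n in notams],
    "",
    "Work through this step by step:",
    "1. Rule out any airfield whose runway is too short.",
    "2. Check each remaining airfield against the NOTAMs for closures or hazards.",
    "3. Rank what is left by distance to the destination.",
    "4. Only then give your final answer on a line starting with RECOMMENDATION:",
]

recommendation = ask_model("\n".join(prompt_lines))
```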
But there’s another advantage, and this goes back to the CIO’s guidance on critical thinking. The best way to help that human who is reviewing the output of a Gen AI model is to show the reasoning behind the output. You get that as a by-product of CoT.
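Because the reasoning arrives as ordinary text, surfacing it for the reviewer can be as simple as splitting the response. The sketch below assumes the prompt asked the model to put its final answer on a line beginning with “RECOMMENDATION:”, and the example response is invented; both are illustrative conventions, not a standard.

```python
def split_reasoning_and_answer(response: str) -> tuple[str, str]:
    """Separate the step-by-step reasoning from the final recommendation."""
    marker = "RECOMMENDATION:"
    reasoning, _, answer = response.partition(marker)
    return reasoning.strip(), answer.strip()

# Invented example of what a CoT response might look like.
example = (
    "Step 1: Airfield A's runway is closed per NOTAM, so rule it out.\n"
    "Step 2: Airfield B's runway is long enough; the bird-activity NOTAM is a "
    "caution, not a closure.\n"
    "RECOMMENDATION: Airfield B."
)

reasoning, answer = split_reasoning_and_answer(example)
print("Shown up front:", answer)
print("One click away:", reasoning)
```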
This is how Gen AI will earn its way into mission-critical applications. The evidence and reasoning behind an AI-generated claim should never be more than a click away for human reviewers.