Last month the US Army CIO sent out a memorandum with “Guidance on Generative Artificial Intelligence and Large Language Models.” One bullet point in particular has stuck with me.
Gen AI tools are fallible and can produce “hallucinations” and biased results. Hallucinations are a phenomenon whereby responses include or incorporate fabricated data that appears authentic. Therefore, these tools must be accompanied by a robust review process which may include the critical thinking skills of human expertise.
The CIO is referring to human-in-the-loop review, where a human operator is required to engage with the output of a Gen AI system before it leads to a significant decision or action. I strongly agree with that guidance.
But it stuck with me because there is another interpretation: Gen AI systems should also take advantage of critical thinking skills inspired by human expertise.
You may have heard the term “chain-of-thought”, sometimes shortened to CoT. The idea, first described in a 2022 Google Research paper, is both simple and powerful: If you want a large language model to solve a complicated problem, don’t just ask for the answer. Instead, ask it to first break the problem down into discrete steps, literally thinking out loud before generating its final answer. A few months later, a team of researchers in Japan discovered a spell-like phrase that unlocks this power: “Let’s think step by step.”
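To make that concrete, here is a minimal sketch of the difference between direct prompting and zero-shot CoT. The `ask_model` function is a placeholder for whatever model client you actually use, and the runway question is an invented example; nothing here is tied to a particular API.

```python
# A minimal sketch of the idea. ask_model is a placeholder for whatever
# LLM client you actually use; it is not tied to any particular API.

def ask_model(prompt: str) -> str:
    """Stand-in for a real model call; swap in your own client here."""
    return "(model response would appear here)"

question = (
    "A cargo flight needs at least 7,500 ft of runway. Airfield A has 9,000 ft "
    "but its runway is closed for repaving; Airfield B has 8,000 ft and is open. "
    "Where should the aircraft land?"
)

# Direct prompting: ask for the answer and nothing else.
direct_answer = ask_model(question)

# Zero-shot chain-of-thought: append the "spell" so the model walks through
# the constraints before committing to an answer.
cot_answer = ask_model(question + "\n\nLet's think step by step.")
```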
Here’s a very real example of CoT in action at Primer. Consider the Gen AI use case of helping a US Air Force operator quickly decide where to land a plane carrying heavy equipment. Besides the obvious inputs to this decision (which airports have appropriate runways, and which are close enough to the equipment’s ultimate destination), the operator must read through any relevant NOTAMs. These short, jargon-heavy notifications describe potential hazards ranging from weather, construction, and power outages to flocks of birds. This can quickly add up to an overwhelming amount of information.
You could show all this information to a Gen AI model and ask it to generate a list of the best options for landing the airplane. It will definitely give you an answer. But would you trust it?
The better approach is to use CoT, forcing the model to think it through, step by step. This has two advantages. First, CoT helps the model reduce hallucination errors, as shown convincingly in a paper last year. So the final list of recommendations is far more likely to be correct.
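For illustration only, here is one way that step-by-step prompt might be assembled. It reuses the `ask_model` placeholder from the earlier sketch, and the airfields, NOTAM text, and step wording are all invented; this is not Primer’s actual prompt, data, or product behavior.

```python
# Continuing the earlier sketch (ask_model is the same placeholder).
# The airfields, NOTAM text, and step wording below are invented for
# illustration; they are not Primer's actual prompts or data.

airfields = {
    "Airfield A": {"runway_ft": 9000, "distance_nm": 120},
    "Airfield B": {"runway_ft": 8000, "distance_nm": 95},
}

notams = [
    "Airfield A: RWY 09/27 CLSD FOR REPAVING.",
    "Airfield B: BIRD ACTIVITY IN VICINITY OF AD, CAUTION ADVISED.",
]

prompt_lines = [
    "You are helping select a landing site for a heavy-equipment airlift.",
    "The aircraft needs at least 7,500 ft of runway.",
    "",
    "Candidate airfields:",
    *[f"- {name}: {info['runway_ft']} ft runway, {info['distance_nm']} nm from destination"
      for name, info in airfields.items()],
    "",
    "Relevant NOTAMs:",
    *[f"- {n}" for n in notams],
    "",
    "Work through this step by step:",
    "1. Rule out any airfield whose runway is too short.",
    "2. Check each remaining airfield against the NOTAMs for closures or hazards.",
    "3. Rank what is left by distance to the destination.",
    "4. Only then give your final answer on a line starting with RECOMMENDATION:",
]

recommendation = ask_model("\n".join(prompt_lines))
```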
But there’s another advantage, and this goes back to the CIO’s guidance on critical thinking. The best way to help that human who is reviewing the output of a Gen AI model is to show the reasoning behind the output. You get that as a by-product of CoT.
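Because the reasoning arrives as ordinary text, surfacing it for the reviewer can be as simple as splitting the response. The sketch below assumes the prompt asked the model to put its final answer on a line beginning with “RECOMMENDATION:”, and the example response is invented; both are illustrative conventions, not a standard.

```python
def split_reasoning_and_answer(response: str) -> tuple[str, str]:
    """Separate the step-by-step reasoning from the final recommendation."""
    marker = "RECOMMENDATION:"
    reasoning, _, answer = response.partition(marker)
    return reasoning.strip(), answer.strip()

# Invented example of what a CoT response might look like.
example = (
    "Step 1: Airfield A's runway is closed per NOTAM, so rule it out.\n"
    "Step 2: Airfield B's runway is long enough; the bird-activity NOTAM is a "
    "caution, not a closure.\n"
    "RECOMMENDATION: Airfield B."
)

reasoning, answer = split_reasoning_and_answer(example)
print("Shown up front:", answer)
print("One click away:", reasoning)
```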
This is how Gen AI will earn its way into mission-critical applications. The evidence and reasoning behind an AI-generated claim should never be more than a click away for human reviewers.