Mastering search: The semantic edge

The US government has a massive search challenge. It generates billions of pages every year, from tax audits and budget reports to the president’s daily intelligence brief. Classified data alone is growing at an estimated rate of 50 million documents per year.[1]

When searching for information, no one has a tougher job than intelligence analysts. Crucial information is scattered among data silos, each with its own specific search tools. Locating the right information is the most time-intensive part of an analyst’s day, leaving a thin sliver of time to read, synthesize, and write.

Artificial intelligence can help analysts find the information they need, particularly with a new technique called semantic search. But to understand the analysts’ dilemma, one must understand the tool they use today: boolean search.

A Search Nightmare

Crafting a good boolean query is a painfully learned art. Consider this boolean query used to find relevant documents about Ukraine’s relationship with NATO member states:

In boolean search, an analyst can’t simply search for “Ukraine foreign relations” because that will retrieve only documents with those exact words. Intelligence analysts spend hours honing their boolean searches over and over until, finally, they get the results they need. Good booleans get passed down from analyst to analyst like treasured family heirlooms.
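To make the limitation concrete, here is a toy sketch of keyword matching in Python. It is not the analyst's actual query or any real search tool: the two documents, the synonym groups, and the function names are all made up for illustration.

```python
import re

def tokens(text):
    """Lowercase a text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_all(query, document):
    """Plain keyword search: every query word must appear in the document."""
    return tokens(query) <= tokens(document)

def matches_groups(groups, document):
    """Hand-built boolean: AND across groups, OR within each group."""
    words = tokens(document)
    return all(words & group for group in groups)

# Made-up documents for illustration only.
DOCS = [
    "Ukraine foreign relations report for the quarter",
    "Kyiv deepens ties with the alliance after summit talks",
]

# The simple query finds only the document that uses those exact words.
simple_hits = [d for d in DOCS if matches_all("Ukraine foreign relations", d)]

# The analyst has to anticipate the synonyms by hand to catch the second one.
GROUPS = [{"ukraine", "kyiv"}, {"relations", "ties", "talks"}]
expanded_hits = [d for d in DOCS if matches_groups(GROUPS, d)]
```

The second document is clearly relevant, yet it is found only after the analyst guesses which alternative words to list. Real queries grow far longer for the same reason.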

Dream Search

What if our analyst could search with a plain-language query? She simply types “What are Ukraine’s relationships with NATO member states?” and the documents that come back are about exactly that, regardless of whether they include those exact words. Even better, the most relevant parts of the retrieved documents are highlighted.

This is semantic search. Rather than having the user define what counts as relevant, the search system “understands” the meaning of the user’s query and retrieves documents with truly relevant content.

In semantic search, every chunk of text in the documents, along with the analyst’s query, is assigned an address in a high-dimensional “semantic space”. The system then gathers up the documents that are in the same semantic neighborhood as the query. The closer a document is to the query in that space, the more relevant it is. The breakthrough that makes this possible is an AI tool called a language model.

Voilà! Now you understand the core concept behind semantic search. 
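For readers who like to see the idea in code, here is a minimal Python sketch of ranking by closeness in a semantic space. It rests on a big assumption: the embed function below is a hand-written stand-in with four made-up "concept" dimensions, whereas a real system would get its vectors from a language model. The documents and word lists are invented for illustration.

```python
import math
import re

# Hypothetical stand-in for a language model: four hand-picked concept axes.
CONCEPTS = {
    "agriculture": {"wheat", "harvest", "crop", "farm"},
    "alliance": {"nato", "alliance", "allies", "member"},
    "diplomacy": {"relationships", "relations", "ties", "talks", "summit"},
    "ukraine": {"ukraine", "ukrainian", "kyiv"},
}
AXES = sorted(CONCEPTS)

def embed(text):
    """Give a text its 'address': one count per concept axis."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(sum(w in CONCEPTS[axis] for w in words)) for axis in AXES]

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query, documents):
    """Return (score, document) pairs, closest to the query first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Made-up documents for illustration only.
DOCS = [
    "Kyiv deepens ties with the alliance after summit talks",
    "Wheat harvest forecasts improve for the coming season",
]
QUERY = "What are Ukraine's relationships with NATO member states?"
results = semantic_search(QUERY, DOCS)
```

The top result shares almost no words with the query, yet it lands nearby in the toy space because it is about the same things. Cosine similarity is one common way to measure closeness; it is used here as an assumption, not as a description of any particular product.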

Even Better Search

What if instead of search, our analyst could just ask the question that motivated her search in the first place?

This is semantic question-answering. The trick is to add a large language model to the end of the search flow. Taking the user’s question and search results, the model generates a direct answer to the user’s question, including in-line citations to sources.
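The flow described above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not Primer's implementation: the search and generate arguments are hypothetical callables, and the two fake functions exist only so the sketch runs without a real search index or language model.

```python
def build_prompt(question, passages):
    """Combine the question with numbered search results the model can cite."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources in-line like [1].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer_question(question, search, generate, top_k=3):
    """Search first, then hand the question and top results to the model."""
    passages = search(question)[:top_k]
    return generate(build_prompt(question, passages))

# Hypothetical stand-ins: a canned search result list and a canned model reply.
def fake_search(question):
    return [
        "Kyiv deepens ties with the alliance after summit talks",
        "Member states pledge further support at the summit",
    ]

def fake_generate(prompt):
    # A real language model would write the answer from the prompt.
    return "Ties are deepening [1], with further support pledged [2]."

QUESTION = "What are Ukraine's relationships with NATO member states?"
prompt = build_prompt(QUESTION, fake_search(QUESTION))
answer = answer_question(QUESTION, fake_search, fake_generate)
```

Numbering the sources in the prompt is what lets the answer carry in-line citations the analyst can check against the original documents.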

Primer has semantic question-answering deployed in products serving US Defense and IC customers. Below is a screenshot showing it in action. 

With the help of AI, analysts can now engage with the vast corpus of government data in a more natural way, unburdened by keyword limitations. This translates to faster analysis and faster decision-making.

[1] https://www.nytimes.com/2023/01/27/briefing/classified-documents-government.html