Creating a searchable library from a massive volume of digital content requires generating topics for every document, but the subject matter experts (SMEs) who vet those topics become the bottleneck. Human annotation does not scale, because each annotator must read, analyze, and organize every document.
To overcome this, one globally-respected education brand leveraged Primer’s NLP platform to classify and tag more than 100,000 articles.
The organization’s manual vetting process left it with three challenges to solve:
• Increase the efficiency of well-compensated experts to improve the bottom line
• Decrease time to insight by having the model generate topics that humans only need to approve
• Standardize topic generation to eliminate the inconsistencies inherent in a manual, human process
The solution was a custom Topic Text2Text model that generates topic tags for any given article. This allowed the organization to train the model and automate what had been a tedious, time-consuming, and inefficient manual process.
To start, the model had to align with an existing ontology of 700 topics, created jointly by the three business units that depend on it. The customer also expected the solution to read and process both web articles and magazine articles, and to deliver its output via an API.
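One way to picture the ontology requirement is as a filter between the model's free-text output and the approved topic list. The sketch below is only an illustration under stated assumptions: the function name `align_to_ontology`, the case- and whitespace-insensitive matching rule, and the sample topics are all hypothetical and are not taken from Primer's implementation.

```python
def _normalize(label: str) -> str:
    """Lowercase a label and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(label.casefold().split())


def align_to_ontology(raw_tags: list[str], ontology: list[str]) -> list[str]:
    """Keep only generated tags that match an approved ontology topic.

    Returns the ontology's own spelling of each topic, in first-seen order,
    without duplicates. Tags that match nothing are dropped.
    """
    canonical = {_normalize(topic): topic for topic in ontology}
    aligned: list[str] = []
    for tag in raw_tags:
        match = canonical.get(_normalize(tag))
        if match is not None and match not in aligned:
            aligned.append(match)
    return aligned


# Hypothetical two-topic ontology standing in for the customer's 700 topics.
sample_ontology = ["Renewable Energy", "Marine Biology"]
sample_output = ["renewable  energy", "Astrology", "RENEWABLE ENERGY"]
aligned_tags = align_to_ontology(sample_output, sample_ontology)
```

In a setup like this, anything the model produces outside the ontology never reaches the human reviewers, so they only approve or reject topics the business units already recognize.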
Since most NLP models limit the number of characters they can consume at once, Primer’s custom advanced document processing workflow broke longer articles into chunks, ran the Text2Text model on each chunk, and then reassembled the chunks.
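The chunk, run, and reassemble workflow described above can be sketched roughly as follows. This is a minimal illustration, not Primer's actual pipeline: the function names, the paragraph-based splitting rule, the 2,000-character default, and the `tag_chunk` callable that stands in for the Text2Text model are all assumptions. Reassembly is interpreted here as merging the per-chunk tags into one de-duplicated list.

```python
from typing import Callable, Iterable


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are packed together while they fit;
    a single paragraph longer than max_chars is hard-split.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any paragraph that cannot fit in one chunk by itself.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def tag_long_article(
    text: str,
    tag_chunk: Callable[[str], Iterable[str]],
    max_chars: int = 2000,
) -> list[str]:
    """Run tag_chunk on every chunk and merge the tags in first-seen order."""
    merged: list[str] = []
    seen: set[str] = set()
    for chunk in split_into_chunks(text, max_chars):
        for tag in tag_chunk(chunk):
            if tag not in seen:
                seen.add(tag)
                merged.append(tag)
    return merged


# A stand-in tagger; a real system would call the trained model here.
def fake_tagger(chunk: str) -> list[str]:
    return ["energy"] if "power" in chunk else ["misc"]


article = "solar power grows\n\nwind power grows\n\n" + "x" * 25
article_chunks = split_into_chunks(article, 20)
article_tags = tag_long_article(article, fake_tagger, max_chars=20)
```

Splitting on paragraph boundaries first keeps related sentences in the same chunk where possible, which is a design choice made for this sketch rather than something the case study specifies.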
Yonder by Primer solved the three challenges, with all the Yonder processing integrated back into the customer’s systems. Additionally, 85% of the machine-generated tags are accepted by the human SMEs vetting them. The models continue to be trained, with the goal of complete automation.
Read about more Primer solutions in our Resource Center. Or contact Primer and let’s discuss how our NLP technology can help you make mountains of data meaningful and valuable.