Language Agnostic Multilingual Sentence Embedding Models
Sentence embeddings have enabled us to compare semantics of sentences numerically, which are now essential for tasks such as semantic textual similarity, semantic search and sentence clustering. Unlike keyword based search, which retrieves lexically similar contexts but not necessarily what you are looking for, semantic search retrieves only semantically relevant sentences to your query. For example, with a query “Thousands demanding climate change action”, we can retrieve a sentence “Copenhagen: protests against global warming” with semantic search, but not with keyword search.
In particular, a rapid development of transformer based multilingual sentence embedding models over the past year now enables us to handle semantics of sentences across multiple languages with just one model. This can be done without needing to translate sentences, which risks distorting the original meaning with bad translation and is computationally expensive.
So how well can these models identify semantic similarities between sentences regardless of language, i.e., how language agnostic are they? How well can they retrieve the most relevant document for a query from a pool of documents in many different languages? Some models were trained on cross-lingual translation pairs and are intended only for translation. As a result, little work has been done on cross-lingual semantic textual similarity for sentence pairs that are semantically similar (rather than translation pairs, which are supposed to be semantically identical).
Primer ingests vast amounts of documents daily, and it is important that our systems can retrieve semantically similar documents accurately across multiple languages on news, social media, or companies’ internal documents.
Here, we conducted a detailed evaluation of publicly available multilingual sentence embedding models by measuring semantic similarity of news titles in 33 languages, and by visualizing the embedding spaces.
Getting similar news from a pool of news contents in 30+ languages
A truly language-agnostic multilingual language model is one where all semantically similar sentences are closer than all dissimilar sentences, regardless of their language.
Well-known multilingual sentence embedding models trained on a large number of languages include LaBSE (109 languages), multilingual SBERT (50+ languages) [2,3], and LASER3 (200 languages). Do these models perform well at retrieving semantically similar sentences from a pool of documents in tens of different languages?
Here we investigate the multilingual sentence embedding models on their ability to identify semantically similar (but not identical) sentences by looking at news titles in 33 languages. We scraped 15,210 multilingual news titles from all non-English news articles linked to English WikiNews articles. The languages in the dataset are English, French, German, Portuguese, Polish, Italian, Chinese, Russian, Japanese, Dutch, Swedish, Tamil, Serbian, Czech, Catalan, Hebrew, Turkish, Finnish, Esperanto, Greek, Hungarian, Ukrainian, Norwegian, Arabic, Persian, Korean, Romanian, Bulgarian, Bosnian, Limburgish, Albanian and Thai. We then calculated the sentence similarity between the English title and the foreign-language title of the same news story (positive pairs), as well as between titles of news stories that share no categories (negative pairs).
For example, a WikiNews article titled “United Kingdom buries Queen Elizabeth II after state funeral” has linked articles in 11 other languages. Their titles are shown below.
On the other hand, a WikiNews article titled “‘Very serious’: Chinese government releases corruption report”, which has no overlapping topics with the news above, has linked non-English news articles with the following titles.
Since there are no common topics between these two news events, their titles should be dissimilar to each other regardless of the languages. For example, the following can be regarded as positive and negative sentence pairs for English – French news title pairs.
Positive English – target language sentence pairs were created from all English WikiNews pages that have international news pages linked to them, and negative English – target language sentence pairs were created from all possible pairs of news articles that have no overlapping topics. The following shows the distribution of cosine similarity scores of positive and negative title pairs, grouped by language. Each box indicates the interquartile range of the distribution. Similarities were calculated using each of three multilingual sentence embedding models: SBERT (distiluse-base-multilingual-cased-v1), SBERT (paraphrase-multilingual-mpnet-base-v2), and LASER3.
(a) SBERT distiluse-base-multilingual-cased-v1
(b) SBERT paraphrase-multilingual-mpnet-base-v2
Fig. Distribution of cosine similarity scores of positive ( cross-lingual pairs of same news) and negative (cross-lingual pairs of unrelated news) title pairs, grouped by languages
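The per-language scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the evaluation: the three-dimensional vectors are made-up stand-ins for real sentence embeddings, and the names `cosine_similarity`, `scores_by_language`, and `toy_pairs` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def scores_by_language(pairs):
    """Group cosine scores by target language and by pair type.

    `pairs` is an iterable of (language, english_vec, target_vec, is_positive).
    """
    grouped = defaultdict(lambda: {"positive": [], "negative": []})
    for language, english_vec, target_vec, is_positive in pairs:
        key = "positive" if is_positive else "negative"
        grouped[language][key].append(cosine_similarity(english_vec, target_vec))
    return dict(grouped)


# Toy vectors (assumption): real embeddings would come from a sentence
# embedding model and have hundreds of dimensions.
toy_pairs = [
    ("fr", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], True),   # same news story
    ("fr", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], False),  # unrelated story
    ("ta", [0.0, 1.0, 0.0], [0.0, 0.6, 0.8], True),
    ("ta", [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], False),
]

scores = scores_by_language(toy_pairs)
for language, groups in scores.items():
    print(language, mean(groups["positive"]), mean(groups["negative"]))
```

With real embeddings, the two score lists per language are what a box plot like the one above would summarize.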
The SBERT paraphrase-multilingual-mpnet-base-v2 model and the LASER3 model yield cosine similarity scores that are similar across all languages, except for Tamil, Limburgish, and Thai with SBERT paraphrase-multilingual-mpnet-base-v2. With SBERT distiluse-base-multilingual-cased-v1, on the other hand, the average cosine similarity of positive sentence pairs varies widely by language, from ~0.8 in Portuguese to ~0.2 in Tamil. Because of this language bias, a sentence retrieval model built on this embedding model could rank Portuguese sentences that are only loosely similar to an English query much higher than a Hebrew sentence that has exactly the same meaning as the query.
LASER3 gives higher cosine similarity scores for positive pairs (average 0.7~0.8), but also for negative pairs (average ~0.55, in contrast to an average of 0.05 for SBERT). Even though LASER3 was trained on 200 languages, including all 32 foreign languages in our evaluation dataset, it struggles to distinguish similar news titles from dissimilar ones for some English-foreign language title pairs (e.g., Thai). We can conclude that SBERT (paraphrase-multilingual-mpnet-base-v2) is the best of the three models discussed here for the multilingual sentence similarity search task, since the difference between the cosine similarities of positive and negative sentence pairs is the largest on average. This result shows that it is important to know whether your model has a language bias in the languages you care about.
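The selection criterion used here, the average gap between positive-pair and negative-pair similarity, can be written down as a short sketch. The scores below are invented toy numbers, not the results reported in this post, and `separation_margin`, `average_margin`, `model_a`, and `model_b` are hypothetical names.

```python
from statistics import mean


def separation_margin(positive_scores, negative_scores):
    """Mean positive-pair similarity minus mean negative-pair similarity."""
    return mean(positive_scores) - mean(negative_scores)


def average_margin(scores_by_language):
    """Average the per-language margins so every language counts equally."""
    return mean(
        separation_margin(groups["positive"], groups["negative"])
        for groups in scores_by_language.values()
    )


# Invented scores (assumption), shaped like a per-language evaluation result.
toy_results = {
    "model_a": {  # strong on Portuguese, weak on Tamil: a language-biased model
        "pt": {"positive": [0.8, 0.7], "negative": [0.1, 0.0]},
        "ta": {"positive": [0.2, 0.3], "negative": [0.1, 0.0]},
    },
    "model_b": {  # similar margins in both languages
        "pt": {"positive": [0.7, 0.6], "negative": [0.1, 0.0]},
        "ta": {"positive": [0.6, 0.7], "negative": [0.1, 0.0]},
    },
}

best = max(toy_results, key=lambda name: average_margin(toy_results[name]))
print(best)
```

Looking at the per-language margins, and not only their average, is what exposes a bias like the Portuguese versus Tamil gap discussed above.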
Note that the positive sentence pairs used here are not semantically identical, as you can see in the example positive pairs shown above (e.g., the positive pair “United Kingdom buries Queen Elizabeth II after state funeral” and ”大不列顛及北愛爾蘭聯合王國女王伊麗莎白二世陛下逝世，享耆壽96歲 (translated: Her Majesty Queen Elizabeth II of the United Kingdom of Great Britain and Northern Ireland dies at 96)”). Thus, we don’t expect the cosine similarities of the positive pairs to be exactly, or very close to, 1.
Visualization of the Distribution of Sentence Embeddings By News Topics
To further understand how embeddings of news titles are distributed in the multilingual semantic embedding spaces, we visualized them in 2 dimensions. A figure below shows the distributions of news titles embedded with the SBERT(paraphrase-multilingual-mpnet-base-v2) model. The dimension of the embedding space was reduced to 2D using a dimensionality reduction technique called t-SNE, which preserves local structure of the clustering.
Sentence embeddings are colored by the 13 news topics defined by WikiNews: Crime and law, Culture and entertainment, Disasters and accidents, Economy and business, Education, Environment, Health, Obituaries, Politics and conflicts, Science and technology, Sports, Wackynews, Weather. Here, we excluded news titles that have more than one of the 13 topics.
Fig. SBERT (paraphrase-multilingual-mpnet-base-v2) embeddings of WikiNews titles (34 languages) with its dimension reduced to 2D with t-SNE method
We can see the embeddings of multilingual news titles clustered together by news topics, indicating that our embedding space contains meaningful information about the topics seen in the news.
Fast Search on Multilingual Corpora
Here we showed that multilingual sentence embedding models are potentially powerful tools, and it is important to understand the language bias when using them for multilingual semantic search tasks.
Semantic search using a multilingual embedding model gives us great advantage in many ways. Compared against first translating documents and then using the Okapi BM25 algorithm, which is a well known bag-of-words retrieval function, semantic search using multilingual dense embedding models enabled us to retrieve news articles of the same events with higher precision and recall, and more relevant news, without ever worrying about the language of the text. Furthermore, computing dense embeddings is much faster than translating sentences in general, and we find that pre-computing time for this semantic search using SBERT (paraphrase-multilingual-mpnet-base-v2) model was more than 100 times faster than the keyword based model using the light translation model nllb-200-distilled-600M. These trends apply to a wide range of document types beyond news, from short social media posts, to companies’ internal documents with varying text lengths and domains. At Primer, we constantly seek the best solution to retrieve, cluster and understand documents efficiently and accurately, and those documents are not limited to English, but any texts that exist in the world.
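To make the keyword-based baseline concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python, applied to the climate example from the introduction. It is an illustration under stated assumptions, not the retrieval code used in this comparison: the tokenizer is a simple regular expression, `k1=1.5` and `b=0.75` are conventional defaults, the IDF uses the common non-negative variant, and the second document is a hypothetical title added for contrast.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    query_terms = tokenize(query)

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


documents = [
    "Copenhagen: protests against global warming",  # example from the introduction
    "Thousands attend climate summit",              # hypothetical title for contrast
]
query = "Thousands demanding climate change action"
scores = bm25_scores(query, documents)
print(scores)
```

Because the Copenhagen title shares no tokens with the query, its BM25 score is zero however related the two are in meaning, which is the gap that dense embeddings are meant to close.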
[1] Language-agnostic BERT Sentence Embedding (Feng et al., ACL 2022)
[2] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (Reimers & Gurevych, EMNLP-IJCNLP 2019)
[3] Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation (Reimers & Gurevych, EMNLP 2020)
[4] No Language Left Behind: Scaling Human-Centered Machine Translation (NLLB Team, arXiv:2207.04672, 2022)
Physics Today is the membership magazine of the American Institute of Physics. The magazine informs readers about important developments and provides a historical resource of events associated with physics. Physics Today Editor, Toni Feder, recently met with Primer CEO Sean Gourley and discussed Sean’s life as a researcher as well as the work Primer is doing to help government agencies and global enterprises monitor and respond to emerging threats from adversaries.
Excerpts from the interview include:
For his 2006 PhD at the University of Oxford he applied artificial intelligence to insurgencies. The work was “on the fringes of the physics mainstream” and journals would tell him it was political science. In the past two decades, though, “modeling social systems and complexity has become a full member of the physics space,” he says.
Sean went on to become an entrepreneur. He is the CEO of the software company Primer, the second company he founded. It has offices in San Francisco, Washington DC, and London, and offers services with artificial intelligence to defense, intelligence, and commercial clients.
When asked how he pivoted from researching how insurgencies work to becoming an entrepreneur, Sean recalled a lightbulb moment when he realized that to have an impact he needed to build something that could bring his theory of insurgency behavior to life. “Based on my experience looking at insurgencies, I got really into visualizing and manipulating high-dimensional data. My first company, Quid, used network and graph theory with visual interactions to let people explore landscapes of data.”
He then founded Primer in 2015 to further advance his work in artificial intelligence. “We sell software that you can train to identify different pieces of structure, whether it’s weapons or points of interest or calls to action. We have connectors that let you plug into different data streams, whether it’s audio from radio, PDF documents, or emails. We’ve got those connectors, we’ve got the models, and then we’ve got the applications that run on top of those. It’s quite a complex set of components. The software maintains and generates the self-updating knowledge base for our users.”
When asked about his testimony before the US Chamber of Commerce AI Commission in July 2022 that the biggest impact artificial intelligence will have is in warfare, Sean noted that “We are still at the very earliest stages of artificial intelligence. The performance we are able to get and create with unencrypted radio content has been mind-blowing. Repeat that for images, for satellite data, for systems to avoid being shot down with UAVs [unmanned aerial vehicles]. You have swarms of robots and intelligence capabilities, but you’ve also got disinformation. You’ve got self-driving cars, so you don’t need people to drive convoys of tanks, and strategic insights and analysis to predict moves in, say, the South China Sea. If my artificial intelligence is better than yours, I’m going to knock your drones out, and now I’ve got air dominance. The fact that the best artificial intelligence wins the battles, and will win the war, hasn’t been fully internalized by our defense and intelligence communities.”
“We are in a crucial battle over whether the US keeps its military advantage,” warned Sean. “The consequences are immense. It’s an arms race with artificial intelligence.”
It is not news that China is inching closer to the West in terms of its technological developments. But in some key areas, particularly in the field of Artificial Intelligence (AI), it is narrowing the gap at an alarming rate. If you look at where the top AI research papers are coming from, China dominates. China also consistently files more AI patents than any other country. Likewise, it has a growing number of AI technology firms, second only to the US. If the US is able to maintain its grip on AI technological advancements, how is it that China could still have superior AI on the battlefield?
Primer’s CEO, Sean Gourley discussed this question recently with Dan Faggella on the AI in Business Podcast. Sean and Dan also discussed how China has certain advantages in developing AI capabilities for cyberwarfare and how the West can maintain its strategic advantage in the global AI race. The following is a brief summary capturing select highlights of their discussion.
What is it about China that could give them an advantage?
Sean and Dan discussed how, unlike in the US and other western countries, advanced AI technology companies are catapulted into government contracts in China. Sean noted that this is because they have members of the Chinese Communist Party (CCP) on their boards and get a clear path to put their tools into the hands of military members on the front lines. Compare this to the US, where the contracting process takes years and contracts are most often awarded to huge consulting companies that are not as strong in AI technology generation.
The US and other western nations recognize this issue and are trying to streamline the process. But Sean warns that this change can’t come soon enough, as speed is the critical factor when talking about AI technology. Even if the US has the top AI technology, if it takes three years to get the contract and eventually get the technology in front of soldiers, then the advantage is gone. If China is able to get it to the front lines in 1-2 years, it can even be behind technically and still outcompete the US on the battlefield.
“The speed to deployment in a world where AI is increasing in the rate that it is, is the determining factor of who has the best AI advantage in conflict, not who has the best AI capabilities in research.”
How can the West maintain its strategic advantage?
Contracts to acquire advanced AI technologies need to come from the top down to the AI companies, according to Sean. “There have to be big contracts on the table that have 5-10 years of funding towards it.” It can’t just come from the bottom up because there are too many barriers to entry.
In determining which companies will build the products, i.e., whether to opt for huge consulting firms or agile AI startups, the Defense Department should factor in the level of precision behind China’s AI models. Sean noted, for example, that “If you are going to run image recognition on top of your drones, [know that China] has spent 10 years understanding the holes in it so they can quickly exploit it.” Western allies need to quickly catch up and identify the AI tech companies with the products and talent that can get them there.
Collaboration between strategic partners is also key here, according to Sean. If the US and UK can establish an AI systems and model sharing relationship then it will send a strong signal to the rest of the EU for further partnerships. “We have a lot to learn from allies [like Ukraine] dealing with attacks from Russia right now…How has warfare changed… [including] image recognition systems on drones and detection for satellite reconnaissance. Those are important lessons to see how AI can be operationalized.”
Sean also warned that the West must maintain control of the information space. When considering TikTok, which dominates user screen time in the US, Sean notes that: “You can’t fight a war in this environment if you don’t have control over your own information networks because they are so susceptible to attacks … the simplistic way to bait an opponent is to convince them that there is no war to fight.”
To learn more about Primer’s work with the defense and intelligence communities, visit https://primer.ai. To listen to the full interview, visit: How the West Can Compete with China in the Global AI Race – with Sean Gourley of Primer
Read: Sean Gourley, CEO & Founder of Primer – Interview Series, Unite.AI
Unite.AI provides news and analysis of the latest developments in machine learning and AI technology and showcases emerging AI companies that deserve industry recognition. Founder and CEO Antoine Tardif sat down with Primer CEO Sean Gourley on the heels of Gourley’s keynote presentation at the Ai4 conference in Las Vegas about disinformation, information operations, and the AI arms race with China.
They discussed the fast-changing mal/dis/misinformation (MDM) ecosystem, the rapid pace of AI technology advancements, the battle for control over online narratives, the manipulation of beliefs, and Primer’s role in helping government agencies and global enterprises monitor and respond to emerging threats from adversaries.
“Sean is a treasure trove of information regarding misinformation campaigns and government propaganda,” Tardif wrote.
Excerpts from the interview include:
“All of this comes down to the world of opinion formation and the mechanics of how groups arrive at consensus. There’s a sub-branch of physics, computational physics, that’s been studying models of how opinions form, propagate, and are adopted,” said Gourley, who holds a PhD in physics from Oxford where his research as a Rhodes Scholar focused on complex systems and the mathematical patterns underlying modern warfare.
When it comes to the thorny issue of determining truth, Gourley said, “The AI’s job is not to figure out who’s right. The AI’s job is to determine whether or not someone’s tried to actively influence you to believe one side or the other.”
He continued, “People will use artificial intelligence to attempt to persuade you. These are techniques that AI can enhance, and AI can also defend against. But we need to get away from this idea that AI can determine the truth because, oftentimes, the truth is perhaps less important than the battle for the narrative.”
“If an information operation has been conducted, we need to get real-time or near-real time understanding of what that looks like, what the motivations are, what the manipulations are, and start to act to either shut down those bot networks or start limiting the spread of that information,” Gourley said.
“We are in an AI arms race with China,” Gourley asserted. It’s a race the U.S. cannot afford to lose. “The winner of that race is going to have a very dominant military advantage over whoever is second place in that race. So China is going to pursue the components of artificial intelligence very aggressively.”
“And this is being recognized by the House Intelligence Committee,” he added. “As they’ve put together their Intelligence Authorization Act, they’ve called out, specifically, methods for detecting Chinese influence operations in the Caribbean, South America, Central America.”
Importantly, “The Intelligence Authorization Act also called out the ability to adopt artificial intelligence, the ability to use commercial off-the-shelf technologies, and the ability to deploy no code environments into these [intelligence community] organizations. There’s been some really constructive stuff coming off the Hill,” Gourley stated.
In concluding the interview, Gourley said Primer is “engaged in this mission to help support the U.S. and its allies to bring the best technology to put in the war fighter’s hands. We want more technology companies to come and join this fight.”
Read the full interview: Sean Gourley, CEO & Founder of Primer – Interview Series, Unite.AI
Influential podcaster and Silicon Valley veteran Niki Christoff invited Primer CEO Sean Gourley to appear as a guest on Tech’ed Up, following Gourley’s testimony at the U.S. Chamber of Commerce AI Commission hearing in Washington, D.C., about AI’s role in national security.
“Physicist and Rhodes Scholar, Sean Gourley, founded Primer, a start-up building mission-critical AI capabilities for the United States and its closest allies. He joins Niki in the studio to discuss Silicon Valley’s ambivalence about building and selling defense capabilities necessary to protect democracies. He explains how artificial intelligence is a ‘third offset’ – following nuclear weapons and precision munitions – that could end a war in just 72 hours. And China is hyper-focused on overtaking the United States in this arms race.” Tech’ed Up (08/11/22)
Christoff and Gourley discussed the “high-stakes moment” the U.S. and its allies are in today regarding AI and democracy, the growing AI arms race with China – a race the U.S. can’t afford to lose – and the need to sound the “alarm bell” among Defense and Intelligence Communities.
As a superpower, the U.S. must acquire the technology needed to retain decision superiority on the battlefield and beyond, Gourley said.
Gourley and Christoff also discussed:
- Primer’s mission: to provide mission-ready AI solutions to those who protect our security and democracy
- AI’s role in “structuring” unstructured data – text, audio, images, video – to spotlight what’s happening in fast-changing situations
- How machines can augment manual tasks and help humans make better decisions faster
Microsoft invited Primer CEO Sean Gourley to address AI technology advancements and practical applications during its Digital Transformation and the Future of High Tech virtual event.
Gourley’s participation reinforced Primer as a “company to watch” within Microsoft’s global ecosystem. The event highlighted Microsoft’s “big bets” in technologies that are moving business and society forward, from AI to cybersecurity to the Metaverse and beyond. It featured Ray Wang of Constellation Research and executives from Microsoft, Micro Focus, and Rocket Central, in addition to Gourley, discussing their views on modernizing business applications and processes.
Gourley explained how Primer delivers mission-ready AI solutions to meet rapidly evolving risk and security needs – for defense and intelligence communities as well as global commercial enterprises that need to accelerate smarter decision-making.
The ability to fuse high-velocity disparate data streams together – language, audio, video, and images – and use AI to navigate all of it, in a single dashboard, provides better situational awareness and improves real-time decision making, Gourley said. For Primer, he added, “it’s about supporting those who protect our security and democracy.”
“Artificial intelligence is going through a growth phase unlike any other technology at the moment,” Gourley continued. “It’s unprecedented in the last two decades. What does that mean? Things that were previously the domain of humans are now the domain of machines. They’re cheaper, and they scale a lot more quickly than we can. This is transformational for the world we live in.”
“We’ve got the ability for human experts to train machines to do things that humans can’t. It’s fascinating to see humans and machines start working together and leveraging the best intelligence of both,” Gourley added.
Many more insights from Gourley’s conversation with the Microsoft team are available here.
If you would like to partner with Primer or learn more about our partnership program, please visit https://primer.ai/partners/.
Sean Gourley discusses AI’s fast evolution and profound impact
Hint Health invited Primer CEO Sean Gourley to deliver a keynote presentation at Hint Summit 2022, the annual conference pushing the boundaries of multi-disciplinary, cutting-edge ideas to advance the field of medicine and healthcare.
Gourley described the rapid pace of development in AI and the profound implications as AI proliferates across all aspects of society.
Watch: The World of Artificial Intelligence, Hint Summit 2022
“Artificial intelligence is going to have its biggest impact in places where human intelligence has its limitations,” Gourley said to kick off his presentation, “such as the speed at which the human brain operates, which is hardwired into our biology. Artificial intelligence doesn’t have such limitations. It can move very quickly.”
Gourley shared several scenarios where speed matters. For example, “Algorithms have started to dominate high frequency trading,” he said, “not because they’re smarter than humans, but because they’re faster.”
While the world of AI technology is still in its infancy, the implications are tremendous and transformational – simultaneously fantastic and potentially catastrophic, Gourley cautioned. While some people want to hit the pause button on AI, there’s no way to stop it, or lock it away. Even regulating AI will be very hard, he added.
Given all of this, “The importance of human judgment becomes more valuable than ever,” Gourley asserted.
“For the most difficult tasks in the world, the ones that really matter – whether cancer detection or global pandemics or understanding the dynamics of conflict – the combination of machine intelligence with human intelligence is how we will solve the most important and pressing problems of our time. That’s what we need to embrace, with a full understanding of the risks. It’s about creating the best possible intelligence because we’re in a world that needs it.”
To learn more about AI’s fast evolution, current state of the art, where it’s going, and why language generation is the next frontier, watch Sean Gourley’s presentation, The World of Artificial Intelligence at Hint Summit 2022.
Learn more about Primer at primer.ai.
Watch the Microsoft High Tech Huddle with Microsoft’s Matt Hughes and Primer CEO Sean Gourley
In a fascinating conversation about the current and future state of AI, Microsoft’s High Tech Industry Leader Matt Hughes interviewed Primer CEO Sean Gourley about Primer’s passion for delivering mission-ready AI tools to the people who need them, the promise and perils of AI’s tremendous generative power, and predictions for AI’s impact on society in the months and years ahead.
Gourley described his early recognition of the importance of analyzing open source intelligence (OSINT) in the theater of war, starting with his 2009 TED Talk. As the use of AI expands exponentially, so does the amount of data to analyze, with mobile cameras and social media becoming so ubiquitous. Improvements in AI and in processing unstructured data are ways Primer helps Defense and Intelligence agencies and Fortune 500 organizations keep up with the explosion of data and make smarter decisions faster.
“With Primer, we’re able to deploy artificial intelligence to really help people make sense of fast-moving situations and harness all the open source intelligence that’s out there,” Gourley explained.
In relation to Primer’s capabilities on the ground in the Ukraine war, he added, “[operators can] tap into the data streams they care about, whether that’s unclassified radio communications from Russians on the ground in Ukraine, or videos coming off Telegram channels of tanks rolling into cities…Primer can translate, do object detection, image recognition, and feed it all together to give a sense of what’s happening and where it’s unfolding.”
Gourley also detailed similar experiences with commercial enterprise clients like Walmart that have improved their ability to make sense of massive volumes of customer feedback – with accuracy and efficiency. He emphasized that data as it becomes more structured results in a wide range of improvements.
“Structure gives us a new modality of interacting with the data,” Gourley said. “The structuring of data allows us to reduce massively the cost of the questions that we want to ask. Machines are wonderful at reducing cost.”
Hughes and Gourley further discussed how AI/ML will affect the development of the Metaverse and Web3, and the generative power of AI. On the flip side, Gourley also warned of the dangers of misusing the technology.
“With AI’s generative capabilities, you’ve got a two-edged sword: amazing creativity, but also very significant capabilities for toxicity and damage. As it’s probably true with any sophisticated technology, it should both delight and scare you at the same time. And these generative AI models certainly do that.”
Gourley’s final comments contain several predictions for the future, including his perspective on AI-powered conflict, Great Power Competition, and the future of language:
“We’re going to see in the next decade a fundamental shift in how we think about language. Up until the last 24 months, language has been a domain solely of humans. It has been one of the distinguishing characteristics of humans. Nothing else in our known universe has really been able to wield language with any effect. Machines are very, very quickly climbing up that curve. And in many tasks in the next decade, they’re going to exceed our ability in language. This is going to change the way that we educate students, the way we communicate inside of business environments, and it’s going to change the way we think about knowledge. It’s going to be transformational beyond belief.”
Among the largest and most important AI conferences in the world, The AI Summit London began in 2015 when research and academia were the focus. Just seven years later, it has evolved into the industry’s foremost event, focused on the practical applications of AI for enterprise organizations and real-world solutions that are transforming business productivity. Given Primer’s growth and recent acquisitions, the time was right in 2022 to make an appearance.
Paul Vingoe, head of operations in Europe and the Middle East, represented Primer at the conference. AI Business TV reporter Ben Wodecki interviewed Vingoe to learn more about Primer and its industry-leading machine learning and natural language processing solutions for government agencies and Fortune 1000 companies.
Practical applications of NLP
Vingoe explained that while still a nascent technology, NLP is catching on quickly and users are becoming more confident in its abilities to identify relevant content from unstructured text to make better decisions. Vingoe explained how the legal sector is using AI to surface key details from unimaginable volumes of documents, contracts, license agreements, insurance policies, and other legal documents.
“It brings information to the human who can apply human judgment far quicker than they can find it by reading the documentation themselves,” Vingoe explained.
During the interview, Primer Command was processing live news media and social media, seeking out data on the Russia-Ukraine war.
“What is really powerful is in real time, you can see sentiment, you can see individuals have been named, you can see locations and organizations, you can bring that information together. What you’re seeing is the ability to, as a user, very quickly understand what’s happening in real time in a region,” Vingoe added.
The perfect example
During the conference, the UK released its National AI Strategy. This presented an opportunity to demonstrate the power of NLP. The original document, interviewer Ben Wodecki admitted, took him about three hours to read.
“Within half an hour of it being released, we summarized that 40-page document into 1,600 words and 91% compression. And then we summarized it down to one paragraph, and it’s spookily accurate,” Vingoe explained.
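The figures in the quote can be sanity-checked with simple arithmetic. The sketch below assumes that "91% compression" means the summary keeps 9% of the original word count; that definition is an assumption, not something stated in the interview, and the function names are illustrative.

```python
def compression_ratio(original_words: int, summary_words: int) -> float:
    """Fraction of the original word count removed by summarization."""
    if original_words <= 0:
        raise ValueError("original_words must be positive")
    return 1.0 - summary_words / original_words


def implied_original_length(summary_words: int, compression: float) -> float:
    """Original word count implied by a summary length and a compression ratio."""
    if not 0.0 <= compression < 1.0:
        raise ValueError("compression must be in [0, 1)")
    return summary_words / (1.0 - compression)


# Under the assumed definition, a 1,600-word summary at 91% compression
# implies a source document of roughly 17,800 words.
implied = round(implied_original_length(1600, 0.91))
print(implied)
```

Under that reading, the numbers are consistent with a dense 40-page document.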
In response to a final query on what Europe can expect from Primer in the coming year, Vingoe talked about Primer’s solutions and how they will evolve. He cited open source intelligence, tools that train models, and additions to Primer’s AI portfolio.
“We have a product called LightTag, which is a German company we’ve acquired,” he said of the data labeling solution for training models, which Primer used and then acquired in February 2022.
“We also have a product called Yonder, which does information analysis for brand management and other use cases,” he added, regarding Primer’s pioneering information manipulation analysis software. “The aim is to have a natural language processing environment that will allow our customers to access our models, and use the output of those models in their own applications in their own workflows, in their own environments.”
View the entire interview here.
For more information on Primer products, visit primer.ai/products.
Cybersecurity expert Aviva Zacks interviewed Primer CEO Sean Gourley for a profile published in SafetyDetectives. They discussed why Gourley founded Primer, the company’s mission, and how Primer’s NLP solutions support the defense and intelligence communities and large commercial customers, such as Walmart, in the U.S. and overseas.
Download the white paper “AI in Warfare: A Race the U.S. Can’t Afford to Lose”
When asked how Primer keeps its competitive edge, Gourley said, “It’s a fast-moving space, so we’ve made a huge investment into the core algorithms and the core infrastructure that support those algorithms. We keep pushing the capabilities forward and aim to move faster than everyone else.”
Gourley was also asked about emerging cyber threats and new attack vectors.
“The generative side of AI is perhaps the most transformational, like when the technology really can mimic any human on the planet exceptionally well,” said Gourley. “That is going to open up a Pandora’s box as we start navigating through a world where we don’t know what’s real and what’s not.”
Gourley added, “Anytime you have a more volatile world, it opens up different attack surfaces. The pandemic is certainly creating a more volatile world. We are still in the early days of [the Ukraine] conflict, and there will be a lot more happening in the cyber attack domain.”
Read the full article: SafetyDetectives Interview with Sean Gourley, Primer (June 30, 2022)
WashingtonExec profiled Primer in this story about government agencies turning to AI to make sense of massive volumes of data to support mission objectives.
Working with government agencies to detect bots spreading misinformation in Ukraine is just the beginning. Primer’s work with government agencies is giving the Department of Defense the real-time situational awareness it needs, according to a new article in WashingtonExec, written by Adam Stone.
Mark Brunner, president of Primer Federal and a former Senior U.S. Senate Advisor and U.S. Navy Commander, knows firsthand the need for data to inform action. He mentioned how the Pentagon and its departments are “dedicating $200 million in just this fiscal year alone for our major military commands to use AI more efficiently,” according to the article.
Read: A New Era of Warfare: How AI Unlocks Intelligence, by Primer CEO Sean Gourley
Brunner has worked in the military, overseas in a diplomatic post, on Capitol Hill, and as a strategic consultant. Drawing on this experience, Brunner explained that manual data analysis is too limited to deliver the urgent, real-time situational awareness required in today’s information wars.
The article describes how “Primer addresses the need through a multipillar strategy for AI-powered situational awareness and decision support that encompasses strategic analysis, threat detection, information operations detection and countermeasures (e.g., mis/disinformation campaigns), audio extraction and summarization, and training and deploying custom AI models.”
“For most of us who have worked in and around government, the ability to ingest and process and synthesize that volume of data is really incredible.”
Mark Brunner, Primer Federal President, quoted in WashingtonExec
Since working with government agencies comes with its own hurdles, the article also describes what’s required for companies like Primer to break into the complex and often cumbersome government acquisition system. The article describes how “in government, there are hurdles around requirements like Authority To Operate, and Primer is working on its Federal Risk and Authorization Management Program authorization to overcome those.”
The article also discusses how Primer partners with contractors already working within the government to embed its analytics and AI tools. “Adding that capability into an RFP can be a tremendously compelling proposition,” Brunner said.
Primer’s work with the U.S. Air Force and Special Operations Command (SOCOM) brings the best Artificial Intelligence (AI) tools to operators who make mission-critical decisions. Recently, Primer was selected for the U.S. Air Force Advanced Battlefield Management System (ABMS) IDIQ contract.
Brunner explains that large volumes of data, most of it unstructured, lie at the center of the problem for almost all government agencies. Primer’s machine learning models can work with a number of agency types, including the Army, the VA, and the Treasury, Brunner says in the article.
“You can’t just hire more humans,” Brunner continues in the article. “At Primer, we use machines to do the hard, tedious work that is physically impossible for humans to do.”
Learn more about Primer Command for real-time situational awareness.
“The ability to train and retrain AI models on the fly will become a critical advantage in future wars.”
Sean Gourley, Primer CEO, quoted in Wired
WIRED’s Will Knight took an in-depth look at recent advances in AI/ML to analyze open source intelligence (OSINT) in the context of military conflict. The story opens with colorful dialogue among Russian soldiers in Ukraine, captured from unencrypted communications channels, revealing their situation after encountering artillery fire.
“Their words were automatically captured, transcribed, translated, and analyzed using several artificial intelligence algorithms developed by Primer, a US company that provides AI services for intelligence analysts,” wrote WIRED. “Primer is one of a growing number of companies that could make these technologies more accessible to those in the defense world and in private industry.”
Read: As Russia Plots Its Next Move, an AI Listens to the Chatter, WIRED, April 4, 2022
According to WIRED, “Primer already sells AI algorithms trained to transcribe and translate phone calls, as well as ones that pull out key terms or phrases. Sean Gourley, Primer’s CEO, says the company’s engineers modified these tools to carry out four new tasks:
- to gather audio captured from web feeds that broadcast communications captured using software that emulates radio receiver hardware;
- to remove noise, including background chatter and music;
- to transcribe and translate Russian speech; and
- to highlight key statements relevant to the battlefield situation. In some cases this involved retraining machine learning models to recognize colloquial terms for military vehicles or weapons.
“The ability to train and retrain AI models on the fly will become a critical advantage in future wars, says Gourley.”
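The four tasks WIRED lists form a staged pipeline, which can be sketched as a chain of functions. The sketch below is a toy illustration only: every name in it (`Segment`, `remove_noise`, `run_pipeline`, the glossary, and the key terms) is hypothetical, the "audio" is a list of pre-labeled text segments with made-up source words, and dictionary lookups stand in for real speech and translation models. It does not describe Primer's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Segment:
    kind: str  # "speech", "music", or "chatter" (hypothetical labels)
    text: str  # stand-in for the audio content of the segment


def gather(feed: List[Segment]) -> List[Segment]:
    """Stage 1: collect segments from a feed (here, just copy the list)."""
    return list(feed)


def remove_noise(segments: List[Segment]) -> List[Segment]:
    """Stage 2: drop background chatter and music, keep speech."""
    return [s for s in segments if s.kind == "speech"]


def transcribe_and_translate(segments: List[Segment], glossary: Dict[str, str]) -> List[str]:
    """Stage 3: 'transcribe' each segment and translate it word by word."""
    return [" ".join(glossary.get(w, w) for w in s.text.split()) for s in segments]


def highlight(sentences: List[str], key_terms: Set[str]) -> List[str]:
    """Stage 4: keep sentences that mention at least one key term."""
    return [s for s in sentences if key_terms & set(s.lower().split())]


def run_pipeline(feed: List[Segment], glossary: Dict[str, str], key_terms: Set[str]) -> List[str]:
    """Run the four stages in order."""
    clean = remove_noise(gather(feed))
    return highlight(transcribe_and_translate(clean, glossary), key_terms)


feed = [
    Segment("music", "la la la"),
    Segment("speech", "zarn plok"),
    Segment("chatter", "mumble"),
    Segment("speech", "tinbox moving north"),
]
glossary = {"zarn": "artillery", "plok": "incoming"}  # made-up source words
key_terms = {"artillery", "tank"}

before = run_pipeline(feed, glossary, key_terms)
# "Retraining" is approximated here by teaching the glossary a colloquial term.
glossary["tinbox"] = "tank"
after = run_pipeline(feed, glossary, key_terms)
print(before, after)
```

In this toy setup, the second speech segment is only highlighted after the colloquial term is added, which mirrors the point about retraining models to recognize informal names for vehicles or weapons.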
WIRED continued with insight from CSIS’s Emily Harding: “The amount of open source intelligence is impossible for anyone to process. Primer has distinguished itself for its ability to parse language.”
As WIRED concludes, “Gathering and analyzing data using AI could eventually become central to battlefield operations. The US military is investing millions to develop AI software capable of ingesting and analyzing different signals in the field.”
Read: A New Era of Warfare: How AI Unlocks Intelligence, by Primer CEO Sean Gourley
“Such advances could provide critical information more quickly, allowing military decision makers to outfox their foes.” However, along with new opportunities, AI introduces new risks, as well.
The article concludes by quoting Gourley: “Our philosophy on AI and defense is that whatever algorithm you go into the war with is not going to be the one that you end up with.”
Learn more about Primer Command for real-time situational awareness.
“Primer Command organizes information from 60,000-plus news sources and social media … to reveal real-time developments in volatile areas, such as war zones.” AI Business
AI Business, a leading content portal for artificial intelligence and its real-world applications, featured Primer Command in an article called “AI Tool for Monitoring Fast-Evolving Information in War Zones.”
The article brings attention to the serious challenges security and intelligence analysts face during breaking events like the Russian invasion of Ukraine.
“Volatile situations such as the Russian-Ukraine war often generate a mountain of information that can be overwhelming to analyze,” AI Business wrote. “The volume of content also makes it tough to discern key pieces of news and differentiate between real news and misinformation – in addition to foreign language difficulties.”
The article describes how “Primer Command uses advanced natural language processing technology to create structured data streams that eliminate duplicates and also offers real-time translations of more than 100 languages. The tool can also identify locations linked to the information.”
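One simple way to picture duplicate elimination in a data stream is to normalize each item's text and keep only the first item seen for each normalized form. The sketch below assumes exact matching after normalization. The article does not say how Primer Command detects duplicates, and production systems typically need fuzzier matching; the function names are illustrative.

```python
import hashlib
import re
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def dedupe(stream: Iterable[str]) -> Iterator[str]:
    """Yield each item whose normalized text has not been seen before."""
    seen = set()
    for item in stream:
        key = hashlib.sha256(normalize(item).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            yield item


headlines = [
    "Protests erupt in the capital",
    "PROTESTS ERUPT in the capital!",
    "Talks resume on Tuesday",
]
unique = list(dedupe(headlines))
print(unique)
```

Because `dedupe` is a generator, it can run over a live feed without holding every item in memory; only the set of hashes grows.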
Primer CEO Sean Gourley is quoted, saying, “The goal, for us, is for these data streams to provide structure to help people understand the world around them.”
“People talk about the ‘fog of war’ but in reality it is more of a ‘cyclone,’” Gourley added. “The issue is, you’re flooded with data.”
AI Business describes how real-time situational awareness “is critical for governments and businesses to respond quickly and appropriately to events unfolding on the ground. But when a crisis occurs, there tends to be a spike in information being generated as news outlets report on the events and people start posting about it on social media. This can be overwhelming to wade through.”
This abundance of information can overload teams responsible for alerting national security and business leaders about risks and opportunities – and impede informed decision making.
Primer Command, AI Business writes, “can mitigate this information overload by using AI to identify people, organizations, objects, and other key information. Users also can filter by military advancements, diplomatic actions, impact to economics and trade, among others.”
AI Business concluded the article noting, “Beyond defense applications, Primer Command also has commercial uses. For example, its service can be deployed in live events such as the Super Bowl.” Other examples include the Olympics and major weather scenarios like the floods in Australia.
Access the full AI Business article: “AI tool for monitoring fast-evolving information in war zones” (Mar. 8, 2022)
This article by Financial Times’ Richard Waters explores the complex relationship between the U.S. Government and Silicon Valley, the pace of technology adoption in the face of unprecedented geopolitical risks, and the business prospects for today’s AI innovators engaging with the Pentagon to support the U.S. national security mission.
Read: Tech Start-Ups Struggle to Break into the Pentagon, Richard Waters, Financial Times, 03/17/2022
Primer CEO Sean Gourley and AE Krissy Holst, a former CIA analyst, were interviewed for the story. Waters opened his article citing Holst:
“… Russia’s invasion of Ukraine is ‘probably going to be the most documented war we’ve seen.’ The deluge of real-time information online, including from social networks and messaging apps, could yield important intelligence – not just for military analysts, but for companies and other organizations concerned about the safety of staff or reliability of supply chains… Separating signal from noise in the fog of war is overwhelming.”
Waters continued, Primer “is one of many young tech businesses that have seen the business opportunity in national defence. It trawls thousands of sources online, using natural language processing to read and try to make sense of the vast amount of ‘open source intelligence.’”
Gourley commented on the speed of the military’s procurement process, which, as Waters noted, “does not match the speed at which start-ups need to move. It takes two years to land a serious contract…meaning entrepreneurs seeking out military business essentially have to bet their company on being right on target.”
Previous coverage about Primer by Waters explores the growing role AI software plays in helping U.S. intelligence analysts – and increasingly many other knowledge workers – sift through vast amounts of data and prepare reports that analyze and synthesize the most important new events “to make sense of a complex world.”
Read: ‘Natural Language Understanding’ Poised to Transform How We Work, Richard Waters, Financial Times, 12/02/2018
For more information about Primer and to access product demos, contact Primer here.
Primer Wins U.S. Air Force Advanced Battlefield Management System (ABMS) IDIQ Award with $950M Ceiling
Primer has been awarded a multiple-award indefinite-delivery/indefinite-quantity contract to compete for orders under a $950,000,000 ceiling for the maturation, demonstration, and proliferation of capability across platforms and domains, leveraging open systems design, modern software and algorithm development in order to enable Joint All Domain Command and Control (JADC2). This contract is part of a multiple award multi-level security effort to provide development and operation of systems as a unified force across all domains (air, land, sea, space, cyber, and electromagnetic spectrum) in an open architecture family of systems that enables capabilities via multiple integrated platforms.
“Primer is proud to support the U.S. Department of Defense with AI and machine learning solutions that enable mission success and decision superiority.”
Sean Gourley, Primer CEO
Primer provides a next-generation, open, end-to-end artificial intelligence and machine learning platform to help the Services operationalize JADC2. Primer’s machine learning layer gives users robust capabilities to scale their text-based workflows at human grade and with machine speed.
With Primer, technical and non-technical users simply upload and prepare their data; select and retrain domain-specific pre-trained models or label and build their own; then visualize relationships, entities, and insights from their data in a flexible user interface (UI) or in their own applications – all within the same platform.
Primer empowers the Services to better de-risk operationalizing machine learning and to move faster, with confidence and at the scale needed to achieve JADC2 decision superiority.
U.S. Government and Fortune 50 organizations worldwide trust Primer to operate in sensitive and complex data environments. Primer’s deep and proven artificial intelligence and natural language processing (NLP) expertise is tailored to risk and security applications, including: identifying disinformation campaigns, monitoring global events in real time, responding to cybersecurity attacks, anticipating supply chain disruptions, detecting risks to brand reputation, and other impacts on organizational performance.
The world of national security has long been a man’s game. Not anymore. Primer’s Director of Global Intelligence Strategy, Cynthia Strand, spent 35 years in the CIA. At a recent panel for women in national security, Strand shared how success in national security comes when we all work together to break the gender bias.
Cynthia Strand, Director of Global Intelligence Strategy at Primer, alongside three incredible women who work in national security, recently spoke at a panel hosted by George Washington University (GW). The fireside chat was hosted by the GW College of Professional Studies, which offers a master’s degree program in cybersecurity strategy and information management and bachelor’s and master’s degree programs in homeland security.
Strand was joined by Maria Berliner, managing director of the RTG-Red Team Group and a professor of intelligence and strategic analysis at GW, and Kathleen Haraseck, an adjunct professor of GW’s homeland security program. The discussion, moderated by Elaine Lammert, director of GW’s master’s program in homeland security, focused on building a career as a woman in national security, where panelists shared their experiences in the industry so that others might consider a job in such a cutting-edge and relevant industry.
Breaking the bias
Traditionally, national security has been a male-dominated industry. Not anymore.
Over the last several years, there has been a movement to improve gender diversity and fight unconscious bias in national security. President Biden’s cabinet is 44% women, the highest ever, and 50% of Senate-confirmed political appointments in national security have gone to women. But Biden’s National Security Council comprises just 36% women, according to the Leadership Council for Women In National Security (LCWINS).
Breaking the bias for traditionally male industries—or rather any industry in general—is hard.
That’s why this year’s International Women’s Day theme was #BreakTheBias, encouraging everyone to work together to forge a path to women’s equality. This gives women in national security a place to thrive, knowing that their voices and points of view matter just as much as those of their colleagues.
Master your craft
For women considering a career in national security, Strand encouraged them to master their tradecraft. By being good at what you do, you create a seat for yourself at the table and a broad network of colleagues. The network you build makes you more effective and has a more significant impact on the organization. And women can only do that if they are given the option to maintain a shifting work-life balance as their personal lives evolve.
Don’t be afraid to get uncomfortable
Strand encourages women who are looking for a career in national security to stretch themselves, take risks, and apply for positions that they can grow into. That said, Strand also emphasized normalizing feeling uncomfortable in our roles and taking on new challenges and risks. If we’re not being challenged, we’re not learning. For Strand, one of the most significant challenges of her career has been navigating a job that fell outside a traditional path with a skill set that wasn’t always valued.
“We learn a lot when we’re uncomfortable,” Strand said.
Before joining Primer, Strand spent 35 years at the CIA, where she was a Deputy Assistant Director and Senior Manager in the Directorate of Science and Technology. She was also the Industry-Government Partnerships Innovator at In-Q-Tel.
Strand initially applied to be an analyst for the CIA but ended up in the Directorate of Science and Technology instead. She describes that path as “one of the best unanswered prayers of my career” because she was put into an environment where “we were encouraged to lean in and take risks.” She notes that the environment shaped the rest of her career.
Tools to succeed
Successful women also need to be at the helm of recruitment efforts, represented in a wide range of occupations to help female candidates see themselves and a career path they want to go down. Strand also recommended starting the mentoring process earlier by pairing new hires with senior female sponsors. She and the other panelists agreed that we all need to lift other women up, as we stand on the shoulders of the women who came before us.
“Wherever you can, lift other women up,” Strand said.
Good data is the first step to trusted Natural Language Processing (NLP). But preparing good data is a complex exercise that requires gathering data from disparate sources and unifying it to unlock insights. In fact, Primer customers often spend more of their time creating datasets than building and leveraging NLP.
To remove this bottleneck, we are proud to announce we’re partnering with Trifacta to power Primer Ingest. Ingest is the critical first stage of our NLP platform that ingests and normalizes both structured and unstructured datasets from various application systems, databases, and data warehouses.
“Primer is pleased to partner with Trifacta – a cloud data engineering company. This partnership solves a foundational data ingestion pain point and provides connectivity to dozens of applications, making it easy for customers to bring data from their applications and data sources seamlessly into Primer’s NLP platform.”
– Pavan Venkatesh, Sr. Director of Product, Primer
“Trifacta is proud to partner with Primer, bringing our leading data preparation and integration capabilities to Primer’s industrial-grade NLP platform to empower mission-critical organizations to leverage the full potential of all their data.”
– Ron Papas, Sr. Director of Alliances, Trifacta by Alteryx
This data can then be used to generate critical insights, including:
- Identifying potential threats to an organization’s people, assets, and infrastructure, particularly during times of unrest and upheaval by looking at OSINT alongside internal documents
- Extracting insights from call notes, emails, call logs, and chats from applications like Zendesk to understand customer sentiment and identify customer experience opportunities
- Processing employee surveys, performance reviews, and pulse surveys to gain an edge in the most competitive talent market in history
- Leveraging customer surveys and in-market documents to perform a competitor analysis, analyze new markets, and develop strategies for product development and new product launches
- Automatically processing contracts to extract renewal and expiration dates, review and approve changes, or simply audit a corpus of contracts for specific information
To generate these insights, organizations need a data strategy that answers the following questions:
- Where is the relevant data residing?
  - Is it in systems such as databases, applications or files?
  - Is it public information like social media or news?
- What type of data formats are being stored or retrieved?
  - Is it structured (tabular)?
  - Or is it unstructured (PDF, audio files etc.)?
- How can I get all this data into a platform, prepare, run ML models, and generate insights?
Because data must be effectively ingested so it can flow through different internal NLP components to make it usable and meaningful, Primer Ingest is a critical first step in leveraging NLP on your data.
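As a rough illustration of what normalizing structured and unstructured inputs can mean, the sketch below maps a tabular row and a free-text document onto one common record shape so that downstream components can treat them uniformly. The `Record` schema and both helper functions are hypothetical and are not Primer Ingest's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Record:
    source: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def from_row(source: str, row: Dict[str, str], text_column: str) -> Record:
    """Structured input: one column carries the text, the rest become metadata."""
    metadata = {k: v for k, v in row.items() if k != text_column}
    return Record(source=source, text=row[text_column].strip(), metadata=metadata)


def from_document(source: str, raw: str) -> Record:
    """Unstructured input: collapse whitespace and keep the whole body as text."""
    return Record(source=source, text=" ".join(raw.split()))


# Hypothetical inputs: a help-desk table row and an uploaded text file.
ticket = from_row("helpdesk", {"id": "42", "body": "  Login fails  "}, text_column="body")
memo = from_document("upload", "Quarterly\n\nreview   notes")
print(ticket, memo)
```

Once both kinds of input share one shape, later stages only need to read `text` and can ignore where the record came from.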
Here’s where Ingest and Integrations fit into Primer’s Platform.
Primer Ingest capabilities
Primer Ingest provides the following capabilities:
- Data integration and transformation
- About 60 pre-built connectors for bringing in data from applications such as Salesforce and Zendesk; data warehouses such as Snowflake, Redshift, and Teradata; data stores such as S3; and local storage, where users can upload PDF, CSV, Excel, and other files
- No Code or Low Code ETL
- UI with workflow capabilities enabling our services team to easily connect and ingest structured data (tabular format) into the Primer platform
- Multi-cloud support (AWS and Azure)
- Supports customer managed public cloud deployment
Here is the full list of integrations.
Primer simplifies getting good data by ingesting, cleaning, structuring, and enriching raw data from over 60 sources so you can quickly build and deploy NLP models, and then generate valuable insights to inform critical decisions.
If you’d like to chat with our team about your data and AI strategy, please reach out to us.
“We’ve all known for a long time that technology is changing the world we live in. Right now we’re seeing it play out in real time in Ukraine, the closest thing that we have ever witnessed to an open source war.” – Cynthia Strand, Global Intelligence Strategy, Primer
The Cipher Brief hosted an expert panel to discuss the application of AI to Open Source Intelligence (OSINT) and the challenges the Intelligence Community must overcome to accelerate the integration of AI to improve mission enablement and execution.
The panel featured Emily Harding, Deputy Director and Senior Fellow for the International Security Program at the Center for Strategic and International Studies (CSIS), and Cynthia Strand, Primer’s Global Intelligence Strategy Lead and former Deputy Assistant Director for Global Issues at CIA. The panel was moderated by Cipher Brief CEO and Publisher Suzanne Kelly.
Strand explained how OSINT and AI are being used in Ukraine. “Early on [in Ukraine] people were out there with Twitter and TikTok showing and reporting what they saw. This provides an incredible wealth of information for the intelligence community to harness and use. The challenge is it’s an incredible wealth of information. It is beyond any human being’s ability to comprehend. This is where AI, in the form of natural language processing specifically, brings tremendous value. Because AI can bring all of this information together – from mountains of chaotic data – to create structure and identify entities, events, and relationships among people.”
Watch: Preparing for Great Power Competition in AI and Intelligence, Cipher Brief, March 2, 2022
The conversation centered around the art of the possible in terms of how AI can be applied to the intelligence mission. Strand outlined two general domains where AI technology can be transformative:
- Mission execution: “We’re seeing this play out around the world, and especially in Ukraine with all the open source activity being used to organize and support Ukrainian resistance. Natural language processing (NLP) has a tremendous role in bringing together vast volumes of information and sifting through to find the signal amidst all of the noise. And not just for ‘truth’ but to detect disinformation and misinformation campaigns as well.”
- Mission enablement: “AI, in the form of NLP, brings value anywhere in the intelligence community where people deal with textual information. The greatest value it can bring is taking the rote tasks that eat up a lot of a person’s time and taking that off their plate. This raises the value of the work that people are doing, and lets the humans do what humans do best. Ultimately, this allows more resources to be diverted to mission execution.”
The conversation concludes with recommendations on how to facilitate AI adoption within the government environment. CSIS’s Emily Harding noted that the thinking needs to change. “The main key is to flip the risk paradigm on its head, to go from the risk of doing it to the risk of not doing it.”
Learn more about Primer for public sector organizations.
As information is increasingly weaponized for geopolitical gains, AI plays a crucial role in detecting disinformation, supporting de-escalation, and informing diplomacy in fast-changing environments such as Ukraine.
Fox Business News invited Mark Brunner, Primer’s head of Global Defense Strategy and a former U.S. Navy Commander, to join anchor Neil Cavuto in a discussion about Russia’s impending invasion of Ukraine.
Brunner asserted that Vladimir Putin’s invasion of Ukraine is “a blatant violation of international law … and a tragic development for the Ukrainian people who voted for democracy and freedom.”
Brunner also emphasized the role disinformation plays in this new era of information-based warfare, where AI-fueled propaganda is designed to create widespread confusion and distrust.
“[Putin] is conducting a master class in a disinformation campaign using a hybrid warfare version of old school armor and tanks complemented by cybersecurity threats, artificial intelligence, and Russian propaganda to essentially create confusion in Ukraine, the U.S., and among our allies. Candidly, we’re playing catch up here.”
It’s exceedingly difficult to “keep up with the narrative and the pace that Putin is setting for this conflict,” Brunner added.
Watch the full Fox Business News interview: “Putin Invasion is ‘Blatant Violation’ of International Law” (Feb. 22, 2022)
Indeed, processing, analyzing, and drawing actionable insights from immense volumes of rapidly changing data is extremely challenging. With finite time and limited human capital, it’s certain that significant amounts of vital information are often untapped or misunderstood.
Primer’s AI and machine learning solutions allow machines to do what they do well – empowering humans to focus on what they do best – making better decisions, faster.
Call to AI Action
The National Security Commission on Artificial Intelligence determined in March 2021 that the U.S. is not prepared to defend itself in the AI era and must act quickly to increase its AI capabilities. Almost one year later, on Feb. 17, 2022, the U.S. Government Accountability Office (GAO) echoed those concerns, finding that the majority of the Pentagon’s advanced AI capabilities for warfighting are “still in development” and not yet fielded.
Primer recommends accelerated investment in AI technologies to equip the U.S. and its allies with the tools to combat this next generation of existential threats, including asymmetric disinformation and cyber campaigns, the erosion of democratic institutions and values, and increased strategic competition from our adversaries.
In high-stakes missions like these, front-line analysts, operators, and decision makers require customizable, high-performance AI models they can trust to allow them to act with agility and confidence on incoming information and achieve their mission objectives.
As the world continues to change rapidly, Primer is committed to partnering with organizations – across national security, civil services, financial, technology, and other critical industry sectors – to help them use AI to meet an ever-widening array of risk and security challenges and opportunities.
Whether that’s monitoring geopolitical events in real time, responding to cyber or physical security attacks, identifying disinformation campaigns, mitigating risk to brand reputation, predicting supply chain impact, tracking climate change effects, or allocating life-saving resources during natural disasters …
Primer automates the discovery and sharing of human knowledge with speed, precision, comprehension, and scale, putting the power of real-time information in the hands of change agents.
To learn more, contact Primer here.
AI can free us to be more human by automating daily tasks, especially in a world of information overload. Watch “Becoming Human: Are We Still Relevant in the Age of Artificial Intelligence?”
Instead of worrying that AI will take over your job, you might consider how AI can help you do your job even better. This video shows how natural language processing (NLP) can save you countless hours analyzing endless data streams so you can spend your time being more creative in solving problems that keep you (and your boss) up at night.
“Becoming Human,” a docuseries produced by Channel News Asia (CNA), an English-language news channel based in Singapore that reaches 29 territories across Asia and Australia, explores the impact of artificial intelligence on humanity. It features Primer in a new episode about the future of employment titled “Are We Still Relevant in the Age of Artificial Intelligence?”
Filmed at Primer’s HQ in downtown San Francisco, CNA depicts a “day in the life” of an information analyst. It shows how Primer’s NLP technology makes it faster and easier to sift through hundreds of thousands of data sources, summarize key points, and create reports for decision-makers who need timely, reliable insights.
The insights in these reports help people on the front lines make better decisions in high-stakes, fast-changing environments. These are the people responsible for monitoring global events in real time, identifying disinformation campaigns, anticipating supply chain or climate change effects, mitigating risks to brand reputation, managing customer services, responding to physical, financial, and cyber security threats, or allocating life-saving resources in the face of natural disasters.
Primer helps people address some of the world’s hardest problems by automating the analysis of massive amounts of unstructured data with greater speed, ease, and accuracy. Our technology transforms information overload into mission-critical intelligence so that you can drive smarter decisions and better outcomes.
A day in the life
CNA interviewed several Primer executives to get a better understanding of the challenges analysts face daily in their jobs, and also the opportunities AI presents.
According to Brian Raymond, Primer’s VP of Government and a former CIA intelligence officer, two things matter to the job of an analyst: how comprehensive you are and whether your report is timely. “If you’re not on time, you’re irrelevant,” Raymond added. However, no matter how many human analysts an organization hires, the exponential growth of data makes it impossible to meet either of these goals, let alone both of them. This is where AI comes in – to aid the analyst in comprehensively summarizing the key points on time.
How difficult is this level of analysis, really? CNA sat down with Primer to learn more.
In a fun but telling experiment, Primer’s Director of Science, John Bohannon, challenged CNA’s reporter to analyze a typical five-page news article and write an intelligence report highlighting the critical “five Ws” (who, what, where, why, and when), just as an analyst would. This “everyday” task took her approximately 30 minutes. For comparison, Primer’s NLP technology analyzed the same article and created a report within seconds. Not only did Primer’s AI tool summarize the key points much faster, it also extracted different pieces of information than the reporter did, highlighting the complementary relationship between humans and machines in providing valuable insight at scale.
CNA also captured Primer CEO Sean Gourley’s perspective on the budding relationship between AI and human intelligence.
“Up until now, the only intelligence in the world that could read and write were humans,” Gourley said. “Now, we have another kind of intelligence: machines. Machines that can read and write for certain tasks just as well as humans but can do it a lot faster, with far less expense, and at a scale humans can barely imagine.”
The episode concludes with Gourley’s advice to anyone who remains skeptical about the productive nature of the relationship between humans and AI.
“AI isn’t going away. It’s got a distinct set of capabilities that are different from human intelligence, and it can do things that we humans can’t and that we’ll never be able to do because of the way we are wired biologically. Let machines be machines and humans be humans, but let’s work together to create a better kind of intelligence.”
Watch the full episode of “Becoming Human @ Work.”
Contact Primer here for more information about Primer and access product demos.
“Disinformation is one of the most significant challenges facing our country today.
How can we fight it? Let’s get to it, with Primer CEO Sean Gourley.”
Jason Stoughton, Host, Pulse of AI
In the second of a two-part interview series (here’s the first), Pulse of AI host Jason Stoughton invited Primer CEO Sean Gourley back to the podcast to discuss one of the most complex, disruptive – even existential – issues of our time: AI-powered disinformation.
In “Understanding, Identifying, and Combating Disinformation,” Stoughton cited Gourley’s deep experience, personal passion, and Primer’s work in this space as giving him a “unique perch” from which to view the full landscape.
Disinformation can “permeate politics, impact elections, and distort discussions,” Stoughton said in his introduction. AI can make it better, and AI can make it worse, he added. “So, the question is, how do we fight it?”
Enter Primer CEO Sean Gourley. He started with the concept of truth.
“One of the reasons disinformation is fast becoming an existential threat is that Western democracies are built upon a sense of shared ground truth. If you can’t agree on how the world is, you can’t run a process to make decisions about how the world should be,” Gourley said.
However, “absolute truth is a fallacy,” Gourley asserted, “since the whole scientific process involves discovering new things we didn’t previously know to be true.”
The manipulation of ideas, Gourley said, is a new, distinct threat. “It’s less important if an issue is true or not, and more important if people believe it to be true, or not.”
“Instead of trying to assess ‘what is truth,’ a more tractable and solvable problem is determining if there are active campaigns to influence the way people think,” Gourley continued. “That’s very much in line with how we at Primer think about bringing our technology to play in this game.”
AI’s Role: For better and worse
“We’re on the cusp of entering a world where machines are able to generate content that is indistinguishable from what humans could generate themselves, and they can do it very cheaply, at a scale that is superhuman,” Gourley said.
We’ve already seen this in deep fake videos, but we’re also now seeing this in the world of language.
“If you think it’s hard to determine if a photo has been generated by a machine, or not, it’s a whole lot harder to know if comments on a social media post have been generated by a bot or a human,” said Gourley.
“Technology exists today that can create hyper-personalized, computer-generated content to sway your opinions about a key issue in the world, or sow confusion for you about a key issue in the world. That is the technological advancement that has me most worried when we think about operating a coherent democracy,” Gourley said.
There’s an extreme asymmetry, Gourley added, between the speed, ease, and low cost of disseminating disinformation at scale, and the time and expense associated with traditional, often manual, efforts to assess and verify information emerging in real time.
While technology has created this asymmetry, machines can also help bridge the gap.
We can train machines to act as an early warning system for identifying disinformation campaigns by continuously monitoring the information landscape and automatically assessing novel claims — where they originate, who is making them, and how they are disseminated and amplified.
“We’re a long way from being able to actually detect disinformation campaigns early enough to start thinking about defense strategies,” according to Gourley. The critical first step, he believes, is early detection. “We need to detect the emergence of disinformation campaigns at a time when we can actually react and respond. This kind of early detection and defense requires a long-term investment in technology and infrastructure. Primer is proud to be working on this.”
“We have to get very serious about investing in technology that can support this mission of early detection and enable us to react in time to be effective.”
Sean Gourley, CEO, Primer
Gourley discussed a number of related issues throughout the interview, including the underestimated risks of disinformation on commercial markets.
It’s increasingly clear that we need to pay significantly more attention to protecting essential drivers of the global economy, such as financial system stability, corporate reputation management, and employee/stakeholder confidence.
Gourley also believes that citizens in democratic societies have a responsibility to maintain “healthy information diets.”
“If we’re successful with that, it’s going to be a lot harder for disinformation attacks to achieve their objectives,” Gourley concluded.
Listen to the full Pulse of AI podcast here: Understanding, Identifying, and Combating AI Powered Disinformation
“When you’re working on something highly complex, technical, and novel, you need people with different perspectives and different kinds of skill sets.” Amy Heineike, VP of Engineering, Primer
The Technical Women podcast selected Primer’s Amy Heineike, VP of Engineering, as the very first guest for its inaugural episode. In “Growing a Career in NLP,” Heineike discusses her career trajectory in Natural Language Processing (NLP), one of the fastest growing fields in the technology industry.
Technical Women highlights women with technical expertise and proven experience who are trailblazing, inspiring, and building our future, with a focus on modern computing and machine intelligence.
Heineike describes her journey into NLP, what she has learned about transforming NLP technology into practical solutions people can use to create value, and how she approaches building a world-class team in a highly competitive market for specialized talent.
Learn how Heineike’s insatiable curiosity set her on a path to be part of the founding team of a leading NLP company: https://amplifypartners.com/company-building/amy-heineke/
Bonus: you can try first hand some of the NLP functionality Heineike discussed in the interview, including the abstractive summarization engine, which can generate punchy summaries of documents without using the words in the original source material. What? Just sign up for a free trial and give it a whirl.
For more information and product demos, contact Primer here.
“One of the most exciting companies in the space.” – Jason Stoughton, Pulse of AI
The Pulse of AI podcast highlights hot AI start-ups and “the smartest people in the world,” according to host Jason Stoughton – especially when it comes to successfully deploying and leveraging data and AI in the enterprise.
Primer CEO Sean Gourley was featured in a two-part interview covering a wide range of topics, including “why now” for NLP technologies, Primer’s emerging leadership in this fast-growing market, and the implications for organizations adopting NLP.
They also discussed the role AI can play in combating disinformation – AI’s limitations and its potential. In fact, disinformation is such a complex and existential issue that it became the focus of Gourley’s second interview, Understanding, Identifying, and Combating AI Powered Disinformation.
In the first episode, Conversation with Sean Gourley, Founder and CEO of Primer, Stoughton kicked off the discussion saying, “NLP is and always has been one of the most exciting areas within the field of AI.” He added, “Business leaders are just now beginning to see the potential that NLP represents… as companies like Primer begin to release powerful yet easy-to-use tools that bring NLP out of the research department and into the hands of business leaders.” Stoughton continued, “Primer recently released a new product line called Primer Engines, which are pre-trained engines that can be layered over your data, allowing you to do the types of things that just a few years ago were plot lines in Hollywood movies. And just as importantly, or maybe even more so, they are designed so that even non-technical business users can easily use them. What used to take months and a significant amount of investment in terms of people and money can now be done in a day. Literally. All of this makes his company, in my opinion, one of the most exciting players in the space.”
From there, Stoughton and Gourley discussed a broad range of topics, including:
- Gourley’s inspiration for starting Primer
- The purpose of NLP, why it’s one of the most difficult challenges in computer science, and recent advancements
- Why forward-looking organizations are assessing what NLP means for their business operations
- How pre-trained Primer Engines – that are interoperable, scalable, customizable, and easy to use – are accelerating new ways of interacting with the world’s most valuable information and creating new kinds of intelligence
“The organizations that will be most successful are going to be the ones that appreciate the strengths of machines and the strengths of humans – and combine them into the best business processes.” Sean Gourley
“With these new intelligence capabilities spurred by NLP, how would you design your business systems to take full advantage? The organizations that will be most successful are going to be the ones that appreciate the strengths of machines and the strengths of humans – and combine them into the best business processes. Organizations that take that step first will have the most experience getting their heads around what the technology can do. They are going to create the best processes and, ultimately, will be the ones that win,” said Gourley.
Noting Primer’s successful growth trajectory, Stoughton asked Gourley how Primer is effective in attracting world-class talent. “For us, the pitch is very simple,” Gourley explained. “If you want to work in the world of bleeding-edge technology, building machines that can read and write at super scale and with human-level precision, this is the best place in the world to do that.”
“Machines are going to get increasingly sophisticated at reading and writing. Everything that involves text within your organization is going to change in the next five years.” Sean Gourley
And Gourley’s final words: “Machines are going to get increasingly sophisticated at reading and writing. Everything that involves text within your organization is going to change in the next five years. Natural language processing is going to change the way you interact with information. You can get cost savings immediately. So it’s a good time to start thinking about what NLP might do for you today. Get started before it’s too late.”
Listen to the full interview here: The Pulse of AI: Conversation with Sean Gourley, Founder and CEO of Primer (Episode 100). The second interview on disinformation – “one of the most significant challenges facing our country today,” according to Stoughton – is available here: The Pulse of AI: Understanding and Combating AI Powered Disinformation (Episode 101).
Here’s a smart look at the burgeoning role of AI in the fight against disinformation.
Principal analyst Mark Beccue in OMDIA’s AI practice spoke with Primer CEO Sean Gourley for an article called Can AI Fight Fake News?, published in AI Business on September 9, 2021.
Starting with a provocative quote about “truth” from George Orwell’s 1984, the article explores the current state of natural language processing (NLP) solutions to discern truth or untruth by analyzing massive amounts of data rapidly – far beyond the scope and scale of human ability.
Is AI suited to help fight against misinformation and fake news, and if so, in what way?
According to Sean Gourley:
- For processing high volumes of text-based data at speed, AI is 2000x better than human analysis.
- But the challenge is defining misinformation – what is real and what is not real. Machines can’t get to absolute truth.
- AI can be leveraged to detect an active campaign to push out misinformation.
- Organizations are interested in tools that can do that as an early warning system.
Is fighting misinformation/fake news a monetization opportunity for companies such as yours?
- For us it certainly is. Our technology is being used today to detect emerging information warfare campaigns by analyzing where claims originate and how they are disseminated through the media.
- Primer is working with the U.S. military to build an AI platform that will be able to automatically identify and assess suspected disinformation. The solution will be used by the Air Force and Special Operations Command.
- The system continuously collects a massive amount of broad data. It looks for misinformation claims that have been made, and then identifies claim attribution – who is making these claims. Finally, it analyzes counterclaims.
- All of these elements collectively detect a potential issue which security analysts can then use much more quickly to strategize against threats.
- It’s very early days in the commercial space for misinformation use cases but monitoring meme stocks and stock manipulation makes sense. There is a lot at stake for the investment ecosystem to keep that clean.
As Beccue concludes, “the fight against fake news will take a suite of weapons.” Learn about Primer’s toolkit.
Read the full article here: Can AI Fight Fake News?
A financial technology company compared Primer’s natural language processing (NLP) models against multiple competitors. Out of the box, and without any retraining, Primer outscored them across all of the key performance metrics.
Recently, a financial technology company conducted an evaluation of multiple intelligence solutions on the market, including Primer, that feature artificial intelligence (AI) and natural language processing (NLP) technologies. Some of the solutions were from the largest and most established cloud companies in the space, while others were from smaller companies and open source NLP libraries. Prior to choosing a solution, the company wanted to determine which technology would best fit their need to accurately identify company names in financial filings.
To conduct the evaluation, the financial company created a custom, hand-labeled named entity recognition (NER) dataset internally to evaluate the NER predictions from each provider’s solution. NER is the machine learning task of identifying and categorizing entities, such as people, locations, and organizations, that exist in unstructured text. It is a foundational and complex technology in NLP, which is why many companies choose NER as their testing and evaluation framework. If a provider performs well at NER tasks, then they will likely be able to build more complex models on top of it. For example, an NER model can be trained to automatically distinguish Apple, the company, from apple, the fruit, and it can also associate Tim Cook, the person, with an Apple computer.
There are many ways that organizations can use NER models. Financial firms might want to deploy NER on press releases to identify newly announced executive members of a company of interest, discover new competitors named explicitly in financial filings, or understand company exposure to specific regions and locations based on their mentions in an earnings transcript. Likewise, an NER model can be trained and deployed to correctly differentiate between Amazon the company and Amazon the river. This difference is important for trading models derived from this information. If we are looking at sentiment, for example, then it is important that the quantitative trading model is actually looking at sentiment relating to Amazon the company and not the river.
A financial firm might also look to NER to categorize key information and identify specific details in SEC filings for a group of companies. For example, in a financial filing such as the SEC filing for Coinbase Global, Inc., an NER model can identify key entities of interest.
In these sentences, Primer’s NER Engine can quickly identify several entities: Mr. Armstrong, Airbnb, Inc., Universitytutor.com, and Johnson Educational Technologies LLC. It can also categorize them correctly: Airbnb, Inc., Universitytutor.com, and Johnson Educational Technologies LLC are organizations and Mr. Armstrong is a person. Analysts can search based on the relationship between specific entities and locations, and co-mentions.
A big part of a financial analyst’s job is to gather data from various documents and then organize it, clean it up, and get it into a format that can be analyzed. Primer’s NER Engine can quickly and accurately identify, structure, and extract the data for analysts. This is a true paradigm shift for analysts: instead of performing manual qualitative analysis, they can automate the structuring of the text to feed directly into their quant trading models. This gives analysts more time to do what they are best at: analyzing the information, looking for hidden trends, and much more.
Primer won by a large margin
In the test, the customer found that Primer scored better on all the industry standard measures and performed even better on the financial company’s top priority measures. Primer scored an average of 21% better than the leading competitor and an average of 42% better than its competitors across all metrics. As a result, the financial firm became one of our newest customers.
The following table illustrates how Primer stood above the competitors in nearly every category.
It is important to note that the financial company was evaluating Primer’s out-of-the-box NER Engine, which has been trained on general text data and not specifically on the types of financial data the customer was testing against.
Primer’s advantage lies in data source diversity
To get to this level of algorithm precision with our NER Engine, Primer engineers injected diversity into the data used to train our engines on a range of writing styles, subject matters, and entities. We also curated a highly diverse group of documents, including entities from the financial, defense-related, and scientific worlds. This diversity was carefully curated over iterative testing and training cycles, and it is what enables the Primer NLP Engines to outperform our competitors.
Companies can further improve model performance on these types of tasks by using Primer Automate, our no-code end-to-end machine intelligence platform that enables users to quickly and easily retrain the NER model using domain-specific data. The Primer NER Engine is one of our 18 best-in-class NLP Engines that customers can access via API or from within Automate to structure and conduct advanced workflows on their documents.
Primer’s NER Engine is shown to be strongest in both precision and recall
Using Primer’s NER Engine, the financial firm tested the output quality using industry-standard metrics: precision, recall, and the harmonic mean of the two (the “f1” score). Among NLP machine learning models, there is typically a trade-off between precision and recall. For a high-frequency trading algorithm, a high precision model is of paramount importance for automated trading workflows. After all, you don’t want to mistakenly trade on negative sentiment about Amazon (the company) when a financial analyst’s note is talking about negative consequences of the Amazon (the rainforest) being cut down for agriculture. A misclassification of entity types in trading models like this can rapidly become incredibly costly.
Conversely, there are cases where the goal of the model is to achieve a high recall score. For example, a business analyst looking to mitigate the risk of forced labor in her company’s global supply chain cannot afford to miss any related violations in volumes of audits. This analyst will want a high recall model that casts a large and inclusive net to catch all relevant mentions, with manual review an expected part of her workflow. Compared to other commercial solutions, Primer’s NER Engine outperforms the industry benchmarks in both precision and recall, as our customers have proven out in the test scores detailed in the chart above.
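To make the three metrics concrete, here is a minimal sketch of how precision, recall, and f1 can be computed for NER output. It assumes entities are compared as exact `(text, type)` pairs; the function name and the sample entities are hypothetical and are not taken from the customer’s evaluation.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted entities against gold labels using exact matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: what share of the predictions were correct?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: what share of the gold entities were found?
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # f1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical entities, written as (text, type) pairs.
gold = {("Amazon", "ORG"), ("Tim Cook", "PERSON"), ("Apple", "ORG"), ("Seattle", "LOC")}
predicted = {("Amazon", "ORG"), ("Apple", "ORG"), ("Amazon", "LOC")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the model mislabels one mention of Amazon as a location, which lowers precision, and misses two gold entities, which lowers recall. A trading workflow would weigh the first kind of error more heavily; a supply chain audit would weigh the second.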
Primer also scored higher in other key measures
The financial company not only used precision, recall, and f1 scores to determine the quality of output from Primer and our competitors; it also took the scoring a step further and broke down the measurements by five categories of interest. These measures show different ways of assigning scores to the predictions from each NER model. One of the categories that the customer cared about the most was “Type Penalty.” For this measure, the prediction receives a perfect score if the type of the named entity is correct and its boundaries match exactly. It receives a partial score if the type is correct but the boundaries are not an exact match. In the example below, the prediction would receive a positive (partial) score because it correctly identifies part of the boundary of Primer Technologies, and the type prediction is correct. A perfect score would be assigned only to the prediction where “Primer Technologies” is identified as an ORG.
The chart below further illustrates how Primer scored 21% better than the leading competitor on this metric.
Primer also performed better on the four other measures that the financial company evaluated:
Boundary exactness: The NER categorization receives a perfect score if the named entity’s boundaries, or where the sequence of words starts and stops, are identical to the gold label data, regardless of whether the type is correct or incorrect. The prediction in the example below would receive a perfect score, even though the category type “PERSON” is incorrect.
Partial boundary: The prediction receives a perfect score if the named entity’s boundaries are identical to the gold label data. It receives a partial score if the boundaries are imperfectly matched, regardless of whether the type is correct or incorrect. The prediction in the example below would receive a “partial” or fractional score because it correctly predicts part of the ORG Primer Technologies boundary.
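The three scoring schemes described above (Type Penalty, boundary exactness, and partial boundary) can be sketched as follows. This is only an illustration under stated assumptions: the customer’s exact partial-credit formula is not given, so the sketch assumes partial credit equals the token overlap (Jaccard) between the predicted and gold spans. The `Span` type and all function names are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first token in the entity
    end: int    # index one past the last token
    label: str  # entity type, e.g. "ORG" or "PERSON"


def overlap_ratio(pred: Span, gold: Span) -> float:
    """Jaccard overlap of two token ranges (assumed partial-credit rule)."""
    intersection = max(0, min(pred.end, gold.end) - max(pred.start, gold.start))
    union = (pred.end - pred.start) + (gold.end - gold.start) - intersection
    return intersection / union if union else 0.0


def boundary_exactness(pred: Span, gold: Span) -> float:
    """1.0 only when boundaries are identical; the type is ignored."""
    return 1.0 if (pred.start, pred.end) == (gold.start, gold.end) else 0.0


def partial_boundary(pred: Span, gold: Span) -> float:
    """1.0 for identical boundaries, fractional for overlap; type ignored."""
    return overlap_ratio(pred, gold)


def type_penalty(pred: Span, gold: Span) -> float:
    """0.0 for a wrong type; otherwise credit scales with boundary overlap."""
    if pred.label != gold.label:
        return 0.0
    return overlap_ratio(pred, gold)


# Tokens: ["Primer", "Technologies", "is", "based", "in", "San", "Francisco"]
gold = Span(0, 2, "ORG")            # "Primer Technologies"
partial_pred = Span(0, 1, "ORG")    # "Primer" only, correct type
wrong_type = Span(0, 2, "PERSON")   # exact boundaries, wrong type

print(type_penalty(partial_pred, gold))      # partial credit
print(boundary_exactness(wrong_type, gold))  # full credit despite wrong type
print(type_penalty(wrong_type, gold))        # no credit
```

Keeping the measures separate shows where a model goes wrong: a model that scores well on boundary exactness but poorly on Type Penalty is finding entities but mislabeling them.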
Recently, Primer was named one of the Best Midsize Companies to Work For in the Bay Area. This was our first time making it on this list, and I’m especially proud because it happened in 2020 – a year when we needed to push outside our comfort zone and think innovatively about our culture.
As I reflect on how we got here, a few things stand out as being particularly impactful. Sudden changes in context forced us to reimagine every aspect of our culture, and redesign it with intention. Here are five things we got right in 2020:
We embraced change and looked for opportunities.
- We saw COVID early, so we were prepared and the transition to remote work was relatively seamless. We made sure our employees were set up to be productive and successful by offering them an allowance for home office set-up, and a $75 monthly stipend throughout 2020.
- We asked ourselves “How can Primer become an engine of job growth in 2020?” Looking for the opportunity in a dark moment changed the conversation. We didn’t want to simply survive; we wanted to thrive. We decided we’d fight to win and oriented the team toward this goal. Following early successes, we fulfilled our goal of becoming an engine of job growth and onboarded more than 50 people in the latter half of 2020.
- We got creative about designing cultural moments that worked even better in a virtual world. We trained the whole company simultaneously on Unconscious Bias and Coaching by video conference. We also held a virtual Customer Day with inspiring panelists like Tony Thomas, Sue Gordon, Dash Jamieson, and Kim Kagan. It would have been expensive and logistically impossible for a small company like Primer to host events like these in person.
We doubled down on our mission.
- Primer creates the tools behind the decisions that change the world. Every day, our users are busy defending democracy, looking for cures to COVID-19, and guiding companies with our technology. When we’re hiring, we screen for people who are passionate about this mission, and the result is a culture of grit, purpose, and excitement about the future.
- We used our technology to help the world in a crisis. When COVID-19 struck, our team asked, “How can we help?” Over one weekend, a small group of engineers built covid19.primer.ai to help doctors, scientists and researchers keep on top of the rapidly proliferating research on COVID. It was intensely gratifying when we received thank you notes from the people on the front lines, saving lives.
We cared for our people.
- While the country was in lockdown, Primates made the best of it by working around the clock to build and sell our products. There was a lot of energy and progress, but no one was taking time to disconnect or renew. In addition, we were carrying a heavy emotional burden worrying about economic and health security, vulnerable relatives, systemic racism, loneliness, homeschooling, and wildfires. Work was a welcome distraction from the world around us, but the pace and intensity was unsustainable. The ‘new normal’ had created the perfect conditions for burnout. We decided we needed to disrupt the pattern. In May, we announced a company-wide Mental Health day to encourage self-care. Everyone turned off at the same time on the same day, and we encouraged them to share photos of how they found joy. It was so successful that we scheduled monthly Mental Health Days for the rest of 2020. We shared the benefits of our approach with other companies, and saw others follow our lead and adopt the same practice.
- We adapted our routines to accommodate employees facing the biggest challenges. For example, parents were struggling to manage homeschooling while working 9am-5pm. We used a company all-hands meeting to highlight the problem, and our CEO Sean Gourley set the expectation that it was on all of us to help each other by adjusting our usual schedules to accommodate homeschooling duties, giving colleagues space and comfort when their kids walked in on calls, and covering for parents when they were double-booked. This ‘permission’ changed the cultural dynamic and added a new lightness.
- We found virtual ways to celebrate diversity. We hosted impactful employee-led events to build education and awareness around Pride, the history of racial injustice in America, and Native American Heritage Month. We highlighted important multicultural holidays at all-hands meetings and celebrated our differences.
We articulated how to be successful
- After adapting our strategy to COVID, we quickly reprioritized and communicated new company OKRs. By articulating what success was going to look like in 2020, we freed our team to take charge of their own destiny.
- With so much uncertainty, we wanted our team to know we saw a future for them at Primer. We invested in building out career ladders and training every manager to set growth goals for their people to give them a path for advancement. This provided structure during uncertain times.
We connected and listened to each other in new ways
- Without in-person touchpoints, we needed technology to connect our people and business goals. We have a real-time pulse on how people are feeling and working.
- We implemented Lattice to track OKRs, run employee pulse surveys, and manage performance.
- We connected socially through fun Slack pop-up channels like #thejoyofcooking, #lets-socialize, and #petsofprimer to share glimpses into our personal lives from afar, and used Donut to pair people up randomly for getting-to-know-you chats.
- We used Clockwise to boost productivity and optimize our calendars.
- Zenefits helped us do virtual onboarding seamlessly, and structure the first 90 days to ensure every manager and new hire are having the right check-in conversations at the right time.
We ended the year by giving our colleagues a holiday gift that was symbolic of everything we’d been through together – a Lego International Space Station kit to build with their loved ones. Puzzles are a quirky and fun part of our culture, and we wanted to give them a puzzle that forced them to get off their screens, build something with their hands, and find joy with the people who supported their careers at Primer all year. We’re proud we built a great place to work during the strangest of times, and we’re sharing what we got right in hopes other companies can benefit.
If you want to join us and continue to evolve our culture, we’re hiring!
Science is one of humanity’s most important weapons in the fight against COVID-19. It is through science that we will understand how COVID-19 spreads through a population, how it interacts with the cells in our lungs, and how it jumped from animals to humans. It is also through science that we will create vaccines and treatments to stop this pandemic. Since the virus was first observed in China in December 2019, over 8,000 authors have published more than 2,500 articles in PubMed, arXiv, medRxiv, and bioRxiv. In total, we estimate that this research represents hundreds of thousands of hours of work from some of the brightest minds in the world.
This body of research is remarkable, but the sheer volume of information makes it impossible for anyone to read it all, let alone for researchers to track progress day after day.
Ten days ago, I was brainstorming with our team at Primer about what we could do to support this scientific effort. We immediately saw an opportunity to take our internal tool for analyzing arXiv machine learning papers and apply it to scientific research on COVID-19. We spun up a small team that has been working around the clock to build the public resource we are releasing today: COVID19primer.com.
This one-stop website tracks all the latest COVID-19 scientific research, as well as news coverage and social media discussions of this research. We created this as a resource to help all researchers, scientists, policymakers, and journalists get clarity into the science as it unfolds in real-time. Covid19Primer.com automatically updates every 24 hours at 8am GMT with all of the new published research included in the analysis, generating a one-page daily briefing. It also generates a weekly briefing email that anyone can sign up for.
The website ingests all of the scientific papers about COVID-19 and analyzes them overnight. Our software classifies the papers into the top-level COVID-19 research categories prioritized by the White House call to action to the AI community. It extracts all of the key authors and their affiliations (so you can look for research done by researchers at Johns Hopkins University, Chinese or British institutions, etc.), and it extracts and defines the thousands of jargon terms that exist in the literature. This gives everyone a quick look-up guide to see, for example, that ARBs stands for Angiotensin Receptor Blockers, and to find all the associated papers that discuss them.
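To make the jargon look-up idea concrete, here is a minimal sketch of one common way to find abbreviation definitions in text: look for a parenthesized short form and check whether the words just before it spell out its letters. This is an illustration only, not a description of how Primer's software works; the function name `find_abbreviations` and the matching rules are assumptions made for this example.

```python
import re


def find_abbreviations(text):
    """Return {abbreviation: long form} for patterns like 'long form (ABBR)'.

    Hypothetical helper for illustration; real systems handle many more cases.
    """
    found = {}
    # Treat 2-10 alphanumeric characters in parentheses, starting with a
    # capital letter, as a candidate abbreviation.
    for match in re.finditer(r"\(([A-Z][A-Za-z0-9]{1,9})\)", text):
        abbr = match.group(1)
        # Drop a trailing plural "s" so "ARBs" is matched letter by letter as "ARB".
        letters = abbr[:-1] if abbr.endswith("s") and len(abbr) > 2 else abbr
        # Take as many preceding words as there are letters in the abbreviation.
        words = text[:match.start()].split()
        candidate = words[-len(letters):]
        # Accept only if each word starts with the corresponding letter.
        if len(candidate) == len(letters) and all(
            word[0].lower() == letter.lower()
            for word, letter in zip(candidate, letters)
        ):
            found[abbr] = " ".join(candidate)
    return found


sentence = "Patients taking angiotensin receptor blockers (ARBs) were followed."
print(find_abbreviations(sentence))
```

A rule this simple misses definitions that skip words or use internal letters, so it should be read as a starting point for a glossary, with each entry linked back to the papers where the term appears.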
In addition to the top-level research categories, we have also implemented a bottom-up method for finding emerging topics in the research and for seeing which researchers are driving those topics forward. An increasing share of this research is in preprint form, so we have included both mainstream news media and social media content that links to these scientific papers in order to surface the online discussions about this latest research. If you don’t have time to engage directly with the scientific papers, you can go to the newsfeed and read all the stories that talk about the new COVID-19 research.
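As a rough illustration of what a bottom-up approach to emerging topics can look like, the sketch below compares how often each term appears in recent documents versus older ones and keeps the terms whose share has grown sharply. The post does not describe Primer's actual method, so the function name `emerging_terms`, the thresholds, and the whitespace tokenization are all assumptions chosen for this example.

```python
from collections import Counter


def emerging_terms(older_docs, recent_docs, min_recent=2, min_ratio=2.0):
    """Rank terms whose document share grew between two time windows.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """

    def doc_counts(docs):
        counts = Counter()
        for doc in docs:
            # Count each term at most once per document.
            counts.update(set(doc.lower().split()))
        return counts

    old_counts = doc_counts(older_docs)
    new_counts = doc_counts(recent_docs)
    # Smoothing floor so terms unseen in the older window do not divide by zero.
    floor = 1.0 / (len(older_docs) + 1)

    scored = []
    for term, count in new_counts.items():
        if count < min_recent:
            continue  # ignore terms that are too rare to be a trend
        new_share = count / len(recent_docs)
        old_share = old_counts[term] / len(older_docs) if older_docs else 0.0
        ratio = new_share / max(old_share, floor)
        if ratio >= min_ratio:
            scored.append((term, ratio))
    # Highest growth first, ties broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


older = ["spike protein structure", "viral genome sequencing"]
recent = ["epitope prediction spike", "epitope peptide binding", "spike epitope mapping"]
print([term for term, _ in emerging_terms(older, recent)])
```

Working on document share instead of raw counts keeps a term from looking like a trend just because the overall volume of papers is rising.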
The website is currently tracking all COVID-19 research going back to December when the first paper about the novel coronavirus emerged in Wuhan, China, including:
- Over 8,000 scientific authors, along with all of their quotes that have been published in news articles
- Over 2,000 COVID-19 research papers in PubMed, arXiv, bioRxiv, and medRxiv, with hundreds of new papers published daily
- An auto-organized view of research progress on the top-level COVID-19 research categories prioritized by the White House call to action to the AI community
- Emerging research topics, for example Cell Epitopes & Peptide-HLA, that are automatically detected and extracted
- Over 200,000 tweets in which COVID-19 research is shared and discussed, which, like the research papers, are growing exponentially
- An auto-generated glossary of over 1,000 technical terms, growing daily
Here’s how you can help:
Share this website with your friends. The more people we have engaging with the scientific research around COVID-19, the better decisions the scientific community will make. There is a lot of noise out there about COVID-19, and science gives us a much more grounded place to start.
Help us tag the data. The next step we’re undertaking is a large-scale tagging project to train machine learning models to identify key findings from the data. Sign up if you have experience with any of these research areas and want to help us tag data.
Help create Wikipedia pages. Many of these researchers don’t have Wikipedia pages. If you see a prominent researcher here without a page, it would be great if you could help create a page for them.
Please email us with feedback. If you see any mistakes in the classifiers or language generation, please email us. We will be bringing in UI elements to let users provide feedback directly from the product. Also, if there’s a news site or blog that is providing good coverage of this research and we don’t have it, please send it to us and we can update our data sources.