Language Agnostic Multilingual Sentence Embedding Models

Sentence embeddings let us compare the semantics of sentences numerically, and they are now essential for tasks such as semantic textual similarity, semantic search, and sentence clustering. Unlike keyword-based search, which retrieves lexically similar text that is not necessarily what you are looking for, semantic search retrieves sentences that are semantically relevant to your query. For example, with the query “Thousands demanding climate change action”, semantic search can retrieve the sentence “Copenhagen: protests against global warming”, which keyword search cannot.

In particular, the rapid development of transformer-based multilingual sentence embedding models over the past year now lets us handle the semantics of sentences across multiple languages with a single model. No translation step is needed, which matters because translation is computationally expensive and a bad translation risks distorting the original meaning.

So how well can these models identify semantic similarity between sentences regardless of language, that is, how language agnostic are they? How well can they retrieve the document most relevant to a query from a pool spanning many languages? Some models were trained on cross-lingual translation pairs and are intended only for translation. As a result, few studies have examined cross-lingual semantic textual similarity on sentence pairs that are semantically similar rather than translations of each other (translation pairs are supposed to be semantically identical).

Primer ingests vast numbers of documents daily, and it is important that our systems can accurately retrieve semantically similar documents across multiple languages, whether from news, social media, or companies’ internal documents.

Here, we conduct a detailed evaluation of publicly available multilingual sentence embedding models by measuring the semantic similarity of news titles in 33 languages and by visualizing the embedding spaces.

Getting similar news from a pool of news contents in 30+ languages

A truly language-agnostic multilingual language model is one where all semantically similar sentences are closer than all dissimilar sentences, regardless of their language.

Examples of known multilingual sentence embedding models trained on a large number of languages are LaBSE (109 languages) [1], multilingual SBERT (50+ languages) [2,3], and LASER3 (200 languages) [4]. Do these models perform well at retrieving semantically similar sentences from a pool of documents in tens of different languages?

Here we investigate the ability of multilingual sentence embedding models to identify semantically similar (but not exactly identical) sentences by looking at news titles in 33 languages. We scraped 15,210 multilingual news titles from all news articles in non-English languages that have links to English WikiNews. The languages in the dataset are English, French, German, Portuguese, Polish, Italian, Chinese, Russian, Japanese, Dutch, Swedish, Tamil, Serbian, Czech, Catalan, Hebrew, Turkish, Finnish, Esperanto, Greek, Hungarian, Ukrainian, Norwegian, Arabic, Persian, Korean, Romanian, Bulgarian, Bosnian, Limburgish, Albanian and Thai. We then calculated the sentence similarity between the English title and the foreign-language title of the same news story (positive pairs), and between titles of news stories that share no categories (negative pairs).
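The definition above can be turned into a simple check. The following is a minimal sketch, not the evaluation code used for this post: the function names are hypothetical and the three-dimensional vectors are toy stand-ins for real sentence embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def is_language_agnostic(positive_pairs, negative_pairs):
    """True if every positive pair is more similar than every negative pair."""
    min_positive = min(cosine(u, v) for u, v in positive_pairs)
    max_negative = max(cosine(u, v) for u, v in negative_pairs)
    return min_positive > max_negative

# Toy vectors (assumed values) standing in for embeddings of news titles.
en_funeral = [0.9, 0.1, 0.0]
fr_funeral = [0.8, 0.2, 0.1]
zh_funeral = [0.7, 0.3, 0.0]
fr_corruption = [0.1, 0.9, 0.2]

positives = [(en_funeral, fr_funeral), (en_funeral, zh_funeral)]
negatives = [(en_funeral, fr_corruption)]
agnostic = is_language_agnostic(positives, negatives)
print(agnostic)
```

Real models rarely satisfy this strict criterion on every pair, so in practice it is more informative to look at how far the positive and negative similarity distributions overlap.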

For example, a WikiNews article titled “United Kingdom buries Queen Elizabeth II after state funeral” has linked articles in 11 other languages. Their titles are shown below.

On the other hand, a WikiNews article titled “‘Very serious’: Chinese government releases corruption report”, which has no overlapping topics with the news above, has linked non-English news articles with the following titles.

Since there are no common topics between these two news events, their titles should be dissimilar to each other regardless of the languages. For example, the following can be regarded as positive and negative sentence pairs for English – French news title pairs.

Positive English – target language sentence pairs were created from all English WikiNews pages that have international news pages linked to them, and negative English – target language sentence pairs were created from all possible pairs of news articles that have no overlapping topics. The following figures show the distribution of cosine similarity scores for positive and negative title pairs, grouped by language. Each box indicates the interquartile range of the distribution. Similarities were calculated with each of three multilingual sentence embedding models: SBERT (distiluse-base-multilingual-cased-v1), SBERT (paraphrase-multilingual-mpnet-base-v2), and LASER3.
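As a rough illustration of the per-language summary behind such box plots, the sketch below groups similarity scores by language and pair type and computes the quartiles. The function name is hypothetical and the scores are placeholders chosen for readability, not results from our evaluation.

```python
from collections import defaultdict
from statistics import quantiles

def quartiles_by_language(records):
    """Group (language, pair_type, score) records and return (Q1, median, Q3) per group."""
    grouped = defaultdict(list)
    for language, pair_type, score in records:
        grouped[(language, pair_type)].append(score)
    return {
        key: tuple(quantiles(scores, n=4, method="inclusive"))
        for key, scores in grouped.items()
    }

# Placeholder scores for illustration only; these are not measured values.
records = [
    ("fr", "positive", 0.5), ("fr", "positive", 0.6), ("fr", "positive", 0.7),
    ("fr", "positive", 0.8), ("fr", "positive", 0.9),
    ("fr", "negative", 0.0), ("fr", "negative", 0.1), ("fr", "negative", 0.2),
]
summary = quartiles_by_language(records)
for (language, pair_type), (q1, median, q3) in sorted(summary.items()):
    print(f"{language} {pair_type}: Q1={q1:.2f} median={median:.2f} Q3={q3:.2f}")
```

The interquartile range is Q3 minus Q1, which corresponds to the height of each box in the figures.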

(a) SBERT distiluse-base-multilingual-cased-v1

(b) SBERT paraphrase-multilingual-mpnet-base-v2

(c) LASER3

Fig. Distribution of cosine similarity scores of positive (cross-lingual pairs of the same news) and negative (cross-lingual pairs of unrelated news) title pairs, grouped by language

The SBERT paraphrase-multilingual-mpnet-base-v2 and LASER3 models each give similar cosine similarity scores across all languages, with the exception of Tamil, Limburgish, and Thai for SBERT paraphrase-multilingual-mpnet-base-v2. With SBERT distiluse-base-multilingual-cased-v1, on the other hand, the average cosine similarity of positive sentence pairs varies widely by language, from ~0.8 in Portuguese to ~0.2 in Tamil. Because of this language bias, a sentence retrieval system built on this embedding model could rank Portuguese sentences that are only loosely similar to an English query much higher than a Hebrew sentence with exactly the same meaning as the query.

LASER3 gives higher cosine similarity scores for positive pairs (average 0.7~0.8), but also for negative pairs (average ~0.55, compared with an average of 0.05 for SBERT). Even though LASER3 was trained on 200 languages, including all 32 foreign languages in our evaluation dataset, it struggles to distinguish similar from dissimilar news titles for some English-foreign language title pairs (e.g., Thai). We conclude that SBERT (paraphrase-multilingual-mpnet-base-v2) is the best of the three models discussed here for multilingual sentence similarity search, since the difference between the cosine similarities of positive and negative sentence pairs is the largest on average. This result shows how important it is to know whether your model has a language bias in the languages you care about.

Note that the positive sentence pairs used here are not exactly semantically identical, as the example positive pairs above show (e.g., the positive pair “United Kingdom buries Queen Elizabeth II after state funeral” and “大不列顛及北愛爾蘭聯合王國女王伊麗莎白二世陛下逝世,享耆壽96歲” (translated: Her Majesty Queen Elizabeth II of the United Kingdom of Great Britain and Northern Ireland dies at 96)). Thus, we don’t expect the cosine similarities of the positive pairs to be exactly, or very close to, 1.
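The comparison criterion used here, the average gap between positive-pair and negative-pair similarities, is straightforward to compute. The sketch below is illustrative only: `model_a` and `model_b` are hypothetical names and the scores are placeholder values, not our measurements.

```python
from statistics import mean

def separation_gap(positive_scores, negative_scores):
    """Mean positive-pair similarity minus mean negative-pair similarity."""
    return mean(positive_scores) - mean(negative_scores)

# Placeholder scores (assumed values): model_a scores everything high,
# model_b scores positives slightly lower but negatives much lower.
scores_by_model = {
    "model_a": {"positive": [0.7, 0.8], "negative": [0.5, 0.6]},
    "model_b": {"positive": [0.6, 0.7], "negative": [0.0, 0.1]},
}

gaps = {
    name: separation_gap(scores["positive"], scores["negative"])
    for name, scores in scores_by_model.items()
}
best_model = max(gaps, key=gaps.get)
print(best_model, gaps)
```

A model with high positive scores is not automatically the better retriever; what matters for ranking is how far the positives sit above the negatives.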

Visualization of the Distribution of Sentence Embeddings By News Topics

To further understand how embeddings of news titles are distributed in the multilingual semantic embedding spaces, we visualized them in 2 dimensions. The figure below shows the distribution of news titles embedded with the SBERT (paraphrase-multilingual-mpnet-base-v2) model. The dimension of the embedding space was reduced to 2D using t-SNE, a dimensionality reduction technique that preserves the local structure of the clustering.

Sentence embeddings are colored by the 13 news topics defined by WikiNews: Crime and law, Culture and entertainment, Disasters and accidents, Economy and business, Education, Environment, Health, Obituaries, Politics and conflicts, Science and technology, Sports, Wackynews, Weather. Here, we excluded news titles that have more than one of the 13 topics.

Fig. SBERT (paraphrase-multilingual-mpnet-base-v2) embeddings of WikiNews titles (34 languages), reduced to 2D with t-SNE

We can see the embeddings of multilingual news titles clustered together by news topics, indicating that our embedding space contains meaningful information about the topics seen in the news.

Fast Search on Multilingual Corpora

Here we showed that multilingual sentence embedding models are potentially powerful tools, and it is important to understand the language bias when using them for multilingual semantic search tasks.

Semantic search using a multilingual embedding model offers major advantages. Compared with first translating documents and then using the Okapi BM25 algorithm, a well-known bag-of-words retrieval function, semantic search using multilingual dense embedding models enabled us to retrieve news articles about the same events with higher precision and recall, and to retrieve more relevant news, without ever worrying about the language of the text. Furthermore, computing dense embeddings is in general much faster than translating sentences, and we found that pre-computation for semantic search using the SBERT (paraphrase-multilingual-mpnet-base-v2) model was more than 100 times faster than for the keyword-based approach using the light translation model nllb-200-distilled-600M. These trends apply to a wide range of document types beyond news, from short social media posts to companies’ internal documents with varying text lengths and domains. At Primer, we constantly seek the best solution to retrieve, cluster and understand documents efficiently and accurately, and those documents are not limited to English but include any text that exists in the world.
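To make the keyword baseline concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It uses a common textbook form of the formula with assumed default parameters (`k1=1.5`, `b=0.75`) and a naive tokenizer, and it is not the implementation used in our comparison. The second document is a made-up title added for contrast.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores

query = "Thousands demanding climate change action"
documents = [
    "Copenhagen: protests against global warming",   # relevant, but no shared words
    "Stock markets react to interest rate change",   # made-up title sharing one word
]
scores = bm25_scores(query, documents)
print(scores)
```

Because BM25 only rewards shared terms, the relevant Copenhagen title scores zero while the unrelated title that happens to contain “change” scores above it. This is the failure mode that dense embeddings avoid.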

References

[1] Language-agnostic BERT Sentence Embedding (Feng et al., ACL 2022)

[2] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (Reimers & Gurevych, EMNLP-IJCNLP 2019)

[3] Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation (Reimers & Gurevych, EMNLP 2020)

[4] No Language Left Behind: Scaling Human-Centered Machine Translation (NLLB Team, arXiv:2207.04672, 2022)

Among the largest and most important AI conferences in the world, The AI Summit London began in 2015 when research and academia were the focus. Just seven years later, it has evolved into the industry’s foremost event, focused on the practical applications of AI for enterprise organizations and real-world solutions that are transforming business productivity. Given Primer’s growth and recent acquisitions, the time was right in 2022 to make an appearance.

Paul Vingoe, head of operations in Europe and the Middle East, represented Primer at the conference. AI Business TV reporter Ben Wodecki interviewed Vingoe to learn more about Primer and its industry-leading machine learning and natural language processing solutions for government agencies and Fortune 1000 companies.

Watch Meet the Innovators: Primer, by AI Business TV

Practical applications of NLP

Vingoe explained that while still a nascent technology, NLP is catching on quickly and users are becoming more confident in its abilities to identify relevant content from unstructured text to make better decisions. Vingoe explained how the legal sector is using AI to surface key details from unimaginable volumes of documents, contracts, license agreements, insurance policies, and other legal documents.

“It brings information to the human who can apply human judgment far quicker than they can find it by reading the documentation themselves,” Vingoe explained.

During the interview, Primer Command was processing live news media and social media, seeking out data on the Russia-Ukraine war.

“What is really powerful is in real time, you can see sentiment, you can see individuals have been named, you can see locations and organizations, you can bring that information together. What you’re seeing is the ability to, as a user, very quickly understand what’s happening in real time in a region,” Vingoe added.

The perfect example 

During the conference, the UK released their National AI strategy. This presented an opportunity to demonstrate the power of NLP. The original document, interviewer Ben Wodecki admitted, took him about three hours to read.

“Within half an hour of it being released, we summarized that 40-page document into 1,600 words and 91% compression. And then we summarized it down to one paragraph, and it’s spookily accurate,” Vingoe explained.

What’s next?

 In response to a final query on what Europe can expect from Primer in the coming year, Vingoe talked about Primer’s solutions and how they will evolve. He cited open source intelligence, tools that train models, and additions to Primer’s AI portfolio.

“We have a product called LightTag, which is a German company we’ve acquired,” he said about the data labeling solution for training models that Primer used and then acquired in February 2022.

“We also have a product called Yonder, which does information analysis for brand management and other use cases,” he added, regarding Primer’s pioneering information manipulation analysis software. “The aim is to have a natural language processing environment that will allow our customers to access our models, and use the output of those models in their own applications in their own workflows, in their own environments.”

View the entire interview here.

For more information on Primer products, visit primer.ai/products.

The world of national security has long been a man’s game. Not anymore. Primer’s Director of Global Intelligence Strategy, Cynthia Strand, spent 35 years in the CIA. At a recent panel for women in national security, Strand shared how success in national security comes when we all work together to break the gender bias.

“National security is a team sport. No one organization is going to save the country. But we can all do it together.”

Cynthia Strand

Cynthia Strand, Director of Global Intelligence Strategy at Primer, recently spoke alongside three incredible women who work in national security at a panel hosted by George Washington University (GW). The fireside chat was hosted by the GW College of Professional Studies, which offers a master’s degree program in cybersecurity strategy and information management and bachelor’s and master’s degree programs in homeland security.

Strand was joined by Maria Berliner, managing director of the RTG-Red Team Group and a professor of intelligence and strategic analysis at GW, and Kathleen Haraseck, an adjunct professor of GW’s homeland security program. The discussion, moderated by Elaine Lammert, director of GW’s master’s program in homeland security, focused on building a career as a woman in national security, where panelists shared their experiences in the industry so that others might consider a job in such a cutting-edge and relevant industry. 

Breaking the bias

Traditionally, national security has been a male-dominated industry. Not anymore.

Over the last several years, there has been a movement to improve gender diversity and fight unconscious bias in national security. President Biden’s cabinet is 44% women, the highest ever, and 50% of Senate-confirmed political appointments in national security have gone to women. But Biden’s National Security Council comprises just 36% women, according to the Leadership Council for Women In National Security (LCWINS).

Breaking the bias for traditionally male industries—or rather any industry in general—is hard. 

That’s why this year’s International Women’s Day theme was to #BreakTheBias, encouraging everyone to work together to forge a path to women’s equality. This gives women in national security a place to thrive, knowing that their voices and points of view matter just as much as those of their colleagues.

Master your craft

For women considering a career in national security, Strand encouraged them to master their tradecraft. By being good at what you do, you create a seat for yourself at the table and a broad network of colleagues. The network you build makes you more effective and has a more significant impact on the organization. And women can only do that if they are given the option to maintain a shifting work-life balance as their personal lives evolve.

“The most important thing is to master your craft,” Strand said. “No matter what you are hired to do, do it to the absolute best of your ability.”

Don’t be afraid to get uncomfortable

Strand encourages women who are looking for a career in national security to stretch themselves, take risks, and apply for positions that they can grow into. That said, Strand also emphasized normalizing feeling uncomfortable in our roles and taking on new challenges and risks. If we’re not being challenged, we’re not learning. For Strand, one of the most significant challenges of her career has been navigating a job that fell outside a traditional path with a skill set that wasn’t always valued. 

“We learn a lot when we’re uncomfortable,” Strand said.

Before joining Primer, Strand spent 35 years at the CIA, where she served as a Deputy Assistant Director and Senior Manager in the Directorate of Science and Technology. She was also the Industry-Government Partnerships Innovator at In-Q-Tel.

Strand initially applied to be an analyst for the CIA but ended up in the Directorate of Science and Technology instead. She describes that path as “one of the best unanswered prayers of my career” because she was put into an environment where “we were encouraged to lean in and take risks.” She notes that the environment shaped the rest of her career.

Tools to succeed

Successful women also need to be at the helm of recruitment efforts, represented in a wide range of occupations to help female candidates see themselves and a career path they want to go down. Strand also recommended starting the mentoring process earlier by pairing new hires with senior female sponsors. She and the other panelists agreed that we all need to lift other women up, as we stand on the shoulders of the women who came before us.

“Wherever you can, lift other women up,” Strand said.

The events of last week have changed the world as we have come to know it. We woke up to a war in Europe, and Russian forces advancing on the city of Kyiv. The world now feels more uncertain, volatile, and complex. The so-called “fog of war” has descended around us and we struggle to make sense of what we are witnessing, with disinformation being weaponized to generate further confusion. But decisions still have to be made, and these decisions will alter the course of history. 

We built Primer for this exact scenario. We started Primer to put the best machine learning tools in the world into the hands of the very people who are required to make decisions to protect communities, civil liberties, critical assets and infrastructure, and our way of life. What we deliver to them makes a difference, and the timeline to deliver has never been more urgent.

Today, we’re making Primer Command generally available to a much wider audience of public and private sector organizations. Command is our AI-driven situational awareness solution that is deployed in technology, financial, security and other critical industry sectors to help users meet an ever-widening array of high-stakes security challenges and opportunities. 

Please read on to learn more about what Command can do for your organization, and contact me directly if you’d like to discuss how Primer can support your mission.

– Sean

Today, Primer announces a new solution for monitoring fast-breaking global events: Primer Command. Powered by Primer’s advanced natural language processing (NLP) technology, Primer Command enables organizations to monitor, analyze, and respond to rapidly unfolding events in real time. Primer Command is the industrial-grade security and crisis management solution that can structure and summarize a high volume of real-time publicly available information, including news and social feeds, at speed, and with human-grade precision. 

Command brings intelligence to all levels of decision making, including strategic, operational, and tactical, and streamlines communications so that teams can operate in real time from a shared understanding of the situation.

Why we built Primer Command

In today’s complex world, real-time situational awareness is critical for the public and private sector to effectively respond to major world events, such as armed conflict, natural disasters, terror and cyber attacks, or geopolitical unrest. Organizations need to make timely decisions regarding the security of their people, assets, and critical infrastructure, as well as ensure operational continuity. Yet when a crisis strikes, threat assessment and operational intelligence teams are overwhelmed by the sheer volume of information coming from news and social media — and that information can evolve from one minute to the next. 

“Military commanders and business leaders live in an increasingly complex information environment, which is a morass of equal parts champagne and swill. Rapidly sorting through fact and fiction and pertinent information to enable time-sensitive decisions is both more important and more difficult.” 

– Tony Thomas, Former Commander, U.S. Special Operations Command

We are now extending our technology to help commercial businesses, who face similar challenges, solve their most critical problems. Using Primer’s world leading AI technology, Command provides a 360-degree view of a rapidly evolving event or emerging threat. It leverages AI-powered intelligence tools to quickly separate signal from noise, enabling organizations to focus on strategic priorities and make fast, timely decisions with confidence.

Analysts and decision makers use Primer Command to:

  • Monitor a wide range of global data sources including news, social media, and situation reports (SITREPS) in more than 100 languages.
  • Understand the entire picture of a breaking situation within a rich global and historical context.
  • Gather rich context of an event from extracted and deduplicated images and videos aggregated together into one single view. 
  • Extract insights from various data fused together, such as text, images, and video, using integrated AI technologies like computer vision and NLP.  
  • Geofence a particular area of interest to keep track of relevant changes over time. 
  • Detect narrative patterns and trends by identifying and understanding disputed information.
  • Separate signal from noise by removing duplicate information, even if authors use different words to describe the same thing.
  • Generate knowledge bases fast to understand the people, places, and organizations that matter most.

With Command, intelligence professionals no longer need to rely on rudimentary tools and manual methods to discover salient information and stitch together insight reports — especially when every minute counts. Primer Command delivers reliable, real-time data at scale, saving analysts countless hours and freeing them to focus on higher-level analysis.

Command in action

Primer Command supports image recognition, deduplication, and clustering to help analysts process enormous volumes of real-time data with human-grade precision.
Primer Command’s real-time AI-generated situation reports enable analysts to track breaking events faster and maintain a comprehensive understanding of the impact, implications, and outcomes of events.

Inside Primer Command 

Primer Command surfaces key insights on a breaking event from a single dashboard that’s updated in real time using situational awareness tools. Data flows in from over 60,000 news and social media sources, providing a robust OSINT data feed to ensure that analysts have a comprehensive view of the event. Analysts can then apply powerful filters to extract the information that they care about most, such as key people, organizations, locations, potential disputed information, engagement, keywords, disputes, sentiment, or languages. They can also run a query on affiliation or relationships between people to gain a deeper understanding of key players. 

Command integrates several key AI technologies into one seamless user experience that allows analysts to extract insights from different data formats without ever leaving the platform:

  • Named Entity Linking maps text into geographic locations, creating a visual of where unfolding events are happening. 
  • Advanced computer vision technology groups similar images together, allowing analysts to conduct visual searches, finding different perspectives of the same event, and identifying a wide angle version when the initial image was tightly cropped.
  • Optical character recognition (OCR) technology identifies and translates text embedded in images. 
  • ML algorithms are optimized for speed to support low latency workflows in rapidly evolving situations.

When analysts need to share information with others, Primer Command generates a real-time situation report within seconds, enabling collaboration across users, teams, and organizations, helping leaders to quickly gain a comprehensive understanding of the event and make data-driven decisions.

Primer Command at work

A wide variety of organizations are monitoring and responding to fast-breaking global events, from corporate global security to teams in charge of humanitarian response and tracking global cyber attacks. With Primer Command’s real-time event monitoring and alerting capabilities, intelligence teams gain a comprehensive understanding of the impact and implications of an event.

Primer Command’s real-time event and risk detection capabilities empower global enterprises to monitor threats and build effective communications strategies.

  • Corporate Global Security Operation Centers (GSOC): Security analysts use Command to monitor and respond to crises that affect company assets, personnel, and critical infrastructure across multiple geographies.
On 23 February 2022, Primer Command captured the first news and social media reports that Russia had declared war on Ukraine and the country was under attack.

  • Humanitarian Response: Humanitarian organizations and first responders use Command’s custom-trained humanitarian filters to identify crucial information from disaster zones and quickly respond to rapidly evolving emergencies. With Command, first responders can pinpoint exact locations, both mention and geo-tag, for accurate situational awareness, helping to ensure staff safety, while delivering well coordinated and targeted assistance.
Primer’s advanced humanitarian AI filters enable analysts to quickly identify breaking reports of infrastructure damage and evacuations as Russian forces move in on 23 February, 2022.
  • Cybersecurity: Security teams use Command for threat detection and early indications and warnings of potential cyber security attacks. Teams can track malicious actors and cyber security news, in addition to geopolitical events. In some cases, this can provide hours of advanced notice versus traditional reporting. 
Amid warnings of an imminent Russian invasion on 23 February, 2022, a massive cyber attack hits Ukraine as analysts monitor the situation through the lens of Primer Command.
  • Financial risk: Fortune 1000 companies use Command today to track rapidly evolving geopolitical situations like in Ukraine and the potential risk that such events pose to a company’s business, supply chain, and its customers.
On 23 February 2022, Primer’s AI-generated situation reports captured and summarized the impact that the imminent conflict in Ukraine was having on global energy markets.
  • Brand and reputation management: Corporate communications, brand, and product teams use Command to listen and respond to public discourse around their company’s activities and products, as well as to counteract misinformation and disinformation.
Primer’s disputed information detection AI enables analysts to quickly identify information that contains potential fake news, propaganda, disputes, or contradictory narratives.

At Primer, we believe that Primer Command can not only help organizations to effectively respond to specific events, but also help them to protect our economy, our democracy, and our way of life by ensuring the safety and security of individuals, organizations, and societies. We’re excited to make this solution available to Primer customers and we look forward to seeing its positive impact on the world.


For a free Primer Command trial, click here.

With LightTag’s innovative team-based label management software, Primer helps customers accelerate delivery of mission-critical NLP applications 

One big obstacle our customers face in operationalizing NLP occurs at the starting line – with data labeling. Better labeling means better data on which to train NLP models, and therefore higher model performance and faster deployment of models into production. One hundred great labels can mean the difference between a model changing the business outcome, or it just being another underperforming prototype.

However, customers consistently tell us that labeling documents is a time-consuming, repetitive, and error-prone task and they need better solutions. Building good NLP models that you can trust, especially for the mission-critical customer applications Primer supports, is vitally important. 

A key ingredient in our secret sauce has always been meticulous control over the quality of our data and its cost of production. In response to customer demand, I’m thrilled to announce today that Primer has addressed this pain point by acquiring LightTag and that Tal Perry, the founder and CEO of LightTag, is joining Primer as Principal Product Manager.

LightTag’s team-based labeling solution offers innovative features for project management, annotation scoring and conflict management, and quality control. For the last two years, we’ve relied on LightTag’s quality assurance mechanisms to review our labeled data, and our models.

LightTag makes it easy to set up labeling tasks for teams of labelers, accelerating the time it takes to start training NLP models. This can be done online, with labeling commencing in minutes. If you need to train a model on sensitive data, you can deploy LightTag on premises in a secure environment. 

LightTag’s UI is geared toward project management, saving tremendous time and effort for those who currently use Excel or Google sheets to manage hundreds of cross-team projects. Instead of cleaning and labeling data sets, data scientists are free to focus on the key problems they need to solve.  

Primer gives anyone the power to build and train models more efficiently with autoML strengthened by Data Map and active learning, no coding or technical skills required. Data Map automatically surfaces potentially mislabeled data so customers know what to re-label to improve their model’s performance dramatically faster, and active learning identifies the labels that will best improve model performance, reducing the amount of labeled data needed to train an NLP model by up to 30x.

By combining Primer’s machine-driven programmatic labeling capabilities with LightTag’s innovative human-driven labeling, and a full-service white glove labeling solution, we provide an end-to-end labeling solution that serves the full range of customer needs—whether you build your own models using LightTag, customize our models for your use case, upload your pre-labeled data, or let Primer give you a hand with labeling to save you even more time. This combination of human and machine intelligence is essential to achieving the accuracy, trust, and explainability required to rapidly label, build, and deploy NLP models at enterprise scale. With this acquisition, our customers can accelerate knowledge transfer across domain experts by making it easy for them to teach machines what they know, thereby decreasing time to value for a wide variety of NLP applications.

“What will it take for you to trust a model with your next mission-critical decision?

“The world is awash with pretrained models that achieve state-of-the-art results on commercial datasets. What matters to our customers is accuracy on their own data. The way to trust and train an NLP model is by exposing it to hand-labeled data, from the data sources that matter to you, labeled with the concepts that you care about.” 

Tal Perry
Founder, LightTag

Primer’s Engines provide predictions and pre-annotations that accelerate your labeling.

What’s next

Primer is an enthusiastic user of LightTag, having used the product to label hundreds of thousands of documents to train Primer ML models. As part of our acquisition, we’re offering the LightTag capabilities that made us successful to our customers, free to anyone for up to 5,000 labels per month. Nonprofits and academic users can continue to apply for a free team license on the LightTag website. We will maintain LightTag as a standalone product, so for all the current LightTag customers who know and love the product, you’ll continue to get the same great experience. LightTag by Primer will continue to operate and be available as both a SaaS offering and an on-premises and air-gapped deployment for customers with high-security requirements.

As of today, users of LightTag can try out Primer Engines to provide predictions and pre-annotations to accelerate their labeling. As a next step, key features from LightTag will be integrated into the Primer NLP Platform, enhancing Primer’s ability to build and scale world-class NLP models that automate and accelerate the analysis of massive unstructured, text-based datasets. With Primer, organizations of all sizes can harness their data to deliver timely and reliable insights to achieve mission objectives and competitive advantage.

I want to take the opportunity to welcome all of LightTag’s customers to Primer. We are excited to continue to support all your NLP use cases and will be doubling down on our investment in LightTag to make it the best NLP labeling software in the world. LightTag bolsters Primer’s strategy to be a world leader in industrial-grade NLP solutions that transform “information overload” into “mission-critical intelligence” for faster human decision making and better outcomes.

As the world continues to change rapidly, Primer stands side by side with our customers to help them address new and emerging challenges and opportunities, from disinformation and cyber/physical security to brand reputation, global customer insights, supply chain impact, and more.

At Primer, we are building the infrastructure to support the creation and deployment of mission-critical NLP models. LightTag builds out a core part of the Primer infrastructure and complements our Automate and Engines products.

You can access LightTag here. Log in and get those labels together to train all your NLP models for free!

For more information about Primer and to get a demo, contact Primer here.

Teaching machines what we know, so that they can do things we can’t.

“2021 was a banner year for Primer with tremendous business growth, product innovations, and new NLP solutions for a growing set of government and commercial customers worldwide. 2022 promises even more as we continue to commercialize industrial-grade NLP for mission-critical applications.”
– Sean Gourley, Primer CEO

2021 was an exceptional year for Primer. We raised the bar in significant ways across all aspects of the company, honing our ability to deliver industrial-grade NLP solutions that are trusted by public sector and business institutions to transform “information overload” into “mission-critical intelligence” for faster human decision making and better outcomes. 

Primer’s customers are people on the front lines in high-stakes, fast-changing environments – spanning finance, retail, national security, and more – who rely on timely and reliable insights to make more informed decisions, faster. They’re analysts, operators, and business leaders – and the developers and data scientists who support them – responsible for tasks such as monitoring global events in real time, anticipating supply chain disruption, mitigating risk to brand reputation, improving stakeholder experiences, identifying disinformation campaigns, and other hard problems where a 360-degree understanding of all available data is vital. 

Primer stands side by side with our customers to help them apply advanced NLP technology to massive amounts of unstructured data – with greater speed, ease, accuracy, and agility – to achieve competitive advantage and mission-critical objectives. Today, up to 80% of an organization’s data is unstructured, according to Deloitte. It remains vastly untapped despite being critically important to providing a holistic view of an organization’s operating portfolio. That’s a gap Primer is uniquely qualified to address. 


While it’s still early days for NLP, the market is growing fast – estimated to reach US$127 billion globally in 2028 – and we’re committed to building Primer for the long term. I’m pleased to share several examples of the investments we made last year in the people, products, and partnerships that enable Primer to pursue our mission of serving our customers ever better.

2021 Primer Highlights

  • Secured $110M in Series C financing to support our strategy of becoming a world leader in NLP solutions for mission-critical applications. Funds are being used to accelerate our product development and delivery capabilities to meet a growing set of customer use cases worldwide. 
  • Expanded and deepened our customer base worldwide in both public sector and commercial markets, with major contributions from our new London and Singapore offices. With customers such as USSOCOM, Walmart, Microsoft, and others I hope we’ll be able to name publicly soon, Primer is trusted by a growing number of world-leading organizations to support their language-driven intelligence initiatives.
  • Launched significant new product capabilities – including Automate, Engines, and Command – part of Primer’s end-to-end NLP platform that makes it easy for data scientists and developers to rapidly label, build, and deploy intelligent applications on their data at scale. Primer’s pre-trained NLP models are available “out of the box” and customizable, bringing additional velocity to builders who want to deploy NLP models into production for their unique use cases. 
  • Hired top-notch talent at all levels of the company, around the world – an impressive feat in such a fiercely competitive hiring market. We were recognized as a 2022 Best Place to Work for the second year in a row, which is a testament to our commitment to building a company culture that empowers people to thrive. Importantly, our bench of investors and advisors is world class.
  • Formed strategic alliances with Microsoft and Palantir to serve mission-critical U.S. government needs.
  • Selected as a top 10 mid-stage growth company on the inaugural Intelligent Applications Top 40 (IA40) list, which recognizes the top private companies that Madrona Venture Group, Goldman Sachs, and top-tier venture capital firms believe will define the future of software.

That’s just a sampling of key milestones Primer achieved this past year. We’ve established a strong foundation to build on, and there’s so much more to come in 2022 and beyond. As the world continues to change rapidly, we’ll continue to step up to help our customers meet an ever-widening array of challenges and opportunities. 

My personal thanks and appreciation to all of our employees, customers, and partners – you all contribute to our mutual success. I feel immensely positive, grateful, and proud of what we are building together. It’s my honor to work with and lead such a talented, dedicated team.

I know I speak for all of us when I say we are prepared and energized to meet this year.

– Sean

For more information about Primer and to access product demos, contact Primer here.

IA40 names Primer a top 10 mid-stage growth company “building the next generation of software that will change our lives” 

Primer is recognized in the inaugural Intelligent Applications Top 40 (#IA40), sponsored by Madrona Venture Group and Goldman Sachs, which honors their list of top private companies building software applications with AI and ML that “truly incorporate intelligence into how they process data and predict outcomes.” (press release)

Top-tier venture capital firms investing in this industry, including Addition, Amplify Partners, Lux Capital, and Steadfast Capital Ventures, as well as Amazon and Microsoft, nominated and voted on the companies they believe will define the future of software. 

Why Intelligent Applications?
“Intelligent applications leverage machine learning models embedded in applications that use both historical and real-time data to build a continuous learning system,” according to the IA40 initiative. “These learning systems solve a business problem in a contextually relevant way – better than before – and typically deliver rich information and insights that are either applied automatically or leveraged by end users to make superior decisions.”

“Flywheels will start to emerge in various sub-sectors of intelligent applications,” IA40 sponsors added. “The flywheel of leveraging diverse and robust data to create contextually relevant machine/deep learning models that are then deployed to help solve real-world problems and then the learnings from those inferences (which is more data) are incorporated to further improve the intelligent applications.”

This flywheel effect may drive sustainable competitive advantage and encourage new go-to-market and pricing models that facilitate customer adoption.

Primer CEO Sean Gourley stated, “The promise and potential of industrial-scale AI/ML is materializing every day, in every industry, as machines achieve new levels of trusted, scalable, human-level performance for mission-critical tasks. Primer continues to help meet this need for a growing number of national security and commercial enterprises worldwide.”

“We’re proud to be included in this impressive set of IA40 winners,” Gourley added. “This award reinforces our mission to deliver advanced machine learning and NLP technology to help our customers address the world’s most pressing problems.”

Learn more about the Intelligent Applications Top 40 at www.ia40.com.



At the recent Cipher Brief Threat Conference, NPR’s national security correspondent Greg Myre interviewed several U.S. intelligence experts to understand the most pressing threats to U.S. national security.

What rose to the surface? China and AI.

Myre describes the intelligence community’s current priorities in a story called “As U.S. Spies Look to the Future, One Target Stands Out: China.”

“I call this entering the third epoch of intelligence,” said Sue Gordon, former advisor to five of the last six U.S. Presidents and the National Security Council, and current advisor to Primer.ai.

Regarding prior counterterrorism efforts, Gordon added, we “realized that the world had become digital, and that we hadn’t been focusing on all the things we needed to. The rise of China happened during those years, and now you see us talking about Great Power competition.”

Clearly, the U.S. intelligence community is making a pivot to China. But how do they recruit the next generation of officers with the right talents and skills?

“The ideal candidate would be a fluent Mandarin speaker, with an advanced degree in artificial intelligence — and a willingness to work for a government salary,” wrote Myre.

That is “quite a unicorn…but they’re out there,” said Cynthia Strand, a 35-year CIA veteran who now leads global intelligence strategy for Primer.

“Imagine if you had a large cadre of good interns,” Strand said. “You want to put them on the tasks where they can cut their teeth and learn, and leave the higher thought work to people who have been trained and practicing for a long time.”

“Human intelligence remains critical, but technology keeps leaping forward,” Strand said.

“No one human being, no matter how exceptional they are, can consume and make sense of the volumes of data that are available. Machines can do that beautifully,” Strand added.

The story concludes citing Strand: “It’s just one example of how technology is redefining spycraft for a new era – an era that’s here to stay.”

Read and listen to the full NPR story here: https://www.npr.org/2021/11/16/1051170999/as-u-s-spies-look-to-the-future-one-target-stands-out-china