Cyber threat hunters proactively comb through endless amounts of data to determine whether a network has been breached and to develop hypotheses about potential threats. They have their work cut out for them: research found that the average number of cyberattacks and data breaches in 2021 increased by 15.1% over the previous year. Unfortunately, the problem is only expected to get worse, with estimates predicting that attack rates will double by 2025.

Download Primer’s Guide to NLP Use Cases

But computer networks aren't the only targets. Operational technology, such as the systems that control utilities, railroads, aviation, hospitals, and other critical sectors, is also being targeted. Several high-profile attacks illustrate these ominous statistics and attack vectors. The Russian hacker group Killnet recently claimed responsibility for a DDoS attack on Lithuania, stating they had "demolished 1652 web resources." And North Korean state-sponsored cyber actors have targeted the healthcare sector, using ransomware to hold health records hostage until they get paid.

Security analysts tasked with tracking cyber actors' tactics, techniques, and procedures (TTPs) and attack trends need to have their finger on the pulse of cyberattack data. They need to monitor incoming information around the clock, anticipate threats, and provide insights to their information security and information technology teams to help guard company assets and infrastructure.

Primer Command automates threat intelligence extraction

Primer Command’s natural language processing (NLP) algorithms automatically structure and summarize millions of news and social media posts across 100 languages to surface key people, locations, organizations, topics, suspected disinformation, sentiment, and media to better understand how a situation is evolving. 

Command can take unstructured information, including threat intelligence posted online from advisories, and tag key entities that are mentioned. Often multiple sources detail the same threat intelligence, leading to wasted analyst time. By having this information organized, deduplicated, and summarized, cyber analysts will have more time to spot connections across threat vectors and actors. 

The following highlights Command's ability to zero in on the information that matters most to cyber threat analysts through advanced filters and AI-enabled classifiers.

Cyber attack filters

Command can filter on six types of cyberattacks to drill down into the information cyber threat analysts care about most. Instead of conducting keyword searches across multiple social media platforms and search engines to find the latest information on cyber threats, analysts can strip out the noise and save time by getting the insights they need in one place.

Consider the Distributed Denial of Service (DDoS) attack on Lithuanian energy supplier Ignitis Group: security teams can learn more about the attack by filtering social media and news posts in Command by the DDoS attack type. This instantly delivers reports about the Ignitis attack as well as other DDoS attacks that could be of interest to security researchers.

They can also learn what major cybersecurity firms are reporting by using organization filtering tags. They can isolate reporting from industry sources, like Proofpoint or Bleeping Computer, to learn their take on the DDoS attack or other attack vectors. In the same way, users can isolate reporting related to cyber threat actors like Killnet or the Lazarus Group.

Infrastructure filters

Command’s advanced filtering capabilities allow cyber threat analysts to unlock critical information about ongoing or recent attacks on infrastructure. Command’s industry-leading NLP models accurately identify social media posts related to infrastructure, including companies and government organizations charged with managing and protecting these critical public goods. 

For cyber threat hunters, this filter becomes more interesting when combined with cyber threat filters, like ransomware. With these combined filters, security teams can immediately see social media posts about ransomware attacks on the operational networks of companies and government agencies that manage critical public services.

Organization extraction

By looking at organization extractions alone, cyber threat analysts can quickly identify trending cyberattack victims and culprits across social and news media without having to read through every post. Command's extractions surface who the attacker or victim is, enabling cyber threat analysts to quickly see all the reporting related to these entities. In the screenshot above, "Omnicell" would bubble to the top as a victim or "Mantis Botnet" as an organization. As with the filters above, this can be layered with other filters like type of attack.

Learn more

To learn more about Command's other capabilities, click here for a free trial of Command for Tactical Insights. Contact sales to discuss your specific needs. You can also stay connected on LinkedIn and Twitter.


• An emerging narrative focusing on harmful chemicals found in food packaging was threatening the brand reputation of one of the world’s largest fast food operators.

• The Yonder by Primer project revealed the influential factions pushing the narrative and the potential impact on the brand.

• The brand responded by announcing a multi-million dollar policy change, avoiding reputation damage.

Primer technology empowers smart decisions for crisis communications

One of the world's largest fast food chains faced a serious challenge in 2020. A brand-damaging media narrative was emerging on social media around the use of PFAS, a class of harmful fluorinated "forever chemicals" often found in food packaging, nonstick cookware, and bottled water. The company's internal communications team determined they needed media intelligence to understand the broader narrative and the influential online groups driving it in order to decide how to respond.

The Yonder by Primer solution

The Yonder product looks at which groups are engaging with emerging narratives and then surfaces the historical links between those groups, the key differentiator often needed. In the PFAS case, the fast food chain needed to know whether the troublesome narrative was coming from a random group of people online or from a group that had a history of pushing narratives that could damage the brand. Before the narrative about PFAS ever went viral, the Yonder product predicted that the brand would become the main target that online groups would organize against unless action was taken to address the issue.

The Yonder product was able to quickly answer both real-time and historical questions:

  • Who’s engaging with the narrative?
  • What links does the peripheral group have, and what access does it have to other networks? 
  • Do those other networks have factions that are historically successful in introducing new and potentially damaging narratives to the mainstream? 

Yonder can also analyze the historical behavior of how a group distributes content. In the PFAS case, the app delivered insights around: 

  • The origins of the narrative
  • The trajectory of post volume
  • Involvement by influential online groups

 Results

The analysis resulted in a recommendation that potentially saved millions in reputation damage. The fast food giant announced a U.S. $6.4M policy change, pledging to stop the use of PFAS in its food packaging globally. The brand quickly aligned its teams around a proactive policy change decision, saving months of time-consuming, back-and-forth debate on the potential impact of the narrative on the brand and how to respond, if at all. This decision thwarted a growing reputation crisis, built trust among consumers, and was recognized by activist groups. 

Brand monitoring 

Primer’s brand and reputation management solution gives marketing and PR teams a real-time view of their brand in the global marketplace. With AI-powered analytics capabilities, Primer separates signal from noise to surface actionable insights that help teams understand their market, identify risks, plan marketing activities, and protect their brand’s reputation. Yonder discovers the hidden groups who control and amplify online narratives. The product analyzes the historical influence of these groups to predict how they will impact narratives in the future. 

Learn more about this Primer solution. Better yet, contact Primer and let’s discuss how our NLP technology can help protect your brand and keep your organization ahead of threats.

Alliance Trucking is a regional trucking company that’s been serving the Dallas-Fort Worth area for over 30 years. Alliance acts as a broker between truck owner-operators and construction companies who need loads of construction material delivered. 

To request pricing for materials, Alliance's customers email requests for quotes (RFQs) describing the location, the materials that need to be transported, and other details pertinent to the job. Alliance's team then analyzes the emails to estimate trucking routes, availability, and costs, and returns a quote to the customer. 

Running into friction scaling their core operations 

Producing accurate quotes keeps Alliance competitive while ensuring a sound operating margin. But estimating the cost of a delivery is a complex human process that carries a lot of overhead. An estimator receives a bid in their help desk software, HelpScout. The bids come in as unstructured text containing the details of the job, including the location, material, and time frame. Estimators take that information, do the route research, and then manually input the job into Alliance's purpose-built software. 

This manual process created variation in pricing, produced inefficient routes, and constrained the number of RFQs Alliance could respond to. Alliance wanted to scale its operations by offloading route estimation to machine learning. 

Using LightTag to build a dataset

Alliance needed to be able to feed emails directly into an NLP model that could understand RFQs and produce precise estimation criteria. The estimates would then be loaded into their software to be approved and executed. To build an NLP model for the trucking industry, they first needed a high-quality dataset of labeled job estimates from which their custom model could learn. 
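
As a rough illustration of what such a labeled dataset can look like, the sketch below converts one annotated RFQ email into spaCy's training format. The email text, label names, and spans are hypothetical stand-ins; they are not Alliance's actual schema or LightTag's export format.

```python
# A minimal, hypothetical sketch of turning a labeled RFQ email into spaCy
# training data. The email, labels, and file name are made up for illustration.
import spacy
from spacy.tokens import DocBin

text = "Need 20 loads of crushed limestone delivered to the Forney site by Friday."

# Spans a subject matter expert might have marked in an annotation tool.
labeled_phrases = [
    ("crushed limestone", "MATERIAL"),
    ("Forney", "LOCATION"),
    ("Friday", "TIMEFRAME"),
]

nlp = spacy.blank("en")
doc = nlp.make_doc(text)

ents = []
for phrase, label in labeled_phrases:
    start = text.find(phrase)
    span = doc.char_span(start, start + len(phrase), label=label)
    if span is not None:  # skip spans that do not align with token boundaries
        ents.append(span)
doc.ents = ents

# Serialize to the binary format expected by `python -m spacy train`.
DocBin(docs=[doc]).to_disk("rfq_train.spacy")
```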

To ensure data quality, the labeling job fell to their CFO, Eric Dance. To maximize the return on the time he invested in labeling data, Alliance set out to find a solution that would make labeling simple and efficient, and selected LightTag.

Choosing the right data labeling solution

LightTag learned from Eric as he annotated and then provided pre-annotations that automated a large fraction of Eric’s work. An intuitive user interface made the software easy for Eric to use and build a dataset he was happy with, despite the fact that he is not a technical user.

“I’m not a developer and I don’t have those skills, but I understand my business and wanted to digitize it. I found LightTag easy to use to build a dataset of thousands of emails so we could get a quality dataset.” 

-Eric Dance, CFO, Alliance Trucking

Results

Eric was able to build a dataset with tens of thousands of annotations and train a precise RFQ estimation model. From there, Alliance automated the ingestion of bid emails and now responds to RFQs faster and more consistently. Today, RFQs can be created and approved by a truck dispatcher in seconds rather than minutes. 

Alliance is a great example of how NLP can be applied not just to cutting-edge use cases, but to any business process. It's also a shining example of how LightTag can dramatically improve the labeling process, letting subject matter experts spend as little time labeling as possible while still building a dataset good enough to train a model that delivers human-quality results. 

To learn more about LightTag or Primer, or to request a demo, contact us here.

Technological advances in deep learning models have made it easier to use unstructured data for a whole new class of ML tasks.

Explaining natural language processing (NLP) and what it can do for an enterprise can be a daunting task. It’s a complex technology. Simply put, NLP can instantly “read” and process massive volumes of text data and find insights that would be almost impossible for humans to surface at scale. Thanks to advances in deep learning models, the ability of NLP to scan, process, and analyze unstructured data is increasing exponentially.

What is unstructured data? 

Knowing the difference between structured and unstructured data helps to gain an understanding of the power of NLP. Fortunately, it isn’t difficult. Structured data is likely what you’re already thinking of when you consider what NLP does. For instance, if you wanted to get answers about a local housing market, much of the data you would want analyzed can be easily categorized and placed in a spreadsheet. Numbers relating to cost, size, days on market, number of bedrooms and bathrooms, parking spaces, and so forth would help someone find insights into this topic.

The unstructured data that is available isn’t nearly as neat and orderly. Using the same example of a local housing market, if you also wanted to add data regarding public perception of neighborhoods, municipal services, and quality of local schools and businesses, the data points aren’t exactly spreadsheet-ready. 

What’s exciting in NLP is that it is gaining the ability to accurately analyze more abstract data, like comments posted on internet review sites, tweets, or relevant news and video about a location. These aren’t numbers you can add to a table like square footage or property taxes. Rather, machine learning is developing the ability to parse human sentiment, positive and negative attitudes, sarcasm, hashtag meaning, and other intangible perceptions and add the results to the analysis of simpler, structured data.
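
To make this concrete, here is a small, hypothetical sketch of how a sentiment score derived from free-text neighborhood reviews could become one more structured column sitting next to square footage or property taxes. It uses NLTK's off-the-shelf VADER analyzer; the listings and reviews are invented.

```python
# Turn unstructured review text into a structured sentiment column.
# Listings and reviews are made up; VADER is a general-purpose analyzer,
# not a model tuned for real estate data.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

listings = [
    {"address": "12 Oak St", "sqft": 1450,
     "review": "Great schools and a friendly, walkable neighborhood."},
    {"address": "98 Elm Ave", "sqft": 1600,
     "review": "Constant road noise and the park is poorly maintained."},
]

for home in listings:
    # The compound score ranges from -1 (very negative) to +1 (very positive).
    home["neighborhood_sentiment"] = sia.polarity_scores(home["review"])["compound"]
    print(home["address"], home["sqft"], round(home["neighborhood_sentiment"], 2))
```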

For a more detailed look at the topic, download the white paper “The Path from Structured to Unstructured Data.” In it, you’ll get information on

  • Unstructured data use in the enterprise
  • The effect of advanced deep learning models
  • Data infra requirements for unstructured data
  • How distributive and generative tasks make sense of both kinds of data


For more information about Primer and to access product demos, contact Primer here.

Command’s powerful capabilities can be seen through the lens of Global Security Operation Centers (GSOCs) seeking to provide support and information to employees in Ukraine.

Provide duty of care to employees in Ukraine

For global companies, monitoring for threats to employees or assets requires constant attention. Whether it's a natural disaster or an act of war, GSOCs are charged with alerting and providing guidance to employees in harm's way. With Russia's recent invasion of Ukraine, many companies are quickly pivoting to provide support to their employees in the country. Earlier this year, many people in Ukraine did not think an invasion would happen and now regret not evacuating sooner. This is especially true for military-aged males, who are now being conscripted and are unable to evacuate legally.

Hearing that their employees are being separated from their families and, in some cases, forced into military service, employers are expanding their boundaries of care to look after the well-being of their employees and their families in Ukraine as well as Belarus, Russia, and the surrounding regions. GSOCs are charged with monitoring threats and providing periodic updates to their employees. 

The following showcases how Command helps security teams provide duty of care to their employees, with a focus on Kyiv, including how to:

  • Inform employees about local threats 
  • Update employees on local authority guidance, such as curfew orders
  • Highlight damage to utilities and infrastructure
  • Notify employees about supply shortages

Warn employees about proximity to threats

On February 24th, after Russian President Vladimir Putin announced that he had ordered a "special military operation" in eastern Ukraine, missiles began to strike dozens of cities across the country, including the capital, Kyiv. By February 27th, reports were flooding social media stating that the capital was surrounded by Russian forces, making evacuations impossible. 

  • Filter by location: The GIF below shows how security teams can quickly filter reporting by their employees' locations. 
  • Deduplicate: Command automatically collapses similar reporting so security analysts do not waste time reading 17 posts about the same Russian convoy marching into Kyiv (a rough sketch of the idea follows below).
GIF showcasing Command’s filtering and deduplication capabilities.
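
As a simplified illustration of the deduplication idea, the sketch below embeds posts with an open-source sentence-embedding model and collapses posts whose embeddings are highly similar. The posts, model choice, and similarity threshold are illustrative; this is not Command's actual deduplication system.

```python
# Collapse near-duplicate posts by grouping those with very similar embeddings.
from sentence_transformers import SentenceTransformer, util

posts = [
    "Large Russian convoy seen moving toward Kyiv from the north.",
    "Reports of a massive Russian military convoy approaching Kyiv.",
    "Kyiv mayor announces weekend curfew for all residents.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(posts, convert_to_tensor=True)

groups, assigned = [], set()
for i in range(len(posts)):
    if i in assigned:
        continue
    group = [i]
    for j in range(i + 1, len(posts)):
        # 0.8 is an arbitrary demo threshold, not a tuned production value.
        if j not in assigned and util.cos_sim(embeddings[i], embeddings[j]).item() > 0.8:
            group.append(j)
            assigned.add(j)
    groups.append(group)

for group in groups:
    print(f"{len(group)} similar post(s), shown once:", posts[group[0]])
```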

Update employees on local authority guidance 

As Russian troops continue to advance on Ukraine’s capital, employees in Kyiv need to know the latest guidance from local authorities. 

  • Filter by local authorities: Using the people filter for Kyiv Mayor Vitali Klitschko, social media and news reports related to him and his statements appear. 
  • Keyword search: Security teams can further filter by searching for "curfew" to see the latest guidance.
Partial list of people automatically categorized by Command 
Social media post Command tagged as associated with Kyiv Mayor Vitali Klitschko.

Highlight damage to utilities and infrastructure

Employees need to know if there are threats to their power and internet connectivity.

  • Filter by infrastructure: Command is powered by industry-leading natural language processing models that accurately identify social posts that fall into this filter. This capability allows security teams to isolate social media reporting related to infrastructure and utility damage. 
  • Flag disputed information: While there is a continuous stream of mixed reports about power outages in Kyiv, Command automatically flags disputed information so security teams can accurately track and inform their employees about outages.
Partial list of Command’s filtering options
Examples of Command’s social media results related to internet connectivity. 

Track evacuation options

Some employees in Ukraine are probably seeking support from their employers’ security team on how to evacuate the country safely. 

  • Filter by evacuations: With Command, GSOCs can quickly respond to these requests after narrowing the enormous volume of social media posts related to Ukraine down to just information related to 'displaced people and evacuations.' With this filter, security teams can summarize social media posts related to routes and border crossing locations. 
  • Map layer: For employees looking to move out of Kyiv to another part of the country, security teams could use Command to filter by 'Infrastructure Damage' and then select the social mentions map layer to quickly identify areas employees should avoid. 
Partial list of Command’s filtering options and example of Command’s social media results related to evacuations. 

Stay organized and collaborate with team members

Security teams can bookmark and label important reports in their feed. This allows users to stay organized and to compile the findings into a situation report later. Command's engineers are also working on the ability to share queries with teammates to form a common operating picture.

Learn more

To see how Primer Command could help you analyze fast-breaking events, check out this article posted in AI Business or sign up for a free trial of Command by clicking here. Or contact Primer to discuss your specific needs. You can also stay connected on LinkedIn and Twitter.

A case study of providing early warning and responses to potential risks

The challenge

Agenda, LLC, a D.C.-based international boutique public affairs and strategic communications agency, regularly monitors news and social media to stay vigilant around several key areas of concern for its mission-driven clients, which include NATO, USAID, UN Women, and private-sector organizations. Because the information environment has become so complex and challenging to monitor, having the right information and understanding the trajectory of narratives is critical in supporting communications strategies and decision making.

Part of Agenda’s advisory services requires monitoring open-source data feeds for potential risks to public perception and operational security issues. This is where Primer Command and Primer Analyze come in. The AI-powered solution enables Agenda to closely monitor news and social media and track narratives, allowing for early warning and providing time for Agenda and its clients to respond.

Before working with Primer, the Agenda team had been using several data mining tools for alerts and analytics, along with manual tasks in a workflow that was cumbersome and too time consuming to support effective issue and crisis management.


The solution

With Primer, Agenda has a holistic tool that provides a rich snapshot of unfolding events in real time. Primer helps separate signal from noise by surfacing key insights and following the trajectory of narratives in the information environment.

From one unified dashboard, Agenda's research and client-facing teams can access a range of powerful capabilities to understand their clients' operating environment, identify risks, plan remediating activities, and help protect their clients' operational security. 

"Primer Command is a versatile, real-time security and crisis management intelligence solution that enables us to monitor open-source data feeds for potential threats from adversaries and malign interests. We depend on Primer over all other platforms for early warning and to respond to potential risks to our clients' operational security, business continuity, and brand reputations."

Doug Turner, CEO, Agenda, LLC

Select features

  • Custom queries monitor important topics in news and social media across 100 languages, including Chinese and Russian, translating content into English.
  • Custom alerts flag when key terms are mentioned.
  • Real-time feeds surface articles about executives, competitors, or other entities relevant to the query topic.
  • An interactive map plots key locations that are central to the news story.
  • Data volume trends indicate whether news coverage is heating up or cooling off.
  • Topic analysis surfaces the most popular themes, hashtags, and links being shared about the company, as well as disputed information.
  • Sentiment analysis shows trends in public attitudes.
  • Custom, human-quality situation reports automatically generate summaries of key news events within seconds.

 Results

Agenda measures performance based on the accuracy of key data and the time required to generate reporting. Using Primer, the accuracy of monitoring and early warning has improved, and the time to present and act upon findings is reduced. Rather than spending hours monitoring news alerts and social media across multiple data mining tools, the Agenda team has more time to better serve their clients.
 
For more information about Primer Command, click here. And to learn more about Primer’s end-to-end NLP solution, click here. To access product demos, contact Primer here.

Primer’s Natural Language Processing (NLP) solutions can help asset managers uncover potential investment opportunities, manage downside risks, and monitor the impact of rapidly evolving events on financial markets. 

Asset managers have to identify investment opportunities, manage risk, and stay on top of market developments all at the same time. Their success depends on their ability to find and synthesize information to draw out unique proprietary insights in a timely manner. That said, processing and analyzing data and drawing actionable insights from it in a rapidly changing market is extremely hard. The value of information has a short "half-life," as asset managers are in fierce competition to exploit any information before others can. 

While improvements in financial analyses in the past were based on finding faster and better ways of processing and analyzing numerical data, this has slowed in recent years. Instead, asset managers have realized that the frontier has moved to alternative and unstructured data analysis. Text data is a form of unstructured data that is hard to process and therefore has a higher “concentration” of unexploited information.


This has led asset managers to use Natural Language Processing (NLP) to process and make sense of textual data. Asset managers who don’t leverage NLP risk being left behind and losing their competitive advantage over time.

Emerging challenges in asset management

Asset managers now find it harder to uncover new investment opportunities amid a challenging landscape. More private companies are delaying going public or not going public at all, as capital is more readily available to private companies than in the past and remaining private carries fewer regulatory reporting pressures. This trend has led to fewer investment opportunities in the public markets. Retail investors are also becoming more sophisticated in using data and sharing information on forums such as Reddit. In aggregate, they amass collective influence that competes with institutional managers. Such forums are an information treasure trove, but not one that is easily mined.  

Asset managers require deeper analysis to form proprietary insights with higher information value. With algorithms and machines rapidly arbitraging known opportunities, investors need to conduct deeper analysis: determining second-order relationships and connecting the dots between seemingly unrelated concepts to create proprietary insights. This, though, is time-consuming and often requires reading and extracting key information from large swaths of text from various sources. 

Asset managers have insufficient attention bandwidth to fully monitor events impacting their portfolio. Between ideating, researching, and executing their next investment opportunity, many asset managers have limited time and attention to monitor their existing portfolios. This is especially true in situations of abnormal market conditions, such as an unanticipated military conflict or natural disaster. During these events, investors can usually only adequately monitor their top positions, leaving a long tail of smaller positions with only limited or little monitoring.  

To address these challenges, asset management firms are turning to NLP for help and to gain an advantage.

How NLP changes asset management 

Ideate new investment opportunities by keeping on top of emerging trends. Asset managers can use an NLP technique known as topic modeling to uncover emerging trends in a three-step process: 

  1. Key topics are first surfaced from data sources such as news, research reports, and transcripts;
  2. Topics are ranked by how rapidly interest in them is increasing; and 
  3. Companies associated with these emerging topics are identified as potential investment opportunities (a rough sketch of this workflow follows).  
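
The sketch below shows what step one might look like with an open-source topic model; the documents are invented placeholders, and steps two and three are only outlined in comments. It illustrates the general technique, not Primer's pipeline.

```python
# Step 1 of the workflow above: surface topics from a corpus with LDA.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "Chipmaker announces new AI accelerator for data centers",
    "Utility pilots grid-scale battery storage project",
    "Retailer expands same-day delivery with autonomous vehicles",
    # ... in practice: news articles, research reports, call transcripts ...
]

vectorizer = CountVectorizer(stop_words="english", max_features=5000)
dtm = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(dtm)  # per-document topic weights

terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top_words)}")

# Step 2 (not shown): track each topic's document volume over time and rank
# topics by how quickly interest is growing.
# Step 3 (not shown): link fast-growing topics to the companies mentioned in
# the same documents to build a candidate list of investable opportunities.
```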

Analyze unstructured data at scale to reveal the patterns that matter. NLP can also help asset managers slice and dice large amounts of text by automatically identifying company names, individuals, locations, times, monetary values, and more, using both out-of-the-box pre-trained models and custom-trained models. This allows for more complex analysis of the documents beyond simple keyword searches: for example, "interrogating" the data for all quotes related to dogecoin made by Elon Musk over a certain time period (a simple sketch of this kind of query follows).  
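
As a rough illustration, the sketch below uses an off-the-shelf spaCy NER model to narrow a small, made-up corpus to documents that mention Elon Musk as a person and dogecoin within a date window. It demonstrates the general idea, not Primer's entity-extraction engines.

```python
# Filter a corpus by extracted person entities, a keyword, and a date range.
from datetime import date

import spacy

nlp = spacy.load("en_core_web_sm")  # small pre-trained pipeline with NER

corpus = [
    {"date": date(2021, 5, 13), "text": "Elon Musk said Tesla would no longer accept bitcoin."},
    {"date": date(2021, 6, 25), "text": "Elon Musk tweeted that he still supports dogecoin."},
    {"date": date(2021, 7, 2), "text": "A fund manager in London discussed dogecoin volatility."},
]

start, end = date(2021, 6, 1), date(2021, 6, 30)

hits = []
for item in corpus:
    if not (start <= item["date"] <= end):
        continue
    doc = nlp(item["text"])
    people = {ent.text for ent in doc.ents if ent.label_ == "PERSON"}
    if "Elon Musk" in people and "dogecoin" in doc.text.lower():
        hits.append(item)

for hit in hits:
    print(hit["date"], "-", hit["text"])
```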

Expand monitoring coverage of risk events that impact a portfolio. Via an NLP technique known as classifiers, asset managers can expand their monitoring coverage and better allocate their attention across their entire portfolio. For example, in the event of a country- or region-wide risk event, asset managers would need to know the impact on all companies in their portfolios. With NLP, they can quickly identify the list of companies, including their smaller positions. 
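
For illustration, the sketch below uses an open-source zero-shot classification model from the transformers library to tag headlines with risk categories. The labels and headlines are hypothetical, and Primer's classifiers are custom-trained models rather than this zero-shot stand-in.

```python
# Tag headlines with portfolio-risk categories using zero-shot classification.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

headlines = [
    "Flooding halts production at several electronics plants in the region",
    "Quarterly earnings beat expectations on strong cloud revenue",
]
risk_labels = [
    "natural disaster",
    "geopolitical conflict",
    "supply chain disruption",
    "routine earnings news",
]

for text in headlines:
    result = classifier(text, candidate_labels=risk_labels)
    print(text)
    print("  top label:", result["labels"][0], f"({result['scores'][0]:.2f})")
```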

NLP is emerging as a key tool in asset managers’ toolkit to process unstructured data, gain proprietary insights and increase their risk management coverage. 


For more information about Primer’s NLP solutions and to access product demos, contact Primer here.

The companies combine for a ground-breaking application of NLP technology.

3/24/22 Update: Today, Summari earned the top spot on Product Hunt, demonstrating the power of the Primer Platform.

Summari is an AI-powered assistant that instantly summarizes things you read, simplifying the consumption of enormous amounts of publicly available information. The service is aimed at content teams, analysts, investors, and students who need to read and comprehend critical information in order to make recommendations and decisions that shape businesses and public institutions. Summari’s goal is to make this service available to anyone who needs a rich and relevant summary, instantly.

Reaching human limits and the search for an NLP partner

Before working with Primer, Summari hired college students from top universities to summarize articles manually. The summaries were high quality, but humans can only consume a finite amount of information at a time, and hand-writing summaries is time intensive. The resulting latency diminished the user experience. The upside was that this method produced a dataset of the highest-quality, human-written summaries with which to train models.


“Speed of summarization is paramount. Delivering a summary when and where a reader requires it creates a magical experience. The summary gives the user enough information to determine whether to investigate further.”

Ed Shrager, CEO and Founder, Summari

Summari began investigating NLP as a way to speed the production of human-quality summaries. CEO Ed Shrager attempted to optimize a large open-source NLP library, but wasn't able to fine-tune the model to achieve the results Summari wanted. Summari was aiming for a precise format: a high-level abstract, several context-setting introductions for each section of an article, and bullet points explaining each section.

Primer enters the picture to build a summarization model

In November 2021, Summari selected Primer as its NLP partner of choice. Primer’s forward-deployed engineering team used Primer’s Platform to train and deploy a customized text-to-text summarization model, coupled with Primer’s data ingestion pipeline, that delivers human-level summarization capabilities for any long-form article on the internet. The Primer Platform makes it simple to ingest and normalize data from almost any data source – web data from any url, applications like Salesforce, databases and data stores, PDFs, and others – connect the data to a customized NLP model, and serve it at scale.
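
For readers unfamiliar with the task, the sketch below shows the general shape of text-to-text summarization using an open-source model from the transformers library. It is purely illustrative: this is not the custom model Primer trained for Summari, and the page-scraping step is a crude stand-in for a real ingestion pipeline.

```python
# Summarize the main text of a web page with an off-the-shelf model.
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize_url(url: str) -> str:
    # Fetch the page and keep only its paragraph text.
    html = requests.get(url, timeout=10).text
    text = " ".join(p.get_text() for p in BeautifulSoup(html, "html.parser").find_all("p"))
    # BART's input window is limited, so this demo truncates long articles;
    # a production pipeline would chunk the article and summarize each piece.
    result = summarizer(text[:3000], max_length=150, min_length=40, do_sample=False)
    return result[0]["summary_text"]

print(summarize_url("https://en.wikipedia.org/wiki/Natural_language_processing"))
```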

Primer made use of the existing human-written summaries that Summari had collected as training data for the custom summarization model. The Primer Platform was able to unify the human-written summaries even though they were written by different Summari annotators.

The resulting custom summarization engine was configured to allow Summari to change key parameters, such as length, style, and content. This essentially gave them control to "drive" the summarization engine for their own needs, delivering better results for their users.

“The high quality of Primer’s results is critical for Summari’s success. Our users must be able to trust that we are providing the right words, in the right context. There’s no room for error.”

Ed Shrager, CEO and Founder, Summari

The results

Primer delivered a summarization model for Summari that produces comprehensive, human-quality summaries of any long-form article instantly. Faster delivery times allow Summari to expand their offerings from a few dozen publications to the entire Internet, and the customer experience has been dramatically improved. Their customers can take any article on the Internet, drop it into the Summari application (which runs a custom Primer text-to-text NLP model under the hood), and generate a quality summary in moments.

The market’s reception has been strong, earning Summari the top spot on Product Hunt.

It's not just a breakthrough for Summari, but for NLP technology itself. An instant, human-quality summary of a massive amount of information is another display of the power of NLP.

“Primer has a world-class team dedicated to deploying NLP solutions that solve real-world problems. We needed a trusted partner who understands the nuances of language and would stand side by side with us to create a user-centric product that delights our customers. Primer absolutely delivered.”

Ed Shrager, CEO and Founder, Summari

Primer partnered with Vibrant Data Labs, a nonprofit that uses data to create the first-ever comprehensive map of climate change funding. Our interview with Eric Berlow of Vibrant Data Labs shows how ‘following the money’ reveals both our biggest opportunities and threats to turn climate change around.

It's no secret that the future of our planet is in trouble. The recently released IPCC report concluded that countries aren't doing nearly enough to protect against the significant disasters to come. 

But in order to solve a big problem like climate change, and to understand if our current response is working, we need to see where private funding in the sector is going. What issues are getting money, and which organizations are getting that funding? What other trends might emerge? 

Applying NLP to climate change

That's where natural language processing comes in. Using Primer's NLP technology, we partnered with Eric Berlow at Vibrant Data Labs to produce the first-ever climate change funding map. Primer's Engines analyzed data on over 12,000 companies and nonprofits funded in the last five years. Using organization descriptions provided by Crunchbase and grant applications provided by Candid, and in partnership with the Cisco Foundation, we generated one of the first-ever data-driven hierarchies of climate topics to better understand our current response, alongside any potential gaps. Using this topic hierarchy, we can see what projects organizations are working on, and where. That helps us see what's missing in the bigger picture. And to solve a problem like climate change, a big-picture view is what's needed.

Watch our interview with Eric Berlow on why following the funding is crucial for the climate’s future. 



“The Coronavirus pandemic was like the trailer to the climate change movie where if the have-nots are really impacted, everybody gets affected by it. Climate change is one of those problems that is an all-hands on deck problem. You cannot just solve a little piece of it. “ – Eric Berlow



Learn more about Primer’s partnership with Vibrant Data Labs here, and learn the technical piece behind the work here and here.

For more information about Primer and to access product demos, contact Primer here.

From a security perspective, the Beijing Olympics has all the ingredients of a perfect storm. Global tensions have ratcheted up as negotiations between the US and Russia over Ukraine's fate hit a wall. The US and its allies have declared a diplomatic boycott of the Beijing Olympics due to alleged human rights violations by the Chinese government against the Uyghur population. Dissident groups are looking at the games as a platform for voicing their grievances. 

Adding to the tensions are reports that movement of people in the city, including players and media, is limited under COVID-19 protections. The games themselves are expected to be more opaque than in years prior, as Beijing has confined foreign correspondents to a "closed loop" bubble. The correspondents have only limited interaction with athletes taking part in the games and no freedom of movement in the city itself. The athletes are under extra security protocols as officials warn of Chinese surveillance operations that will target them while in the country. 

Security analysts from global security operations centers (GSOCs) around the world with assets or people in Beijing during the games are on high alert. They need to monitor incoming information around the clock to anticipate any threats and provide instructions if any security incidents occur.

Surface key insights

For security teams charged with the herculean task of monitoring threats emanating from and against the Beijing Olympics and its commercial sponsors, Primer Command® is a game changer. Command not only identifies the people and places mentioned, it also shows live feeds of news and social media posts streaming in side by side. This saves analysts from using multiple apps; it's all in a single pane. Further, it allows them to use news reports to corroborate, in seconds, social media posts with alleged threats and emerging issues of importance. With these capabilities in hand, organizations can maximize the safety of their people, operations, and assets. 


Primer Command automatically structures and summarizes millions of news and social media posts across 100 languages to surface key people, locations, organizations, topics, suspected disinformation, sentiment, and media to better understand how the situation is evolving. 


The following highlights Command's ability to zero in on the information that matters most through advanced filters and AI-enabled classifiers. To learn more about Command's other capabilities, click here for a free trial of Command for Tactical Insights.

Humanitarian aid filters

Command's advanced filtering capabilities allow security teams and first responders to unlock mission-critical information during a crisis. Primer's humanitarian aid filters let teams drill down on tweets about displaced people and requests for evacuation, caution and advice, and reports of injured or deceased people. These filters will be particularly useful during the winter games for zeroing in on any violence and safety concerns for personnel there.

Chinese language filters

Learning what is being conveyed to the local population will be more important during the Beijing games given the limited movement allowed to foreign media outlets. Additionally, it will give security teams early indicators of unrest. This filter can also illuminate posts by local nationals, automatically translated, expressing concern about physical security threats.

Additional filters

Command can filter on numerous other entities to drill down into the information security analysts care about most.  

  • Event Types: Security analysts can filter the information feeds by event type, such as diplomacy, military, or law enforcement topics within news reports. This will prove particularly important if any security incidents break out during the winter games. Analysts will be able to home in on reporting related to law enforcement to get the latest actions to contain the threat. Focusing on these posts also provides GSOCs with the latest official statements and guidance from security forces for people in the area. 
  • Disinformation: Command can detect and flag disputed information in news reports. Analysts can filter by disputed information and use this as an indicator of disinformation campaigns occurring during the events. 
  • Social Data: Analysts can segment social media data feeds based on the number of likes or retweets, key threat indicators, or even sentiment. Primer's state-of-the-art sentiment filters are hand-tuned for the highest accuracy so analysts can quickly identify the social media posts that matter. By filtering for negative sentiment, analysts can uncover the threats hidden within the deluge of data, separating chatter from hazards.

Learn more

Contact sales to discuss your specific needs. You can also stay connected on LinkedIn and Twitter.

What is NLP and how can it support business?

Imagine a curious executive walking down a path that explores how Natural Language Processing (NLP) can help their business by uncovering insights hidden in the available and pertinent data to make better and more timely decisions. The executive is often faced with the problem of having vast amounts of company data, and not a lot of ways to take action on it. Not to mention the risk of not knowing what’s hiding in the data.

Steps one and two

As the journey begins, the executive takes steps one and two, Ideate and Identify. The executive asks, "What do I want to know?" and follows up with "Where do I find the answers?" Whether the question concerns customer attitudes toward a business, how it compares with competitors, or almost anything else decision-makers, analysts, and owners want to know, the quest for knowledge is the beginning. The next consideration is where to find answers to these questions. Identifying those data sources (internal purchase information, call center logs, product descriptions and reviews, social media posts, customer survey results, and so on) is the "where" the answers will be found.

Steps three and four

The next steps, three and four, are Connect & Ingest and Transform. In Connect & Ingest, the executive might ask, "How do I find the answers?" and extract text from both external sources and the company's unstructured internal data identified in step two. In Transform, the executive asks, "How do I use NLP and AI?" A core technique here is Named Entity Recognition (NER), a sub-task of information extraction that identifies named entities mentioned in unstructured data and classifies them into predefined categories such as personal names, affiliated organizations, geographic locations, time expressions, and so on. Transform also includes question answering, classification, topic modeling, relationship extraction, sentiment analysis, and other methods of processing the ingested information into something useful.
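
A tiny illustration of NER, using the open-source spaCy library on a made-up sentence (not Primer's own engine), looks like this:

```python
# Extract and label named entities from a sentence.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Maria Gonzalez joined Acme Logistics in Dallas on March 3, 2022.")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)

# Typical output (labels can vary slightly by model version):
#   Maria Gonzalez -> PERSON
#   Acme Logistics -> ORG
#   Dallas -> GPE
#   March 3, 2022 -> DATE
```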


Steps five and six

Next up, Integrate and Explore, steps five and six, come after data has been scanned and processed. At the Integrate step, the executive could ask, “How do I combine insights from NLP and AI with my own data and analytical models?” To sharpen the results of NLP, companies often have pre-existing internal mathematical models, analytics and projections that can be combined. Once completed, the executive at the Explore step asks, “What answers do I have?” and looks at the patterns and unearthed relationships that can be converted into action plans.

Steps seven and eight

Operationalize and Realize & Repeat are steps seven and eight. Once the executive has answers from previous steps, the question is “How can I use this information?” The Operationalize step adds these new insights into a workflow. This can include replacing labor-intensive and often mundane tasks like manually compiling mass volumes of data with automation, contextual routing, summarized analysis, and creating intelligence dashboards. 

The last step is what keeps the process going; that may seem counterintuitive, but the executive learns it is a feature of NLP. Once new insights are put in place to realize achievable outcomes, the new data they generate is used to expand on and repeat the analysis, resulting in a deeper understanding.

While these concepts require a basic understanding of NLP, the eight steps succinctly sum up the process. The executive has developed a better understanding of how NLP can positively impact their bottom line. 

Primer strives to help the world understand the power of NLP and what it can do to help businesses make better decisions and gain a competitive advantage. The “8 Steps to Get Started with NLP” is one of myriad efforts to pique interest, start conversations, and educate the business community.

For more information about Primer and to access product demos, contact Primer here.

Imagine logging into your work computer one Wednesday morning and seeing untrue social media posts claiming the company you work for is a fraud. Simultaneously, you and your colleagues receive a report from an unknown source, presented as comprehensive research, warning investors away from the company on the grounds that it is a fraud. Several weeks pass before it is discovered that the company that published the report is a shell with no discernible employees, operating from an unknown address on the other side of the world. But why would they write this report? Was the entire company created just to spread false information about your employer? 


Unfortunately, the story above is not made up. It's also becoming less of an anomaly, especially in the crypto industry. Spreading disinformation in the crypto industry is prevalent and persistent, and it often intermingles with real investment concerns. The promulgation of disinformation built on fear, uncertainty, and doubt, or FUD, is intended to confuse investors and potential investors. Questions around hot-button issues can be raised intentionally to elicit FUD in an effort to affect the associated token's price and popularity. The concept of FUD has become so pervasive that crypto sector social media users will use "FUD" as a word to call attention to any posts that negatively portray a crypto project.

AI Machine Learning tools can help to detect disinformation campaigns

New advancements in AI/Machine Learning, specifically Natural Language Processing (NLP), can help detect disinformation and synthetic text as well as partition the claim and the counterclaim of a disinformation campaign. This allows crypto projects to quickly see what is being said on each side of a dispute. 

With Command, crypto companies can see the disputed information flagged for each report in the feeds column. They can also get perspective on the amount of FUD they are facing compared to others in the space. Additionally, Command displays FUD trends over time and categorizes the organizations and people discussed in the posts. This helps in conducting investigations into the targets of the posts and who is behind the disinformation campaign.

How pervasive is FUD?

FUD around crypto projects tends to focus on what governments will do about them. This has largely stemmed from China's decision to ban crypto transactions and mining. This FUD gets recirculated frequently as China reaffirms its decision or cracks down on underground mining, citing concerns about energy use. A recent spike in FUD claims has been driven by the intensifying scrutiny of blockchain assets by the Securities and Exchange Commission and other U.S. regulators.


Disinformation peddlers, in the form of bots or paid influencers, tend to pile on top of these fears with statements like those in the image below. This social media influencer is known by many in the crypto sector for consistently posting negative information about Tether and Bitcoin. He used the press release to support his campaign against both projects. Notably, the statements referenced in the post never mentioned Bitcoin or Tether. They focused on the impact mass adoption of stablecoins would have on traditional financial markets.

Disinformation in the crypto sector tends to skyrocket with any downturn in the token price. Take Ethereum (token: ETH) as an example. The first chart below shows ETH price in December 2021. The second chart shows a spike in FUD statements at the end of December when the price of ETH had its most severe decline.

A basic Twitter search for the term "FUD" alongside any of the top 20 crypto companies over the month of December returns 254 hits. Likewise, a Reddit search returns 71 hits. While these numbers might not be alarming, it's important to note that they only scratch the surface: when social media users post FUD, they don't usually flag the term, so this search mostly captures other users pointing out FUD in someone else's posts. It also doesn't cover discussions in the threads beneath posts.

FUD contributes to market volatility, brand bullying

One of the oft-cited reasons for not investing in crypto is volatility. In November 2021, for example, Beijing reiterated its stance against Bitcoin miners, which likely contributed to a crypto selloff over the next several days. The price of Bitcoin dipped 2.9%, while Ethereum and Solana dropped 4.6% and 6.7%, respectively, following the statements.

The crypto industry is largely unregulated, and the federal government, for the most part, appears to still be figuring out how it all works. Couple that lack of oversight with the fact that most people interested in this sector shy away from central authorities, and the result is that many victims of FUD do not see legal recourse as an option.

Instead of court battles, they have taken to relying on community advocates to counter the messaging. These are paid and unpaid influencers who are supposed to support the brand and raise awareness about new developments through social media and educational meet-ups. Ripple has its XRPArmy, Chainlink has LINKMarines, and Dogecoin has the DOGEArmy, just to name a few. 

Yet more often these advocates end up focused on identifying and squashing false information directed at the brand. Because these are people financially invested in the project, they can take it too far and contribute to brand degradation by attacking anyone questioning the project, putting them directly at odds with their original purpose. 

The XRP Army, for example, is known for its scale and organization. If someone posts FUD about Ripple/XRP a foot soldier will spot the tweet and rally the troops by tagging the #XRPArmy. Next a flood of accounts will “brigade” the alleged FUD-monger, posting dozens or hundreds of comments. The attack comes in the form of an inundation of thousands and thousands of angry notifications that lasts for days.

Originators of FUD campaigns are difficult to identify

FUD campaigns are often hard to trace back to the originator because they use fake companies and bots to amplify their message. And the cost of using bots to synthetically amplify content is relatively cheap: The New York Times found in 2018 that 1,000 high-quality, English-language bot accounts with photos cost a little more than a dollar. See the possible bots intermixed with human posts below, intensifying questions about whether it is time to sell Cardano's ADA token.

New synthetic text capabilities will make FUD campaigns even harder to trace

Bots are often detectable because they post the same message over and over. When you look at a bot's profile, there are often 'tells' such as imperfect use of the language, a singular theme to their posts, and numerous bot followers. 

But these 'tells' are going to get increasingly difficult to identify with recent advancements in synthetic text generation. Last March, researchers in the U.S. open-sourced GPT-Neo, making a next-generation language model available to the public for the first time. With the advent of these new-generation language models, launching a FUD campaign to try to drag down a competitor's brand or to support a short campaign will be even more difficult to detect. In fact, last summer a team of disinformation experts demonstrated how effectively these algorithms could be used to mislead and misinform. The results are detailed in this WIRED article and suggest that they could amplify forms of deception that would be especially difficult to spot.

Primer’s NLP Engines can help detect synthetic text and disinformation

Rather than continuing to invest in defensive armies or influencers to detect and flag FUD peddlers, the crypto space could benefit from an automated solution leveraging AI. Primer Command does all of this. Command ingests news and social media feeds and automatically detects, flags, and displays the origin and source of disputed information. This enables users to understand its provenance and evaluate its accuracy. This additional context also provides early warning and a means to constantly monitor the information landscape.

Command can also classify text as likely to have been written by a machine. It does this by automatically evaluating 20 different user signals to flag automated or inauthentic social media accounts.  Signals include how quickly an account is gaining followers, how many accounts they’re following, the age of the account, and even the composition of the account name. This information would allow crypto companies to evaluate the posts’ accuracy.
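
To make the idea concrete, here is a toy sketch of scoring an account against a few such signals. The signal names, weights, and thresholds are hypothetical; Command's classifier evaluates roughly 20 signals with trained models rather than a hand-written heuristic like this.

```python
# A crude, hand-written heuristic over a few account signals (illustrative only).
from dataclasses import dataclass

@dataclass
class Account:
    age_days: int
    followers_gained_per_day: float
    following_count: int
    handle: str

def bot_likelihood_score(acct: Account) -> float:
    """Return a rough 0-1 score; higher means more bot-like."""
    score = 0.0
    if acct.age_days < 30:                    # very new account
        score += 0.3
    if acct.followers_gained_per_day > 100:   # suspiciously fast follower growth
        score += 0.3
    if acct.following_count > 5000:           # mass-following behavior
        score += 0.2
    if sum(ch.isdigit() for ch in acct.handle) >= 4:  # auto-generated handle pattern
        score += 0.2
    return min(score, 1.0)

acct = Account(age_days=12, followers_gained_per_day=250,
               following_count=7200, handle="crypto_max_84921")
print(f"bot likelihood: {bot_likelihood_score(acct):.2f}")
```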

These tools hold more promise than manual efforts because they are impartial, within the parameters of how they are designed. It is an algorithm that identifies the FUD instead of someone with a stake in the project’s success. This is critical to assist in neutralizing the adversary without stoking the flames with numerous negative posts. By automating the identification of FUD campaigns, the project’s community can get back to focusing on brand promotion and education.  

Learn More

For a free trial of Primer Command or to learn more about Primer's technology, ask for a demo or contact sales to discuss your specific needs. You can also stay connected on LinkedIn and Twitter.


How mapping bottom-up climate action can drive more strategic climate solution responses and help us adapt thoughtfully. 

At the recent COP26 climate summit, a minister from the Pacific island country of Tuvalu announced that it would seek to retain legal status as a country even if its entire territory were to become submerged [Reuters]. He was standing thigh-deep in the ocean in an area that used to be dry land. His speech made it clear: the effects of climate change are here today.

When it comes to tackling the climate crisis, we typically think about solar power, electric vehicles, and carbon capture to mitigate future climate change. But Tuvalu's story is the tip of the iceberg of climate adaptation, the messier, less-defined problem of how human civilization will respond to the changes that are here now and predicted to accelerate over the next 30 years, even if mitigation efforts are successful.

In order to help define this messy space, Primer recently partnered with Vibrant Data Labs, a social impact data science group, to make sense of this broader and more diverse climate landscape. Crunchbase and Candid provided data on over 12,000 companies and nonprofits funded in the past five years that are addressing climate-related topics. Primer's natural language processing (NLP) engines mined these organizations' descriptions to generate one of the first-ever data-driven conceptual hierarchies of topics to better understand the shape of our current response and its potential gaps. This unique perspective comes bottom-up, from how the private and social sector organizations on the ground describe what they do, not from what is most spoken about in the news or social media.

Our analysis suggests that while new technologies are emerging to address climate mitigation, existing organizations that have historically tackled structural inequities (e.g., gender equity, migrant rights, homelessness) are uniquely poised to address climate adaptation challenges, which permeate every aspect of civil society. Our sample showed these organizations are beginning to add a climate lens to their work on diverse social issues.

Defining the Climate Space

We created a hierarchy of interrelated topics based on the company descriptions. Using this hierarchy, we are able to surface the broad topics in climate work and also drill down into specifics. 
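
One simplified way to build such a hierarchy bottom-up is to embed each organization's description and cluster the embeddings hierarchically, as sketched below with open-source tools. The descriptions are invented, and this illustrates the general approach rather than the exact method behind the climate map.

```python
# Derive a topic tree bottom-up from organization descriptions.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sentence_transformers import SentenceTransformer

descriptions = [
    "We build utility-scale solar farms and battery storage.",
    "Nonprofit restoring coastal wetlands and mangrove forests.",
    "We provide disaster relief and emergency shelter after floods.",
    "Advocacy group advancing affordable housing and tenant rights.",
    "Startup developing carbon capture for cement plants.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(descriptions)

# Agglomerative clustering produces a tree over the organizations; cutting it
# at different depths yields broad branches (e.g. mitigation vs. adaptation)
# or fine-grained topics.
tree = linkage(embeddings, method="ward")
dendrogram(tree, labels=[d[:30] for d in descriptions])
plt.tight_layout()
plt.show()
```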

Examining the topics in this way revealed there are two major branches: one dealing with topics related to preventative technologies (Environment And Renewable Energy) and the other with topics addressing the human impact of change (Public And Social Benefits).  This computational technique led to a close split between mitigation and adaptation. It’s exciting that our method could organize these topics in a way that gets close to how a human would do the task.

The topics underneath our mitigation branch (Environment and Renewable Energy), are what one may expect: “Water”, “Nature and Conservation”, and “Energy, Storage, and Energy Waste”. Adaptation work is much more varied and therefore harder to define. Our analysis can help us paint a crisper image of this emerging landscape.

This topic hierarchy shows organizational distribution on climate change topics, with higher convergence at the top vs the bottom.

Climate Change as a Social Issue

The Intergovernmental Panel on Climate Change (IPCC) defines adaptation as "the process of adjustment to actual or expected climate and its effects" [IPCC]. Using this definition, we see some top-level branches that clearly relate to climate adaptation. As the earth warms and extreme weather becomes the new normal, Disaster Relief and Preparedness will be critical to serving affected areas.

A less obvious topic is Community and Neighborhood Development. Its subtopics look like standard social-issue areas: Health, Affordable Housing, Human Rights, Government Advocacy, and Gender Equality. But looking deeper into the language of these organizations, we can see how they are incorporating a climate lens into their work. For example, here is the description of one organization working in Gender Equality:

MADRE is an international women’s human rights organization founded 37 years ago.  We partner with grassroots women to create social change in contexts of war, disaster and injustice. To advance women’s human rights, MADRE supports grassroots women to meet their basic needs as a bridge to develop their advocacy and political participation….

… Our actions to confront climate change today will decide the futures of our planet and of generations to come. You can join the women leading the way. Climate change is a global threat, and poor, rural and Indigenous women are hardest hit. They are impacted first and worst by the food shortages, droughts, floods and diseases linked to this growing danger. But they are more than victims. They are sources of solutions, inventing innovative, locally-rooted responses. Through our Women Climate Defenders initiative, MADRE’s grassroots partners enable their communities to adapt to climate change. They build clean water systems to guard against drought and seed banks to preserve future harvests.

This is an example of how an organization that has been addressing women’s human rights for 37 years can contribute today to building climate resilience in the most vulnerable communities. It also highlights how climate adaptation requires addressing diverse, interdependent topics.

We can dive deeper into the Gender Equality data to understand the key topics that organizations in this field are working on today. A quick glance at this chart shows a wide range and diversity of topics in the climate adaptation cohort, from Human Rights to Infrastructure to Youth Organizing and Antidiscrimination.

The topics which co-occur most frequently with gender equality cover a range of socially minded topics which are not all tightly related to gender equality.

Let’s compare this to a topic from our climate mitigation set: Nature Conservation and Environment.

The topics which co-occur most frequently with nature conservation and environment are very conceptually similar and mostly related to climate mitigation.

Organizations in this cohort work on Sustainability, Renewables, Water Conservation, Sustainable Agriculture, and Wildlife Preservation. These issues sit much closer to one another conceptually.

To further peel back the layers on current climate solutions, let’s look at how focus areas are “crowded” or “spread” within each organization. With NLP, we can approximately measure organizational “topic coherence”, which tells us whether a given organization optimizes for breadth or depth and how far apart its topics are. We created a score from 0 to 1 that measures how similar an organization’s topics are to each other; we call this the “organization focus score”. Organizations that focus on a narrow set of topics have scores closer to 0. We can then aggregate to the topic level to measure how narrowly focused the organizations in each topic are. Plotting this from 0 to 1, we see that topics relating to climate adaptation (Public and Social Benefit) are being addressed by organizations that are more broadly focused than the organizations addressing climate mitigation (Environment and Renewable Energy) topics.
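
As a rough illustration of how such a score can be computed, the snippet below takes the list of topics an organization works on, embeds them, and returns the mean pairwise cosine distance. The embedding model and the exact formula are assumptions for the sake of the example, not Primer's exact scoring method.

```python
# A toy "organization focus score": mean pairwise cosine distance between
# embeddings of the topics an organization works on (0 = narrow, 1 = diverse).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed public model

def focus_score(topics: list[str]) -> float:
    vecs = model.encode(topics)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit vectors
    sims = vecs @ vecs.T                                       # cosine similarities
    pairs = sims[np.triu_indices(len(topics), k=1)]            # each pair counted once
    return float(np.clip(1.0 - pairs.mean(), 0.0, 1.0))

print(focus_score(["Renewable Energy", "Solar Power", "Energy Storage"]))     # closer to 0
print(focus_score(["Gender Equality", "Affordable Housing", "Clean Water"]))  # higher
```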

The topic coherence score measures how closely related the topics a given organization works on are to each other. A score closer to 0 indicates the topics are very similar; a score closer to 1 means they are very dissimilar. Climate adaptation topics (Public and Social Benefit) contain organizations with a more diverse set of focus areas than the climate mitigation topics.

Our analysis reveals that, while an organization working on mitigation will typically be working on a single, defined solution, organizations working on climate adaptation are fighting on multiple fronts.  


“In an interconnected world, it is exactly this messiness that funders need to embrace,” says Vibrant Data Labs’ Eric Berlow. “Traditional venture capital tends to fund focused, scalable solutions, with easy-to-measure outcomes, like renewable energy. But the climate crisis is an ‘all hands on deck’ problem. Winning on one corner of the problem is an important piece; but if structural and systemic inequities in climate adaptation are not addressed, like the people of Tuvalu above, we all lose. We all feel the climate impacts of supply chain shocks, forced migration, civil unrest, and war. The most recent IPCC report suggests these trends will exacerbate over the next 30 years even if renewables and carbon capture solutions are successful. Climate funders will have to adopt a more holistic, multi-pronged approach to rise to this challenge.”

Conclusion

As climate change becomes an ever more central part of our lives, understanding the landscape of solutions and providers gives us perspective on the magnitude of the space. We used NLP to analyze the work of over 12,000 companies and nonprofits to better understand where private and social-sector organizations are focusing their efforts. In doing so, we highlighted the broad set of climate-related topics and illustrated that many organizations working across the diverse social sector are now adding climate solutions to their efforts to enhance equity and resilience.

In a coming post, our partner Vibrant Data Labs will take this analysis a step further, highlighting the solution areas that are receiving the most funding.

Natural language processing (NLP) automates the work that would previously have required hundreds of researchers. These machines read and write – and they’re changing the future of business.

With NLP, we can now analyze information at machine speed, but with human-level precision. As reading and writing is automated through machine learning, companies are freed up to focus on the challenges that are uniquely human.

Read more here

Despite economic instability caused by the pandemic, a surge of workers began voluntarily quitting their jobs, with attrition rates hitting record numbers by Spring 2021. This “Great Resignation” provides a once-in-a-generation opportunity to attract and retain new talent, but if companies don’t take proactive measures, they’ll see some of their best people leave, resulting in significant disruption to the business.

How do you make sense of employee feedback at scale? Primer’s advanced natural language processing (NLP) platform enables HR teams to structure and make sense of text-based employee feedback, so they can unlock buried insights to help them better engage and retain their workforce.

Read more here

Large organizations rely on hundreds, or even thousands, of vendors, partners, and supply chain businesses to provide key products and services. Primer’s advanced natural language processing (NLP) platform enables compliance teams to stay up-to-date on a large number of vendors and quickly surface business-critical information.

Read more here

A Deluge of Information

When the entire world’s attention shifted to COVID-19 in 2020, suddenly the topic of scientific research became important to just about everyone.


On January 31, 2020, a preprint* titled “Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag” was posted on the preprint server bioRxiv. The paper claimed to find similarities between proteins of the novel coronavirus and HIV proteins. The authors quickly retracted it after swift criticism from the scientific community. Yet while the research received almost no mention in the news, it drew an incredible amount of attention on Twitter, going viral and gaining momentum despite its lack of rigor.

Within four hours, the paper had been shared over 200 times on Twitter; before the day was over, it had been shared more than 30,000 times.

The retracted preprint went viral within hours of publication.

Despite the retraction, the paper continues to be propagated through Twitter — in September 2021, more than 18 months after it was published, it received 200 shares.

Despite its retraction, the paper continues to be shared on Twitter.

As the COVID case rate and death rate rose exponentially, so did the amount of work being published. By the end of February 2020, 612 papers had been published, and by the end of March, the number was more than 2,800. Tens of thousands more would follow. 

COVID-19 research grew exponentially for the first months of the pandemic and has continued to grow linearly since.

At Primer, we build NLP technology that helps humans make sense of massive unstructured data sources (scientific research, for example), so we knew we could be a resource to the frontline workers fighting to contain the pandemic and save lives. On April 6, 2020, we launched COVID-19 Primer, a comprehensive information-tracking site to help researchers, frontline workers, patients, and others make sense of the flood of information pouring in from experts around the world.

For example, Madeline Grade, an emergency-medicine physician and researcher at the University of California, San Francisco, used the site early on in the pandemic when “every aspect of care was changing on a daily basis.” Grade was inundated with information and needed to create daily protocol updates for the university’s hospital. “Amid that chaos,” she says, “the Primer app was actually a really amazing way to cut through the noise” [Nature.com].

Another frontline worker, Dr. Zev Waldman, a clinician developing COVID-19 care guidelines at his hospital, used the site to search the COVID literature – both preprint and published work – for relevant pediatric subtopics. Accessing the most-discussed papers gave him a quick sense of what was new, trending, and potentially relevant, as did the daily briefing, a summary of the most important new papers.

Understanding 18 Months of Research

With NLP, we can structure hundreds of thousands of documents and aggregate the information you care most about into a more digestible form. One such technique is topic modeling: the COVID-19 Primer site extracts topics and maps each paper to them. Through this lens, we can see the story of the pandemic in the rise and fall of research topics over time.
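
For a sense of what topic modeling involves, here is a minimal sketch using scikit-learn's latent Dirichlet allocation on a handful of invented abstracts. It illustrates the general technique only; the COVID-19 Primer site runs Primer's own models over a far larger corpus.

```python
# A minimal topic-modeling sketch with latent Dirichlet allocation (LDA).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "We model the spread of the epidemic under different lockdown policies.",
    "Trial results for a candidate vaccine show a strong antibody response.",
    "School closures are associated with increased anxiety in adolescents.",
    "PCR testing capacity and contact tracing in urban health systems.",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(abstracts)        # document-term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)              # rows: papers, cols: topic weights
print(doc_topics.round(2))

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {top_terms}")
```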

Early Focus

In the very early days of the pandemic, the majority of the focus centered on understanding the biology of the virus and its epidemiology. By April 2020, many large cities in the US and around the world had imposed lockdown measures, an indication that research in the topic “Model of Epidemic and Control the Spread” was reaching policymakers. Soon after, the share of research dedicated to the disease’s spread and genetic make-up dropped off. It’s worth noting that the large share of early research dedicated to these topics may have been influenced by prior work on epidemic spread and spike proteins, which made them more accessible, as well as by the relatively low volume of all research compared to just a few months later.

Early research was disproportionately focused on understanding the disease’s makeup and epidemiology.

Emerging Research

As the world began to realize the pandemic would not end quickly and social distancing and school closures continued into the summer and new school year, more research was dedicated to mental health issues. Similarly, vaccine research ramped up steadily through 2020 and into 2021. As the vaccines became widely accessible, research into vaccine hesitancy also increased.

Research into vaccines and vaccine hesitancy has grown as the pandemic has continued.

Steady Players

Two topics of research that held steady as a proportion of all COVID-19 research were around public health and testing.

Research dealing with public health and testing has stayed steady throughout the pandemic.

Here are the six topics in relation to each other, along with their absolute volumes. Not shown are the 50-plus other topics of COVID-19 research from the past two years.

Taken together, these charts show the shift in focus of COVID-19 research topics over time.

Our Work Ahead

While cases in the United States and Europe are decreasing, the COVID-19 pandemic is far from over. Primer will continue to serve researchers and frontline workers by maintaining the COVID-19 Primer site for as long as it is needed. We welcome inquiries from researchers, journalists, and educators about accessing this data and using Primer’s platform to quickly build custom models that track emerging topics. Disseminating scientific research is only one application of NLP. If you have large amounts of text data that need structuring and want to learn more about NLP, please contact covid19@primer.ai.

*Preprints are preliminary reports of work that have not been certified by peer review and published in a journal. They provide a mechanism for rapidly communicating research with the scientific community.

A beginner’s guide to natural language processing, a tool that will change business as we know it.

Imagine being able to detect disinformation bots as soon as they begin spreading false information on social media. Or get in front of negative reviews about an important product launch before it becomes a PR problem. Or know how to best deploy humanitarian efforts as a crisis unfolds.

All of these problems are solvable with natural language processing, or NLP, the machine-learning technology that Primer deploys today. While NLP is just one type of artificial intelligence, its value has broad implications across government and businesses alike. 

What is natural language processing?

Natural language processing, shortened to NLP, is the science of building machines that can read and write much as humans do. The goal of NLP is to organize massive amounts of text data (think: millions of documents) into information humans can use to solve problems. Since machines can perform these tasks much faster than we can, working with them lets us read massive amounts of text in a fraction of the time. NLP is just one type of artificial intelligence, but it’s a critical piece of solving a wide range of problems and an increasingly important one for analysts to understand.

Right now there’s a huge gap between the amount of information we’re analyzing, and the amount of information that we could use to bring deeper meaning and understanding to our world. At Primer, we call this gap the intelligence gap, and it’s profoundly important for understanding the value of NLP.

The intelligence gap

The amount of data we are collecting globally is growing exponentially. And while that’s happening, the number of human analysts is only growing linearly – in other words, we as humans simply can’t keep up. 

In reality, this trove of unstructured data is so vast, we don’t even know what we don’t know. In other words, we are looking to speed up the analysis we’re already doing and find the “unknown” unknowns that have the potential to give us an edge or change our worldview. NLP helps us create structure within high volumes of unstructured data. That means we can now automate analysis and find information that we didn’t even know we were looking for.

We can’t hire ourselves out of the intelligence gap, so we need new ways to close it. NLP has become the best tool to help us bridge the divide between the information available now and the information we actually use. With NLP, we can now analyze information at machine speed, but with human-level precision.


Closing the intelligence gap is precisely why Primer exists. In short, without unlimited time and human capital, information that can make or break a business – or inform consequential policy or operations – is simply lost. 

The implications of closing this gap cannot be overstated. Putting NLP in the hands of analysts, decision-makers, and operators gives them the critical capabilities they need to advance government missions, the global economy, our financial system, and more.

The problem with unstructured data

Does your organization or company have piles of unstructured, text-based data? What might you do with that information if you could surface or summarize exactly what you’d want to know? The majority of the world’s most valuable information – up to 80%, according to Deloitte – sits idle in the form of unstructured, text-based data. Companies and organizations alike are realizing that they are missing critical information and that they cannot successfully operate or compete in this new world without having this data at their fingertips. Primer makes it easy to leverage these idle piles of data and use them to inform critical decision-making and action, without needing a team of data scientists to do so.


Think of NLP as a resource to create a summary of summaries – from a massive number of sources – all with human-level accuracy. Or a way not just to find the needle in the haystack, but to identify the next haystack. For example, Department of Defense analysts use Primer’s NLP to look through 2,500 intelligence reports in the time it would normally take to read and process just one. Walmart uses Primer to analyze consumer insights in a fraction of the time it would take a human to do the same thing. And national security organizations rely on Primer’s NLP Platform to reveal previously unknown patterns and connections between people, organizations, and locations, all of which helps them detect disinformation activity faster. NLP can also help track, analyze, and summarize evolving situations – such as those unfolding in Afghanistan, the changing nature of COVID-19, or natural disasters such as wildfires, hurricanes, and floods.

Why NLP matters

Imagine you’re a retail company that has thousands of reviews pouring into your platform by the hour. NLP could help you stay ahead of those reviews by generating insights you previously couldn’t even find manually, and in a fraction of the time. NLP sifts through vast volumes of data to: eliminate duplicate information; extract key entities, such as people, locations, and organizations; cluster related news stories and images; conduct trend, sentiment, and topic analysis; and detect suspected disinformation – all in near-real time. 
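
To make one of those steps concrete, the snippet below shows entity extraction with spaCy's off-the-shelf English model, pulling people, organizations, and locations out of a single review. It is a generic illustration; Primer's production extractors are its own models, and the review text and model choice here are assumptions.

```python
# Extracting entities (people, organizations, locations) from free text with spaCy.
# Assumes the small English model is installed:  python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

review = (
    "The delivery from the Austin warehouse arrived two weeks late, "
    "and nobody at Acme Retail responded to my emails."
)

doc = nlp(review)
for ent in doc.ents:
    # Typical output: "Austin" -> GPE, "two weeks" -> DATE, "Acme Retail" -> ORG
    print(ent.text, ent.label_)
```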

For example, Primer worked with Walmart to track and analyze mentions of the company in social media, allowing Walmart to track and respond in real time to customer feedback, sentiment trends, and inauthentic tweets from bot accounts.

With NLP, financial analysts can analyze a portfolio company’s regulatory filings (e.g., 10-Ks, 10-Qs) from the last decade, or process thousands of pages of earnings call transcripts. This information can then be used to assess evolving risk, make investment decisions, or surface other information that’s critical to your investment or business. Investigative journalists could crunch thousands of public documents to identify people, organizations, and locations, accelerating time to insight and potentially discovering the next breaking news story. Or analysts could scan millions of documents a day for every mention of companies based in Southeast Asia falling victim to cyber attacks. The applications truly span every industry because of the foundational speed at which information is synthesized.

In the future, NLP will allow you to ask questions of your data just as easily, and in the same words, as you would ask a human expert, with an analysis turnaround measured in seconds, not months.

We’re at the pivotal breakthrough moment of NLP, and it’s incredible to see the ubiquitous applications of the technology across business and government. As a leader in NLP, Primer makes implementing this technology easy, fast, and scalable with its next-generation NLP Platform. If you’d like to learn more about what we’re up to and what’s next in NLP, request a demo here.

What are the biggest developments in artificial intelligence and machine learning that will impact national security? Why is disinformation so hard to counter? Brian Raymond, Vice President of Government at Primer, discussed these questions and others with Terry Pattar from the Janes World of Intelligence podcast. Read on for the key takeaways from this lively discussion, or listen to the full podcast here.




This past week, Primer VP Brian Raymond met with Terry Pattar, head of the Janes Intelligence Unit, to talk about the trends and challenges he’s seeing at the nexus of machine learning and national security. During the conversation, Brian and Terry delved deep into the institutional and technical challenges faced by the US and its allies in countering disinformation, and ways to overcome these. For background, misinformation refers to false or out-of-context information that is presented as fact regardless of an intent to deceive. Disinformation is generally considered more problematic because it is false information that is designed to manipulate perceptions, inflict harm, influence politics, and intensify social conflict.

Disinformation spreads like wildfire

With emerging machine learning technologies, it has become increasingly easy for our adversaries to spread sophisticated disinformation to advance their strategic objectives and undermine our democratic institutions. As the cost of spreading disinformation has plummeted, our adversaries have proven able to wield this weapon efficiently and at scale. It is an effective asymmetric tool against the West because it actively undermines a shared understanding of the truth, and, contrary to our societal underpinnings, our adversaries take no issue with putting the full force of the state behind it.

“It is orders of magnitude cheaper to pollute the information environment with falsehoods than it is to find whatever has been put into the information environment that’s polluting it and to counter it.”

– Brian Raymond

Detection is expected to get increasingly difficult

For the past several years, detecting botnets that create algorithmically generated disinformation has relied on “tells” in the text itself. However, last June the AI research lab OpenAI trained GPT-3, a deep learning model with an enormous number of parameters, capable of generating text as fluently as a human, from tweets to long-form articles. While GPT-3 has not been released to the public, a research consortium is building an open-source counterpart called GPT-Neo and steadily releasing increasingly powerful versions of the model. GPT-Neo is only the first of a wave of powerful open-source “transformer” language models that can be easily deployed to generate effectively limitless streams of high-quality synthetic text on demand.

“And so that’s why getting into the message of what’s being conveyed and understanding it from a natural language understanding perspective is going to be absolutely critical because they won’t have tells that you can pick up on.”

– Brian Raymond
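
To get a sense of how low the barrier has become, the sketch below generates fluent-sounding text in a few lines using the open-source Hugging Face transformers library and a small GPT-Neo checkpoint; the model identifier and prompt are assumptions for illustration only.

```python
# Generating synthetic text with a small open-source GPT-Neo model.
from transformers import pipeline, set_seed

set_seed(42)  # make the sample reproducible
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

prompt = "Officials announced today that"
samples = generator(prompt, max_length=50, num_return_sequences=3, do_sample=True)
for sample in samples:
    print(sample["generated_text"])
```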

Sidestepping this problem, Primer’s Natural Language Processing (NLP) Platform can automatically build a sourced and referenced knowledge base of events, entities, and relationships to understand which new claims are catching hold, which groups are pushing them, and which audiences are receptive to them. This additional context provides early warning and a means to continuously monitor the information manipulation landscape, and it better positions operators to quickly compare the claims being made against a ground-truth knowledge base. (See this WIRED article or this Defense One article for more details about how Primer is supporting US national security efforts to counter disinformation.)

We solved it once, we can solve it again

Disinformation in and of itself, as well as the societal discord it can sow, is nothing new. It has been a national security concern for decades; the Cold War was largely waged by propagating competing versions of the truth. In the early 1980s, the Active Measures Working Group was established in the US to identify and counter Soviet propaganda. The effort was effective, bringing Mikhail Gorbachev to the table; he subsequently ordered the KGB to scale down its disinformation efforts. After the fall of the Soviet Union, that playbook was shelved until Russia annexed Crimea in 2014. Unfortunately, the world has changed and the same template won’t work today. Information moves in real time and is far more diffuse than the old-school playbook ever anticipated. Countering disinformation today requires an entirely different toolkit.

We can’t hire ourselves out of this problem

One study that shows the extent of this challenge, published by TechCrunch, looked at the volume a typical intelligence analyst covers on a daily basis. It found that in 1995 an analyst would have had to read about 20,000 words a day to stay current, and that by 2015 the figure had increased tenfold. By 2025, it is expected to reach millions of words per day. Likewise, a study by IDC titled Data Age 2025 predicted that worldwide data creation will grow to 163 zettabytes by 2025, ten times the amount of data produced in 2017.

The defense and intelligence community recognizes that it can’t hire its way out of this problem. This is why it’s so critical to pair operators and analysts with algorithms to accelerate rote work and to help them uncover connections and insights buried in the data. Analysts can use Primer’s products to automate the organization of that intelligence into knowledge bases that cluster, curate, and organize reporting into specific areas of interest. These knowledge bases can be further automated to continuously analyze and self-update with new intelligence reports.
This enables them to spend less time on rote, manual tasks and more time getting at the “why,” “so what,” and “what next” questions. The analyst is then positioned to deliver a deep, rich brief that serves policymakers’ needs.

Bridging public and private partnerships is key to any solution

What we need to do now is step back and take a fresh look at how to counter disinformation. To seize the advantage in this asymmetric space, we need a comprehensive approach in which the US, its allies, and the private sector come together. SOCOM, the US Air Force, and the intelligence and defense communities more broadly have recognized this and have made big investments in data, data curation, infrastructure such as classified clouds, and the NLP applications needed to process the enormous volumes of data generated each day across units, organizations, services, and allied partners.

On the technology side, Primer has been steadily developing not only NLP Engines that perform at the level of a human analyst but entire machine learning pipelines that nest within analyst and operator workflows. We have developed a no-code AI model labeling, training, and deployment platform that can be run by operators on the front lines confronting disinformation and other challenges every day. Operators can now encode their tacit knowledge within the machine and essentially teach it to understand their world, with no technical skills needed.

Primer is effectively reducing the friction of using AI in operational contexts. With shared infrastructure for exchanging large volumes of data, the US and its allies will have the tools to rapidly build and retrain best-in-class NLP models to support tactical efforts.

“I think that’s going to be the big tipping point here, [..] that when we make it easy enough and fast enough for the folks that are on the front lines to encode their tacit knowledge into these models and use them and lower those costs, that’s when we’re going to see a paradigm shift and really how we’re pairing algorithms with analysts and operators.”

– Brian Raymond

There is an eagerness for AI-enabled efforts

It’s clear that there is an appetite and high-level support across the intelligence and defense services to get analysts and operators the AI-enabled and AI-driven tools they need to confront these challenges. Significant organizational issues still have to be overcome, but once the infrastructure is in place we will see the adoption and integration of AI solutions accelerate and proliferate across a range of national security applications. Investments in these areas will become more urgent as near-peer and other geostrategic competitors exploit asymmetric strategies to weaken the West and advance their foreign policy objectives.

Learn More

To learn more about Primer’s technology, download the AI Technology report or contact us to discuss your specific needs.

If the U.S. government wants to win the information wars, Cold War-era tactics won’t cut it anymore.

By Brian Raymond (Originally printed in Foreign Policy on October 15, 2020)

On Oct. 14, Facebook and Twitter made the decision to remove a dubious New York Post story from their platforms—provoking heated debate in the internet’s various echo chambers. The article in question purportedly revealed influence peddling by Democratic presidential nominee Joe Biden’s son Hunter Biden, and the social media giants suspected that the uncorroborated claims were based on hacked or fabricated correspondences. Weeks before the U.S. presidential election, Silicon Valley’s swift and decisive action in response to disinformation is a stark contrast to its handling of hacked emails from Hillary Clinton’s presidential campaign four years ago.

A week prior, on Oct. 7, the U.S. Justice Department announced that it had seized nearly 100 websites linked to Iran’s Islamic Revolutionary Guard Corps (IRGC). These sites had been engaged in a global disinformation campaign, targeting audiences from the United States to Southeast Asia with pro-Iranian propaganda. But it wasn’t just the government engaged in countering adversaries online: One day later, Facebook and Twitter reported that they had taken down more than a dozen disinformation networks used by political and state-backed groups in Iran, Russia, Cuba, Saudi Arabia, and Thailand.

In the grand scheme of things, the events of Oct. 7 and 14 were hardly noteworthy. In recent years, private and public actors alike have had to ramp up their efforts against botnets, troll farms, and artificial intelligence systems that seek to manipulate the online information environment and advance certain strategic objectives. These actors came under unprecedented scrutiny in the aftermath of the 2016 U.S. presidential election.


But while cyberspace may be a new front in the fight against disinformation, disinformation in and of itself—as well as the societal discord it can sow—has been a national security concern for decades; the Cold War was largely waged by propagating competing versions of the truth.

And much as the threat of “fake news” is nothing new, so too is the way policymakers deal with it—or try to.

Therein lies the real problem. In countering disinformation emanating from the Kremlin, Chinese Communist Party (CCP), and IRGC, among others, the United States continues to rely on the same dated playbook that led to success against Soviet propaganda operations, known as “active measures,” in the 1980s. But this anti-disinformation strategy, like most else developed in the 1980s, has been rendered largely obsolete by an evolving media landscape and emerging technology.

Now, if the United States is going to have any hope of getting back on its front foot—and put a stop to adversaries’ attempts to sow confusion and cynicism domestically—it’s going to have to seriously reconceive its old playbook. But that can’t be done without Big Tech companies, which are the linchpin in the fight against disinformation.

Granted, some state-citizen reconciliation is needed to mend the fraught ties of the post-Snowden era. In 2013, the whistleblower Edward Snowden leaked documents exposing widespread cooperation between U.S. technology companies and the National Security Agency, triggering widespread backlash from technology companies and the public, who lamented the lack of personal privacy protections on the internet.

Since then, the chasm between Silicon Valley and the U.S. national security community has only widened—but there are signs that the tide may be shifting: Companies like Facebook, Twitter, and Google are increasingly working with U.S. defense agencies to educate future software engineers, cybersecurity experts, and scientists. Eventually, once public-private trust is fully restored, the U.S. government and Silicon Valley can forge a united front in order to effectively take on fake news.

Disinformation crept onto the national security radar just as Ronald Reagan assumed the presidency in early 1981. After the CIA was publicly disgraced during the Church Committee hearings—which exposed the CIA’s controversial (and in some cases illegal) intelligence gathering and covert action against foreign leaders and U.S. citizens alike—Reagan recruited William Casey to revamp the agency. On moving into his seventh-floor office at Langley, Casey, known to be a hawk, was dismayed to learn that the CIA was collecting almost no information on Soviet active measures—and doing even less to counter them.

Casey reorganized key offices within the CIA’s Directorate of Intelligence to focus on better understanding Soviet active measures and instructed the Directorate of Operations to ramp up its collection of classified intelligence on Soviet propaganda. By mid-1981, the scale of the Soviets’ efforts became clear. In an August 1981 speech on Soviet disinformation campaigns against NATO, Reagan revealed that the Soviet Union had spent around $100 million to sow confusion in Western Europe after NATO developed the neutron warhead in 1979.

Of Moscow’s latest efforts, Reagan said he didn’t “know how much they’re spending now, but they’re starting the same kind of propaganda drive,” which included funding front groups, manipulating media, engaging in forgery, and buying agents of influence. In 1983, for example, Patriot, a pro-Soviet Indian newspaper, released a story claiming that the U.S. military had created HIV and released it as a biological weapon. Over the next four years, the story was republished dozens of times and rebroadcast in over 80 countries and 30 languages.

By 1982, the CIA estimated that Moscow was spending $3 billion to $4 billion annually on global propaganda efforts. The Soviet Politburo and Secretariat of the Communist Party, which directed the active measures, made no major distinction between covert action and diplomacy; to the Kremlin, disinformation was a tool to advance the strategic goals of the Soviet Union in its competition with the West.

With the nation fixated on Soviet propaganda, senior leaders from across the Reagan administration came together to form what came to be called the Active Measures Working Group. Led by the State Department—and including representatives from the CIA, FBI, Defense Intelligence Agency, and Defense and Justice departments—the national security bureaucracy quickly went on the offensive. Through the end of the Cold War, the group was effective not only in raising global awareness of Soviet propaganda efforts but also in undermining their efficacy. In fact, U.S. anti-disinformation campaigns were so successful that Soviet premier Mikhail Gorbachev in 1987 instructed the KGB to scale back its propaganda operations.


Clearly, those days are long gone. In stark contrast to the triumphs of the 1980s, the United States since the turn of the century has largely failed to counter disinformation campaigns by geostrategic competitors like Russia, China, and Iran.

The opening salvo of a new, digitized phase of state-level competition for influence occurred in 2014, when Russia seized Crimea from Ukraine. As he moved troops to the strategic Black Sea outpost, Russian President Vladimir Putin publicly claimed that those forces occupying Crimea could not possibly be Russian special forces—lying outright to the global community. In the years since, the Kremlin’s disinformation campaigns have increased in volume, velocity, and variety. Today, state-level actors such as Russia, China, Cuba, Saudi Arabia, North Korea, and others employ armies of trolls and bots to flood the internet with false, misleading, or conspiratorial content to undermine Western democracy.

If Washington is still fighting the same enemy, then what went wrong?

The United States’ counter-disinformation playbook has been predicated on two unspoken assumptions, neither of which is valid today: first, that shining light on lies and disinformation through official government communications is an effective tactic; and second, that Washington can keep up with the speed and scale of disinformation campaigns. In fact, debunking efforts by government officials do little to discredit propaganda, and the volume of threats vastly exceeds the U.S. government’s ability to identify and counter them. These inferences take U.S. credibility—and technological prowess—for granted, which is hardly inevitable.

Broadly speaking, three factors have changed the disinformation game since the 1980s—and rendered the assumptions that formed the bedrock of the United States’ campaign against Soviet active measures obsolete. First, the global media environment has become far more complex. Whereas in the 1980s most citizens consumed their news from a handful of print and broadcast news outlets, today, world events are covered instantaneously by a tapestry of outlets—including social media, cable news, and traditional news channels and publications.

Second, U.S. adversaries have relied on bots to amplify fringe content and employed trolls to generate fake content to advance their strategic objectives. Finally, rising political polarization has accelerated consumers’ drive toward partisan echo chambers while increasing their suspicion of government leaders and expert voices. Against such a backdrop, the Active Measures Working Group—a relic of simpler times—can no longer be successful.

Indeed, in the early days of the coronavirus pandemic, U.S. efforts to stem Chinese disinformation about COVID-19 backfired; Beijing’s disinformation campaigns accelerated between March and May. By June, Twitter reported that it had removed 23,750 accounts created by the Chinese government to criticize protests in Hong Kong and to extol the CCP’s response to COVID-19.

To complicate matters further, the one anti-disinformation campaign where the United States has been successful in recent years is hardly a generalizable case. The U.S.-led Operation Gallant Phoenix, fighting the Islamic State, was able to steadily erode the group’s legitimacy by undermining its propaganda machine. From a multinational headquarters in Jordan, the coalition flooded the internet with anti-Islamic State content and hobbled the group’s ability to broadcast its message globally.

But a campaign against the Islamic State is far from a viable blueprint for countering Russian, Chinese, and Iranian disinformation campaigns. The international community—private sector tech firms included—shares the broad consensus that the Islamic State must be defeated. This sort of political harmony hardly exists, for example, on how, or whether, to forcefully counter Chinese-led disinformation efforts related to COVID-19.

It’s clear that the United States is losing the information wars, in part due to a lack of innovation among the key stakeholders in the executive branch.


But not all is lost. The next administration can make the United States a viable competitor in the global information wars by developing a comprehensive counter-disinformation strategy that is predicated on three different pillars.

Before any decisive counter-disinformation strategy can be formulated, key constituencies will need to come to some sort of consensus about data ethics. A commission staffed by leaders from the executive branch and media organizations must first draft a set of first principles for how data should be treated in an open and fair society; philosophical rifts like those between Twitter CEO Jack Dorsey and Facebook CEO Mark Zuckerberg over the role of speech need to be overcome. Any effective campaign in pursuit of the truth requires a set of guiding principles to inform the types of speech that should be permitted in digital town squares and when speech should be fact-checked—or, in extreme cases, removed entirely.

Once first principles are established, the White House can erect a policy framework to guide defensive actions and appropriate resourcing to counter foreign disinformation campaigns. In the spirit of the Active Measures Working Group, an effective counter-disinformation strategy will require a whole-of-government approach, likely anchored by the State Department and supported by the Pentagon, the intelligence community, and other key stakeholders.

Finally, though the U.S. government can and should do much more to counter disinformation campaigns, it should be clear-eyed about the fact that its ability to shape the information environment has eroded since the 1980s. A comprehensive counter-disinformation strategy would be smart to recognize the limits of government action given the speed and scale with which information moves across social media today.

Thus, it’s important to nest government-led counter-disinformation activities within a broader set of actions driven by the private sector. Playing the role of coordinator, the United States should encourage the creation of a fact-checking clearing house among social media platforms to rapidly counter suspected disinformation. Indeed, Facebook and Twitter have already begun adding fact-checked labels to potentially false or misleading posts—to the ire of Donald Trump. This should be encouraged and expanded to operate at the speed and scale with which content is generated and disseminated across social media.

The government could also use innovative investment pathways such as the Defense Innovation Unit or Joint Artificial Intelligence Center to incubate the development of new AI technologies that media platforms could use to spot deepfake technology—which can be used to create fake videos, new images, and synthetic text—at work. Deepfakes are rapidly becoming an inexpensive, fast, and effective means by which actors can wage irregular warfare against their adversaries.

Regardless of the precise form it takes, the future incarnation of the Active Measures Working Group should seek out Silicon Valley leaders to not only help co-lead the initiative but to also staff other key posts across the executive branch. In the end, the pathway to U.S. preeminence requires mobilizing the country’s unique assets: its ability to innovate, marshal resources at scale, and to come together in times of distress—as after 9/11. Only a response marked by bipartisanship within government—as well as strong partnerships with actors outside of it—can give the United States the reality check it desperately needs.

Brian Raymond is a vice president at Primer.ai. Previously he served on the U.S. National Security Council and with the CIA.

The most dangerous form of cyberwar is the accelerating war to hijack our minds and belief systems

 



Weaponizing the truth

 

Computational warfare and disinformation campaigns will, in 2020, become a more serious threat than physical war, and we will have to rethink the weapons we deploy to fight them.

We often think of cyberwar as hacking things like financial networks, nuclear-power plants and political-campaign emails, but the most dangerous form of cyberwar is the accelerating war to hijack our minds and belief systems. This is an attack on truth – and democratic countries are most at risk.

“Netwar” – information-related conflict at a grand level among nations or societies – took off in the early 1990s. It has been on a low-level simmer ever since, but is likely to boil over in 2020 as the primary – and perhaps preferred – method by which states jockey for power in the global system. In 2013, Russian General Valery Gerasimov wrote, “the role of nonmilitary means of achieving political and strategic goals has grown, and, in many cases, they have exceeded the power of force of weapons in their effectiveness.” Russia’s actions since then have demonstrated its continuing commitment to that doctrine.

Cybersecurity researcher Ben Nimmo describes Russia’s approach in terms of the “4Ds”: dismiss critics, distort facts, distract from other issues, dismay the audiences. And indeed Russia has been leading the way in using disinformation-based warfare against other nations. But others are now joining them.

Last year Iran deployed fake news, fake social-media accounts and bots to spread disinformation about the downing of a US drone and subsequent seizure of a British oil tanker in the Strait of Hormuz. And China has been stoking anti-western sentiment both at home and abroad and creating a wholly different version of reality about the demonstrations in Hong Kong. What Hong Kongers and much of the western world view as a demonstration movement, China is calling “near terrorism”.

In 2020, more countries will discover the power of the 4Ds. Researchers from the Computational Propaganda Research Project at the Oxford Internet Institute have found evidence of organised social-media manipulation campaigns in 48 countries in 2018, up from 28 in 2017. And they have found that political parties and governments have spent more than $500 million on the “research, development, and implementation of psychological operations and public opinion manipulation over social media” since 2010. Based on the undoubted success of these attacks, I predict this figure will increase to over $2bn in 2020.

This type of cyberwarfare requires a new set of defences. Deterrence tools are important but an army of digital human janitors – like those employed by the major social networks to flag images for nudity and hate speech – will be powerless against an increasingly automated set of attacks. In 2020 we will realise that, to fight disinformation, we will need instead a Manhattan Project for truth.

Imagine if one million people and one million artificially intelligent agents were assigned to scale up Wikipedia or a Wikipedia-like knowledge base as part of a national defence effort.

These kinds of knowledge bases already exist inside many countries’ intelligence agencies for national security purposes, but we need a public version that keeps track of history as it unfolds minute-by-minute. This effort would be ultimately about building and enhancing our collective intelligence and establishing a baseline for what’s true or not. Democracy as we know it won’t be possible in a world where information is distrusted and everything is manipulatable.

The asymmetry in this fight is that democracies are more susceptible to manipulation than authoritarian and totalitarian regimes designed to suppress individual freedom of thought and the open flow of information. What’s at stake is democracy itself – and, importantly, a very fine line for democratic governments to walk between censorship and freedom of speech. In 2020, we will begin to weaponise truth.

Authored by Sean Gourley, but originally appeared in WIRED, UK on Jan 06, 2020
Photo credit: Joe Waldron

In early April, Primer’s Senior Director for National Security, Brian Raymond, sat down with Cipher Brief COO Brad Christian to discuss how machine learning is impacting national security. What follows is a lightly edited version of the State Secrets podcast.



Cipher Brief Press

In the latest edition of our State Secrets podcast, Cipher Brief COO Brad Christian talks with Brian Raymond, who works for Primer, one of our partners at The Cipher Brief’s Open Source Collection, featured in our M-F daily newsletter.

At Primer, Brian helps lead their national security vertical; in other words, their intelligence and military customers, but also their broader federal practice. That means he’s involved in everything from sales to advising on product development and overseeing current customer engagements.

The Cipher Brief: Let’s dive into these emerging technologies and how they relate to national security. We’re talking today about machine learning and artificial intelligence. These terms are becoming more talked about and prominent in the news. But frankly, most people still don’t have a good grasp of what they mean. Can you give us a high-level perspective on what it means when we hear the terms ‘artificial intelligence’ and ‘machine learning’, and why do I need to care about them?

Raymond: That’s a great question, and maybe I’ll just back up a little bit, in terms of what sparked my interest when I joined Primer and why this field is so exciting. Previously I was at the CIA, primarily as a political analyst, as well as having served in several additional roles. From the CIA, I went to the White House. I served as a country director from 2014 to 2015 and was able to see the intelligence collection process, in terms of analysis and decision making, from many different angles.

Fast forward to 2018 when I had the opportunity to join Primer. I am not a tech expert by any means. I don’t have a background in machine learning or artificial intelligence. My interest sparked when I saw what machine learning and artificial intelligence could do to accelerate the mission.

At the highest level, machine learning is different from general artificial intelligence in that machine learning leverages what’s called a neural net. It replicates in some ways the structure of the brain, where you have neurons and synapses, to build very complex models. Hundreds of millions of nodes help automate a process that’s typically done by a human today.

And so, let’s unpack a few practical examples. Most listeners on this podcast are probably familiar with object detection or object recognition. We can feed an image into a particular algorithm and determine, okay, is that a dog, or is that a car? Solving this problem has been the focus for at least 15 years. Now the algorithms are getting really, really good, to the point where they’re fed into self-driving cars and weapon systems.

There are many different areas where the technology is beginning to take hold. Instead of having large teams of humans that are clicking dog or car and sorting different imagery, this can now be done in an automated fashion at scale and speed by algorithms.

That’s one example, and it’s something that took hold and became operationalized. It was injected into workflows about six or eight years ago. There’s still much work in progress, but now it’s being commercialized. It’s becoming increasingly mature.

There are other areas of AI that are a lot thornier, and there has been slower progress. One of those areas is the realm of natural language and human spoken language – think Siri on the phone or Alexa. At the highest level, these algorithms are intended to help accelerate and augment the rote tasks humans are undertaking, freeing them up to work on higher-level tasks.

The Cipher Brief: These issues you’re talking about are critical. Not just to make life easier or to make things more efficient, but so that America and the military can maintain its innovative and technological edge. It’s seriously challenged for the first time. In terms of machine learning and natural language processing, what are some of the ways that you see this operationalized in the national security space, and what are some of the things that we should be looking for in the next three to five years?

Raymond: That’s a good question. To respond, I’d probably break it down into three key messages, and I’ll unpack each one. The first is that there has been much learning over the last several years about pairing operators and analysts with algorithms to impact the mission. It requires a partnership, and in some cases an entirely different organizational model than exists within national security organizations today.

A partnership approach is necessary to use and fully leverage these algorithms.

The second is that, especially in the world of natural language processing but also more broadly, we’ve seen an absolute explosion in the performance of these algorithms over the past 18 months. These performance gains have mostly gone under the radar in national news and most of the publications that I and others read.

We’re really in a golden age right now, and a lot of new and exciting use cases are unlocking because of these performance gains.

The third thing I’ll talk a little bit about is that the use cases are becoming more crystallized, especially for natural language algorithms. The three categories are natural language processing, natural language understanding, and natural language generation.

We’re teaching algorithms to not only be able to identify people, places, and organizations, but to understand and then also generate new content based on that.

The use cases for that, quite frankly, have been a little mysterious. Primer was founded in 2015, before this golden age began.

In late 2018, with the release of BERT, an algorithm developed by researchers at Google, we saw that these algorithms were still brittle. They were good at narrow tasks, but they required a lot of training, and there were difficulties when trying to port them across different document types.

With those constraints, it was challenging to find wide channels to play in and add value for end users. Today, three use cases cut across all of our national security customers, where we’re finding the algorithms are fantastic at augmenting what humans are already doing.

I’ll just call the first one ‘finding needles in a haystack.’

Suppose you are concerned with supply chains, and you have 5,000 suppliers that you care about for some type of complex system that you’re building. These suppliers are distributed globally, and you’re concerned about disruptions to the supply chains or malicious acts, for example.

But how do you monitor news or bad things happening for 5,000 companies? That’s a lot of Google news alerts, for example.

We’re able to train algorithms that continuously scan hundreds of thousands or millions of documents. They can look for instances in which a small supplier may have been subject to a cybersecurity attack or their headquarters burned down, or their CEO is caught in a scandal and immediately cluster articles or reports around that and then surface those for review.

Today, this needle-in-a-haystack problem is handled by large groups of people, and even they aren’t able to wrap their arms around all of the consumable information.

The second use case is ‘compression and summarization.’

Recently a study looked at analysts that covered mid-tier type countries. Not much is written about countries like Paraguay, for example. In the mid-nineties, you may have had to read about 20,000 words per day to stay up on what’s going on with that particular country.

Fast forward to 2016, so four years ago, and you had to read around 200,000 words per day to stay abreast of developments. And the forecast was that between 2016 and 2025, it was going to increase tenfold, so from 200,000 to 2,000,000 words per day.

Whether you’re covering a country or a particular organization or a company issue, there’s a tremendous amount of available information that is required to stay ahead of developments.

Information is growing at an exponential pace, so you can’t hire your way out of the problem. You need to find ways to compress and summarize all that information, and you do that by pairing analysts or operators with algorithms.

Compression and summarization is the second key area where users and organizations are finding tremendous benefits.

The last use case is what we call ‘breaking the left-screen, right-screen workflow.’

Left-screen, right-screen is a workflow that has existed since the dawn of modern intelligence analysis in World War Two. Analysts read the reports coming in on their left screen. Then they take the insights or relevant details from those reports and curate them into some type of knowledge graph on the right screen. Today that right screen might be an Excel spreadsheet, a wiki, emails, a Word document, or a final report.

We’re getting really, really good as a machine learning community, at automating that jump.

Machine learning can find all the people in 10,000 documents, find all the details about those people, and determine how they link to one another. It can continuously create new profiles for people who are mentioned, including those who are just popping up, and then show what further information has been discovered.
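A stripped-down sketch of that jump, using the open-source spaCy and networkx libraries rather than Primer's engines, might look like the following; the documents and names are invented.

    import spacy
    import networkx as nx

    nlp = spacy.load("en_core_web_sm")      # assumes the small English model is installed

    documents = [
        "Maria Ortiz met with Jonas Keller in Vilnius to discuss grid security.",
        "Jonas Keller later briefed Elena Petrova on the DDoS incident.",
    ]

    graph = nx.Graph()                      # who is mentioned alongside whom
    profiles = {}                           # naive per-person dossier

    for text in documents:
        doc = nlp(text)
        people = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
        for name in people:
            profiles.setdefault(name, []).append(text)
        # link every pair of people mentioned in the same document
        for i, first in enumerate(people):
            for second in people[i + 1:]:
                graph.add_edge(first, second)

    print(sorted(profiles))                 # the people the model found
    print(list(graph.edges))                # how they connect to one another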

Natural language processing has the potential to unlock hundreds of thousands of hours of manual curation still done in 2020. We’re finally at a point with the performance of the algorithms where we can begin automating a lot of that work and freeing people up to do what they’re best at, which is being curious, pursuing hunches, and thinking about second or third-order analysis. And so that’s what’s really exciting about where we’re at today.

The Cipher Brief: What do you see in terms of acceptance of these new approaches amongst these organizations? Because we’re still in a time where there’s a disparity amongst skill sets and knowledge and understanding as it relates to not just advanced technology but basic technology in many organizations.

What you’re talking about now blurs the line between some of the most advanced technology out there and people who may not understand it, or may not be open to or accepting of it.

What do you see in terms of how this is being accepted and practically used in organizations where it may come into contact with someone who isn’t from a tech background and has to learn how to work with this new technology and, most importantly, trust it?

Raymond: That’s a great question. It brought to mind something that Eric Schmidt said a couple of years ago, which was, “The DoD doesn’t have an innovation problem. It has an innovation adoption problem.”

I think there’s a lot of truth to that, but since Mr. Schmidt made that remark, there have been some incredibly exciting developments across the IC and DoD. We’ve benefited tremendously from our partnership with In-Q-Tel, which, as you know, began as the venture capital arm of the CIA and now represents the broader IC and DoD. Through really innovative programs like AFWERX, the Air Force has rapidly identified and integrated technology into its mission.

There’s also work going on with the DIU and the Joint Artificial Intelligence Center, and additional work with the Under Secretary of Defense for Intelligence. We’re witnessing an explosion of activity throughout the space, with novel and exciting contracting pathways being created and vast amounts of money invested in artificial intelligence.

There’s a new sense of urgency that you didn’t see before. Secretary Esper has been saying continually that artificial intelligence is one of the top priorities, if not the top priority, for the Department of Defense.

And we’ve seen that reflected in the spending budget this year. We also see it in the posture of organizations that are recognizing the need to innovate. We see prioritization and pathways and funding.

The challenge with all of this is that although algorithms and machine learning solutions are often delivered as SaaS, they’re fundamentally different from buying the Microsoft Office Suite, loading it onto your computer, and using it.

I’d like to share an anecdote. At Primer, we have an office in DC, but our primary offices are in San Francisco’s financial district. Every single day we look out the window and see dozens if not hundreds of self-driving cars pass by our building, with various types of sensors mounted on their rooftops. But you also see people behind the wheel, which means some type of training is still going on.

Tesla and almost all the major auto manufacturers have “self-driving” cars, but only under limited circumstances and in specific conditions. The reality is that an enormous amount of training is still required in the realm of self-driving vehicles to make it a consumer product.

Within that context, when the IC and DoD deploy object-recognition and natural language algorithms against the hardest of the hard problems they grapple with, they have to do it at speed. In some cases, organizations fundamentally need to be reconfigured to make maximum use of machine learning solutions.

What that means is that the models need to be trained often, and that takes expertise. Models are only as good as the training and the training data they receive, and that expertise usually resides on classified networks and in the heads of the officers, analysts, and operators engaged in this work every day.

It requires continuous engagement, because serious questions arise when training models to perform specific tasks:

  1. Who in the organization owns the model?
  2. Who is in charge of updating the model and reviewing training data that’s produced across the organization?
  3. Who integrates it?
  4. How should the organization think about it?

And then finally, where do we really want to go in terms of what tasks are going to get us the most bang for the buck early on?

Somewhere between 80% and 85% of commercial AI initiatives have not delivered what the people behind them initially thought they would. We believe that number will come down as average performance increases and as learning occurs on both the customer side and the company side. However, these are still relatively early days, and these are complicated technologies to leverage effectively. It requires a tight partnership, from the top down, in these organizations to make it a success.

The Cipher Brief: What’s your estimate on when we’ll see this adopted, when we’re comfortable with it and it’s part of everyday life in the national security community?

Raymond: That’s a great question. I think soon. There’s one thing we haven’t talked about. Cloud infrastructure is being put in place that will unlock a lot of opportunities.

The JEDI contract with DoD has been in the headlines recently. A common cloud infrastructure brings computing power and the ability to move data easily across enclaves, both of which are essential for operationalizing machine learning solutions at scale.

Getting that foundation in place will unleash much innovation in different areas.

Coming back to the topic of performance gains, take the task of finding people, locations, or organizations in documents. If you hire and train a group of people, have each of them go through a thousand documents, and ask them to find all the people, places, and organizations across those documents, around 95% precision is typical for humans.

Humans will miss some mentions, or may not realize that the same person’s name is spelled two different ways in the documents, for instance. We’re at the point where we’re approaching 96%-97% precision on a number of these tasks, and that has happened just in the past six months.
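For readers unfamiliar with the metric, precision is simply the share of tagged entities that are actually correct. A toy comparison with scikit-learn, using made-up labels rather than any real evaluation set, looks like this.

    from sklearn.metrics import precision_score

    # 1 = the tagged span really is an entity, 0 = it is not (toy data).
    truth      = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    human_tags = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    model_tags = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

    print(precision_score(truth, human_tags))   # 0.9 on this toy sample
    print(precision_score(truth, model_tags))   # 1.0 on this toy sample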

These gains, where we’re at or above human-level performance on specific tasks, will drive adoption quite quickly.

Finally, the workflows for integrating this at scale in these organizations have started to clarify as well. That’s a concept we call CBAR: you’ve got to connect to the data sources first, and the common cloud infrastructure is going to unlock more opportunities there. We’re seeing a lot of really cool and exciting innovation on the connecting side.

You have to connect and then build the models – lightweight, straightforward user interfaces for training the models exist and are in use today.

You have to unleash these models on the data, analyze it, and then inject the results into workflows. That part is crystallizing as well, and then you can feed the insights back out into whatever products or systems the end user cares about through APIs or various reporting mechanisms.

A couple of years ago, the connect, build, analyze, report flow was not clear; it needed to be architected. It’s now becoming the standard. And with the infrastructure coming into place and the learning that is occurring within these organizations, I think we’re going to witness a virtuous cycle of innovation.
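To make the CBAR idea concrete, here is a deliberately tiny skeleton of that connect, build, analyze, report flow. Every function is a stand-in for a real connector, model, or reporting integration; none of it reflects Primer's actual APIs.

    def connect():
        """Pull raw documents from the customer's data sources (hard-coded here)."""
        return [
            "Killnet claims DDoS against Lithuanian energy firm.",
            "Lazarus Group ransomware hits a regional hospital network.",
            "Quarterly earnings call scheduled for next Tuesday.",
        ]

    def build(labeled_examples):
        """'Train' a trivial keyword model from labeled examples."""
        attack_words = {
            word
            for text, label in labeled_examples
            if label == "attack"
            for word in text.lower().split()
        }
        return lambda text: "attack" if set(text.lower().split()) & attack_words else "other"

    def analyze(model, documents):
        """Run the model over incoming traffic."""
        return [(doc, model(doc)) for doc in documents]

    def report(results):
        """Push findings into whatever product or system the end user cares about."""
        for doc, label in results:
            print(f"[{label}] {doc}")

    model = build([("DDoS ransomware attack takes down web resources", "attack")])
    report(analyze(model, connect()))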

The Cipher Brief: Any final thoughts? If you had to give one takeaway for our folks, these organizations, the national security community that you’re talking about just from this conversation, what would it be?

Raymond: The overarching message I would communicate is commitment. These initiatives, whether in the natural language domain where Primer is playing or in other machine learning domains, require an investment from the organization and a commitment to make use of them.

And it’s a fundamentally different problem set than many of the other technological solutions, hardware or software, that we’ve seen over the past couple of decades. That makes it a risky endeavor.

But, as you mentioned earlier, our competition, the Chinese, the Russians, the Iranians, and others are making significant investments in these areas. They are vertically integrating, and we’ve always taken a different approach than that. And what we see here is just incredible innovation coming out of the technology sector.

Many companies are incredibly eager to do business with the IC and DOD and contribute to their missions. The level of technological maturity is reaching a point where numerous opportunities have been unlocked that didn’t exist for them even in 2019 or 2018.


THE AUTHOR IS BRIAN RAYMOND
Brian joined Primer after having worked in investment banking and today helps lead Primer’s National Security Group. While in government, Brian served as the Iraq Country Director on the National Security Council. There he worked as an advisor to the President, Vice President, and National Security Advisor on foreign policy issues regarding Iraq, ISIS, and the Middle East. Brian also worked as an intelligence officer at the Central Intelligence Agency. During his tenure at CIA, he drafted assessments for the President and other senior US officials and served as a daily intelligence briefer for the White House and State Department. He also completed two war zone tours in support of counterterrorism operations in the field. Brian earned his MBA from Dartmouth’s Tuck School of Business and his MA/BA in Political Science from the University of California, Davis.


Every morning, analysts, operators, and policymakers arrive at their desks to read the latest news and intelligence reporting that has come in during the past day. The daily rhythm of “reading the morning traffic” has remained largely unchanged since the 1950s. Professionals in the intelligence community and military spend upwards of a third of their days poring over incoming cables to stitch together a coherent picture of worldwide events. Policymakers and senior military commanders increasingly struggle to consume the enormous volume of daily reporting and rely on analysts and operators to deliver tailored intelligence briefings to help them keep up.

The 2017 London Bridge attack served as a reminder that national security professionals both in the U.S. and in partner countries are facing an information overload in the context of modern intelligence collection. In the 48 hours following the attacks, intelligence analysts were confronted with more than 6,600 news articles on the attack, as well as tens of thousands of YouTube videos, tweets, and other social media postings. “The good news is we’ve got lots of information, but the bad news is we’ve got lots of information,” said Philip Davies, director of the Brunel Centre for Intelligence and Security Studies, in the wake of the attacks. “I think we’re going to have to be realistic that MI5 and SIS (MI6) are being confronted with information overload in terms of scale and complexity. We have been cutting national analytical capability for 20 years. The collection of information has increased but if you cut back on analysis you get overload.”

Advances in machine learning and computer processing in recent years have triggered a wave of new technologies that aid in processing and interpreting images, video, speech, and text. To date, the technological developments in the realms of computer vision and voice recognition have been most impactful, finding wide-ranging applications from helping drivers avoid accidents to powering digital assistants to recognizing key objects and people for reconnaissance purposes. However, advances in neural networks pioneered in the image processing domain have helped make possible new AI technology capable of comprehending the unstructured language common in everyday life. This technology, known as Natural Language Processing (NLP), has reached a level of maturity at which it is now finding far-ranging applications across the national security community.

NLP broadly refers to the set of technologies that enable computers to not only understand, but also generate language in human-readable format. Driven by multi-layer neural networks, machine learning algorithms are now capable of a range of functions long the exclusive domain of humans, including drafting newspaper articles, summarizing large bodies of text data (e.g. military after-action reports), and identifying a wide range of entities, including people, places, events, and organizations. Moreover, NLP can now understand the relationship among these entities, rapidly extracting and collating key information from thousands of documents such as the number of casualties from a bombing, the political affiliation of an organization, or the type of illness afflicting a political leader. Perhaps most impactful, not only can these algorithms detect when a person is mentioned, but they can then gather all of the relevant information about them across a large set of reporting, thus creating profiles on the fly. These capabilities are changing the paradigm for how national security professionals not only manage daily traffic, but also information flows in times of crisis.

NLP to accelerate work of operators in the field

NLP is poised to deliver enormous time savings for media exploitation efforts by the special forces and intelligence communities. Over the past two decades, these organizations have perfected the exploitation phase of the targeting cycle, but few technologies have emerged that accelerate the analysis and dissemination phases.

This challenge is becoming more urgent as the amount of video, audio, and textual data being pulled off the battlefield skyrockets. General Raymond Thomas, Commander of U.S. Special Operations Command (SOCOM), disclosed last year at the GEOINT Symposium that SOCOM collected 127 terabytes of data from captured enemy material alone in 2017, not including live video from drones. “Every year that increases,” he said. By comparison, the Osama Bin Laden raid in 2011 resulted in 2.7 terabytes of data.

The emerging ability to train new NLP algorithms on the fly is making the leap from the private to the public sector and is just now beginning to find uses among national security professionals. These technologies will enable these organizations to rapidly sift through staggeringly large caches of digital media to unearth the files most likely to be of intelligence value.

Introduction of these technologies will upend the longstanding approach of manual review by large teams of analysts and operators by enabling this work to be pushed down to small tactical teams in the field, dramatically accelerating their targeting cycles.

Inflection point with NLP for intelligence analysts

NLP technologies also will dramatically augment analysts’ ability to grapple with large volumes of news and intelligence reporting. This issue is becoming more acute as intelligence community (IC) leaders race to establish new IT platforms to facilitate rapid information sharing and collaboration. The establishment of Amazon’s C2S cloud environment for the CIA, the DNI’s push for the IC IT Enterprise, and the Pentagon’s looming $10B Joint Enterprise Defense Infrastructure (JEDI) cloud program are indicative of the broader trend toward deeper integration across the 17 IC agencies and the growing recognition that the IC’s future success is in large part tied to its ability to capitalize on emerging AI technologies. While fewer stovepipes will help ensure the right intelligence makes its way to the right analyst, operator, or policymaker at the right time, the additional intelligence will exacerbate the information overload these groups already grapple with.

NLP technologies are also eroding the tradeoff analysts historically have had to make between making timely judgments and making judgments based on a comprehensive analysis of available intelligence. These technologies enable analysts to read in each morning in a fraction of the time and to interact with all of the reports hitting their inboxes each day, not just those flagged as highest priority or from the most prominent press outlets. The effect of these algorithms goes beyond accelerating the speed and scale at which individual analysts can operate; it also mitigates hitherto unavoidable analytic distortions associated with source bias. This lowers the cost analysts face for pursuing hunches and exploring new angles on vexing issues, and creates time for them to learn about new issues.

For example, it’s now much less costly for an analyst covering Syria to leverage NLP to read in on how the Russian press is covering the fighting in that country, from documents originally written in Russian. These same NLP technologies are also enabling policymakers and commanders to engage more immediately with reports, through automatic summarization and the ability to surface key facts in near real time.

Self-updating knowledge bases will mark historic paradigm shift for analysts

Perhaps most transformative, emerging NLP technologies are showing promise in powering auto-generating and auto-updating knowledge bases (KBs). Although still in their infancy, these self-generating “wikipedias” likely will have the most dramatic impact by eliminating potentially millions of worker-hours of labor manually curating KBs such as spreadsheets, link charts, leadership profiles, and order-of-battle databases. These next-generation KBs will continuously analyze every new piece of intelligence reporting to automatically collate key facts about people, places, and organizations in easily discoverable and editable wiki-style pages. The introduction of this technology will disrupt the daily rhythm of tens of thousands of analysts and operators that spend a significant amount of time each day cataloging facts from intelligence reporting.

Self-generating and self-updating KBs will make it easy for intelligence and military organizations to answer a frequently asked but very difficult question: what do we know about person (or place, or organization) X? Today, answering that question is still an enormously costly and time-intensive process fraught with pitfalls. Inexperience, imperfect organization, or a lack of human resources makes it impossible for organizations to fully leverage all of the data they already receive.

In the not-too-distant future, analysts likely will arrive each morning to review profiles created overnight by NLP engines. Rather than spending hours cataloging information in intelligence reports, they’ll simply review additions or edits automatically made to “living” databases, freeing them to work on higher-level questions—and minimize the risk of overlooking key intelligence.
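As a rough sketch of that overnight loop, the snippet below folds new reports into per-entity pages using the open-source spaCy library and a plain dictionary. The reports are invented, and a real knowledge-base system would add deduplication, provenance, and human review.

    from collections import defaultdict

    import spacy

    nlp = spacy.load("en_core_web_sm")          # assumes the small English model is installed
    knowledge_base = defaultdict(list)          # entity name -> list of sourced facts

    def overnight_update(kb, new_reports):
        """Fold each new report into the pages of every entity it mentions."""
        for report in new_reports:
            doc = nlp(report)
            for ent in doc.ents:
                if ent.label_ in {"PERSON", "ORG", "GPE"}:
                    kb[ent.text].append(report)  # naive: attach the whole report text
        return kb

    overnight_traffic = [
        "Ignitis Group restored service after the DDoS attack subsided.",
        "Killnet claimed responsibility for attacks on Lithuanian web resources.",
    ]

    overnight_update(knowledge_base, overnight_traffic)

    # The analyst's morning review: what changed, entity by entity.
    for entity, facts in knowledge_base.items():
        print(entity, "->", facts)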

See this article at The Cipher Brief

A company’s data is often its most valuable asset. The modern enterprise relies on its data for performance monitoring, market insights, and for maintaining competitive advantages and institutional knowledge. However, with the explosive growth of data and data types, companies face an increasing challenge in organizing, managing, and gaining insights from their data. AI-based solutions used to address these challenges are at the forefront of machine learning, infrastructure design and deployment, and flexible computing.

At Primer, we build machine learning systems that can read (NLP) and understand (NLU) diverse sets of unstructured text data, and provide users with a suite of summarized outputs (NLG), from PDFs to knowledge bases. Our goal is to help our users, from academic researchers to financial analysts to public sector employees, accelerate their understanding of the world around them.

In many of our deployments, our customers bring their data to our platform and we deploy our suite of AI engines on their infrastructure. This allows companies to keep their proprietary data secure within the confines of their own network, but it introduces a unique set of challenges that we have to address. To frame the problem in common language: the objective is to extract meaning from, better organize, and summarize terabytes to petabytes of unstructured data across a variety of cloud and on-premise deployments, using cutting-edge and continually evolving AI engines. Questions we consider include:

  • How do we maintain product continuity across different infrastructures?
  • What if the customer is running on an architecture we’ve never worked with before?
  • Will our customers have enough computing resources on hand?
  • What will it cost to support our customers within our hosted solution?

At the risk of reductionism, there are two broad strategies to follow. The first is to make the system requirements and installation process as generic as possible, making few assumptions about the customer’s infrastructure. And the second is to adopt bespoke “platforms” that are supported by the product and leverage what’s known about those platforms to automate the process.

Each strategic approach comes with trade-offs. The first is less work for your engineers but potentially brings friction for the customer and pre-sales teams. The second can yield a positive customer experience but can be costly to maintain, and when implemented poorly, the customer may have a negative experience in spite of the extra effort. The gravity of the trade-offs is compounded when the company is in its early stages and every engineer needs to be tied to critical functionality that can scale the business.

At Primer, we chose a hybrid of the two strategies and have invested in processes and technologies that minimize our time to deploy and maintain solutions while also minimizing the engineering cycles required to do so. We achieve this by leveraging Kubernetes as an orchestration technology. Kubernetes allows us to build for one notion of infrastructure instead of a bespoke set of approaches tied to specific configurations and operating systems.

Traditionally, when designing software, a considerable amount of debate occurs around how using AWS, GCP, or perhaps Azure will change the design. Each of these cloud providers has non-trivial differences that can make an application dependent on that provider, especially if you take advantage of the provider’s more useful features. Instead of mitigating this by leveraging less of the cloud provider’s offerings, our approach allows our engineering teams to make more assumptions about the characteristics of the infrastructure. The big win for Primer is that Kubernetes does much of the heavy lifting of translating its design primitives to different infrastructure types. Ideally, this brings the best of both worlds: the ability to build software for a highly automated deployment and installation process while still having the flexibility to adapt to almost any infrastructure our customers may use.
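To illustrate what building for "one notion of infrastructure" looks like in practice, here is a minimal sketch using the official Kubernetes Python client. The image name, namespace, and resource requests are placeholders rather than our actual deployment configuration.

    from kubernetes import client, config

    config.load_kube_config()    # or config.load_incluster_config() when running in-cluster

    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="nlp-engine"),
        spec=client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "nlp-engine"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "nlp-engine"}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="engine",
                            image="registry.example.com/nlp-engine:latest",  # placeholder image
                            resources=client.V1ResourceRequirements(
                                requests={"cpu": "2", "memory": "8Gi"}
                            ),
                        )
                    ]
                ),
            ),
        ),
    )

    # The same object and the same call work against EKS, GKE, AKS, or a bare-metal cluster.
    client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)

Because the Deployment is expressed in Kubernetes primitives rather than provider-specific APIs, nothing in it has to change when a customer's cluster moves between clouds or on-premise environments.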

This idea is far from novel and can be found in an increasing number of companies across a variety of verticals. We merely created an abstraction, an “interface,” that allows us to handle many similar but different implementations of the same functionality. The novelty worth mentioning is that the hybrid approach both informs and enables great flexibility in future product design and iterative product improvements, whereas historically, infrastructure considerations came at the very end of the development cycle or constrained aspects of the final solution.

This freedom has enabled us at Primer to focus our precious engineering talent on more significant and more distinguishing matters. We can focus on rapidly tuning our engines with the latest algorithmic approaches, and we can now distance ourselves from the concerns of “what operating system” and “what cloud provider”.

 




As if those benefits were not enough, using a solution like Kubernetes also enables Primer to take advantage of cost-saving measures like AWS Spot Instances. Using Spot has traditionally been difficult because of the way we thought about infrastructure: servers were carefully crafted to serve a particular purpose, contained non-redundant state, and took human time and energy to replace. These factors make behavior like that of the Spot market particularly disruptive to an environment. By contrast, Kubernetes is built around the notion that servers are ephemeral by nature and should not contain any state that can’t be lost. This philosophical difference, commonly referred to as “pets” vs. “cattle,” negates most of the “penalties” of the Spot market.

At Primer, we use a service called Spotinst that takes much of the complexity out of leveraging Spot Instances and has a native connector for Kubernetes. With Spotinst, we simply plug in our Kubernetes cluster and save 60-70% of our AWS spend, along with gaining Pod-driven infrastructure autoscaling, meaning we don’t need to fit our instances to our container and Pod sizes. The beauty of this solution is that we achieved these savings without investing any additional engineering effort, which flips the traditional cost/benefit model for infrastructure savings on its head. Spotinst has been an invaluable partner in making our infrastructure more useful and cost-effective.

Spotinst has also provided additional tooling and automation, like a dashboard that shows pod allocation across the cluster. We can see which instance types are being used most effectively and make more informed decisions about our scaling and capacity needs.

Finally, Spotinst will give our customers a way to cost-effectively scale up their own computing environments as they use compute-heavy AI solutions like Primer to drive new insights into their organizations.

We believe that companies will have a growing need for services like Primer to provide insights into their ever-growing volumes of internal and external data. While a number of AI-based solutions will be developed, we believe these solution providers will not have the luxury of falling back on traditional SaaS models, and that architectures like the one we propose will provide flexibility for developers, support deployment and maintenance within customers’ IT organizations, and shorten the time to value for end users.