A New State of the Art for Named Entity Recognition

By Primer Engineering Posted November 8, 2019

Here at Primer we are building machines that can read, write, and understand natural language text. We measure our progress by breaking that down into smaller cognitive tasks. Usually our progress is incremental. But sometimes we make a giant leap forward.

On the reading task of Named Entity Recognition (NER) we have now surpassed the best-performing models in the industry by a wide margin: with our model achieving a 95.6% F1 accuracy score on CoNLL. This puts us more than two points ahead of a recently published NER model from Facebook AI Research. More importantly, we are now on par with human-level performance. It requires consensus across a team of trained human labelers to reach higher accuracy.

NER performance

Primer’s NER model has surpassed the previous state of the art models of Google and Facebook on F1 accuracy score. Graph adapted from Sebastian Ruder, DeepMind.

NER: What’s in a Name?

Named Entity Recognition (NER) is a foundational task in Natural Language Processing because so many downstream tasks depend on it. The goal of NER is to find all of the named people, places, and things within a text document and correctly classify them.

The gold standard benchmark for NER was laid out in a 2003 academic challenge called CoNLL. The CoNLL data set consists of news articles with all of the named entities hand-labeled by humans. (There is also a German-language CoNLL data set.) This established the four standard NER classes: person (PER), organization (ORG), location (LOC), and miscellaneous (MISC).

The NER labeling task is not as easy as it sounds. Consider this sentence:

I_heard_that_Paris_0

After a thoughtful pause, a human reader can deduce that “Paris Hilton” is a person, “the Hilton” is an organization, and “Paris” is a location. (Humans will disagree about 15% of the time whether “the Hilton” should instead be classified as a location.)

A popular industry solution for extracting named entities from text is spaCy. Here is the output of spaCy 2.1 NER:

I_heard_that_Paris 1

Not bad. The spaCy model does correctly identify all of the named entity spans. And it correctly identifies the second “Hilton” and second “Paris” as an organization and location, respectively. But Paris Hilton herself is misclassified as an ORG. So spaCy is only getting 66% accuracy on this text. And on our diverse gold-labeled NER data spaCy 2.1 falls well below 50% accuracy.

In order for models to be useful in a commercial setting, they need far better performance. So some new ideas are needed here.

New models are good, but data diversity is king

To create our own NER model we started with a BERT-based architecture and fine-tuned it for NER with the CoNLL training data. By switching to a universal language model like BERT, we immediately left spaCy in the dust, jumping an average 28 points of precision across all entity classes.

However, that higher precision came at a cost in recall. For example, Primer’s BERT-NER model was not confident enough to tag “Paris Hilton” in this sentence:

I_heard_that_Paris_2

Pushing our NER model beyond state of the art required two more innovations. First, we switched to a more powerful universal language model: XLNet. But we discovered that even larger performance gains are possible through data engineering.

The CoNLL NER data set is limited to just one type of text document: Reuters news articles published in 1996 and 1997. This is very low data diversity compared to the internet-scale corpus of documents we process at Primer. We needed our NER model to be trained on a far broader range of writing styles, subject matter, and entities. So we curated a highly diverse group of gold-labeled documents, including entities from the financial, defense-related, and scientific worlds.

Injecting this diversity into our training data made all the difference. Even adversarial examples rarely stump Primer’s NER model:

I_heard_that_Paris_3

Since the first universal language models like BERT came out one year ago, we’ve seen a revolution in the field of natural language processing. You can see this rapid progression in the graph above. But take note where Primer’s NER model lands. Our performance on CoNLL stands above the best results published by the enormous research teams at Google, Facebook, and the entire academic community. We have made more progress on NER over the past two months than the entire machine learning field has achieved in the past two years.

Named Entity Recognition (NER)

Primer’s NER model is approaching human-level performance. We find that individual humans disagree on consensus NER labels 15% of the time on average, even after training.

Of the four Entity groups, PERSON extraction has the highest performance with 0.94 precision and 0.95 recall. Location extraction is the second highest with organization third and Miscellaneous the fourth highest ranking. These results mirror the performance of our human evaluators against gold standard data, with humans having the lowest inter-annotator agreement on the miscellaneous and organization categories.

Putting NER to work

So what can you do with the world’s best NER model?

Primer powers analyst workflows in some of the largest organizations in the world. Better NER translates to better downstream natural language processing. It powers coreference resolution to correctly attribute every quote by every politician and CEO in the world. You need it for relation extraction to convert unstructured text that describes entities into structured data—facts about people, places, and organizations. And for text classification, for example identifying corporate risks hidden deep inside a company’s financial documents.

To see how NER works on a text document, consider this transcript of Mark Zuckerberg’s congressional testimony. It takes about 5 seconds to process the 50,000+ words with Primer’s NER model and extract 271 people, places, organizations, and even named entities such as Facebook’s Libra cryptocurrency project. (See the output below.)

For a deeper stress test we’ve been running it on document types that it has never seen before. For an extreme test we turned to Harry Potter fan fiction novels. Because if our model deduces that the Noble House of Potter is an organization, Phobos Malfoy is a person, Libere Loqui is a miscellaneous entity, and Snog Row is a location, then extracting the named entities from business documents should be a walk in the park.

So how does it do? We’re glad you asked. We ran the experiment and here are the results. Below is the output from Zuckerberg’s congressional grilling.

NER output from Mark Zuckerberg’s congressional testimony, 23 October 2019

People


Cheryl	Tipton	Perlmutter
Vargus	Congressman	Casten
Martin	Clay	Mister
Mister Meeks	Cleaver	Barr
Hawley	Phillips	Inaudible
Warner	Davidson	Chan Zuckerberg
Vargas	Gonzalez	Mark Zuckerberg
Beatty	Riggleman	Bud
Chair Waters	Pressley	Axne
David	Zuckerman
Luetkemeyer	Huizenga
Mr Zuckerberg	Kostoff
Emmer	Speaker
Scott	Foster
Hollingsworth	Ms. Waters
Gonzales	Presley
Himes	Gottheimer
Green	Stivers
Louis Brandeis	Loudermilk

Organizations


FinCEN	EU	Hamas
Senate	OCC	LGBTQ
Liberty	European	NCMEC
law	DNC	FDIC
AML/BSA	FHA	NFHA
fed	CFT	CFPB
HUD	Facebook	ACLU
PayPal	congress	BFSOC
SEC	Newsweek	Visa
Alipay	Congress	Dais
FTC	Calibra	Rand
FBI	Twitter	Uber
Stripe	NASDAQ	FINMA
YouTube	DOD	VISA
AML	BSA	Crypto
Google	Lyft	ICE
Subcommittee on Oversight and Investigations	eBay	Nazi
WhatsApp	MDIs	CFTC
Collibra	First	FHFA
UN	Super	LLCs
FINCEN	Black Lives Matter	Pew Research Center
Dias	The New York Times	Bookings Holdings
House	Fintech Task Force	The Washington Post
Ebay	Federal Reserve	Financial Stability Board
NAACP	US Treasury	The Department of Justice
US Department of Housing and Urban Development	Facebook Libra	Cambridge Analytica
G7	Black Mirror	Financial Services Committee
MasterCard	Congresswoman	Hezbollah
Anchorage Trust	AI Task Force	Senate Intelligence Committee
Trump Hotel	Capitol Hill	Independent Libra Association
Muslim Advocates	Mercado Pago	National Fair Housing Alliance
The Capitol	Social Network	Office of Secretary of Defense
The Daily Caller	Trump Hotels	Microsoft
regulators	The Guardian	National Center on Missing and
LIBOR Association	New York Times	Securities Exchange Commission
Libra Association	Wells Fargo	Committee on Financial Services
Libra association	Congressmen	Independent Fact-Checking Network
Labor Association	US Congress	International Fact-Checking Network
Georgetown	European Union	Instagram
Congressional	United Nations	Supreme Court
Trump International Hotel	Department of justice	Messenger
Federal Reserve Board	Federal Housing Agency	The Times
WeChat Pay	Rainbow Push coalition	Independent Association
Georgetown University	Department of Justice	Federal Trade Commission
Housing Rights Initiative	terrorists

Locations


California	Americas	Venezuela
Iowa	Michigan	Asia
Pacific	Arkansas	North Korea
Washington	U.S.	North America
Myanmar	Oklahoma	Switzerland
Utah	America	Georgia
Indiana	New Jersey	Guam
Minnesota	Alaska	Germany
Washington DC	Pennsylvania	Florida
Illinois	Africa	Canada
U S	Silicon Valley	Cyprus
New York	Washington, DC	Russia
DC	Texas	Iran
US	Christchurch	France
Colorado	Syria	Ohio
Virginia	Connecticut	New Zealand
Tennessee	South Dakota	North Carolina
Missouri	Massachusetts	Turkey
United States of America	China	Europe
Kentucky	United States	Maryland
District	Qatar

Miscellaneous


The President	Colibra	XRP
American Nazi	Zuck Buck	African Americans
AI	Americans	Indian Muslims
Stump	Russian	Anti
Sarbanes-Oxley	American	Dune
Libra	Chinese	Libra Project
Green New Deal	Patriot Act	Iranian
Libra White Paper	Republicans	Future
Russians	Venezuelan	Democrats
Democratic	Hispanics

FEATURED EVENTS

FEATURED RESOURCE

Mastering Search: The Semantic Edge

Former DIA Director joins Primer as Advisor Read the announcement

A New State of the Art for Named Entity Recognition

NER: What’s in a Name?

New models are good, but data diversity is king

Putting NER to work

NER output from Mark Zuckerberg’s congressional testimony, 23 October 2019

People

Organizations

Locations

Miscellaneous

RECOMMENDED READING

FEATURED EVENTS

FEATURED RESOURCE

Related Content

Products

Resources

Company

Follow

Get in Touch