Maverick, Goose begin romantic relationship

Welcome to the age of machine-generated headlines

  • How to write news headlines
  • Can we write document summaries?
  • The simplest summarization model

That is a machine-generated summary of this article. I fed the text through a deep learning model that we’re training here at Primer, where we’re building machines that read and write. The three-bullet summary above took less than a second to generate on a single 2080 Ti GPU.

Now you don’t have to read it. You’re welcome! (To understand the machine-generated title, you’ll have to read to the end.)

The base model that makes this possible is laid out in our recently published paper. We fine-tuned the BERT universal language model on millions of news headlines. The model learned to generate titles for documents as an iterative question-answer task. It takes about 1.5 million news articles for the model to learn how to generate headlines as good as or better than the originals in the majority of cases.

Question: Can a neural network write headlines as good as those written by humans?

Answer: Yes, especially when the headlines are clickbait.

[Figure: Generated Titles, quality percentages]

Headlines written by humans are judged slightly better than machine-written headlines. But overall, a machine can now write headlines as good as a human’s most of the time.

Journalists have conflicting motivations when writing headlines. The main task is to inform the reader, giving her a short and meaningful description of what the document contains. But much of the success of online news relies on clicks. Headlines must entice people to read stories that display the ads that pay the bills. Luckily, clickbait headlines are a very small portion of Primer’s news corpus—far less than 1% of our hundreds of millions of English-language news articles. The model learned its writing style from good journalism.

This is what it looks like when a machine rewrites clickbait headlines:

human The Obamas Are Getting a Netflix Show, & You Won’t Believe Who It’s For
machine The Obamas are getting a show for kids

human You’ll never guess which U.S. counties grew the fastest
machine McKenzie County in North Dakota had highest population growth in entire US

human The Title Of This Court Filing Will Amaze You!
machine BuzzFeed sued for defamation of Russian hacking defendants

Here is a more representative sample of news articles. In the majority of cases, the machine-generated headlines are as good as or better than the originals, as scored by humans in double-blind trials:

human Bad Movie Diaries: A Christmas Prince: The Royal Wedding (2018)
machine Jim Vorel and Kenneth Lowe discuss A Christmas Prince and its sequel, The Royal Wedding

human ABC chairman job: Academic, legal and media executives make the shortlist
machine Former Allens partner Michael Rose among candidates for ABC chairman

human ‘It’s been a long 2 years’ for Kelly, Kudlow says
machine John Kelly replacement to be announced in next few days

human Decision in March on 48-team 2022 World Cup, says Infantino
machine FIFA to decide on 2022 World Cup in March

human Khal Drogo Is Alive—Sort of—on This Week’s ‘Saturday Night Live’
machine Khal Drogo is back on SNL

Ideally, a document’s title should serve as an extremely short but informative summary of its content. Our title generation model is lazy, reading the text of a document only until it finds what seems to be sufficient information to generate that one-line summary.

When the model falls short, it is often because it is confused by longform journalism, which is rare in its training data. It focuses on information near the top of the story, simply because that is where most news articles get to the point. Longform journalism has a different structure:

human I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here’s How.
machine People lost weight faster if they ate a chocolate bar every day

human Who’s downloading pirated papers? Everyone
machine How to read the scientific papers

human Gamers unravel the secret life of protein
machine Protein chemistry: David Baker

We could improve our generated titles with rules. But the best way forward is to build a better model by training it on more abundant, higher-quality data. Luckily, we have several orders of magnitude more news articles on tap.

Question: Can a title-generating neural network be repurposed to write full document summaries?

Answer: Surprisingly, yes.

For the past few weeks we have been playing with a simplified model. We feed a document into the title generation model and track how deeply it reaches into the text to write a title. Then we feed in the text below that point and generate another. Each “title” becomes the next line in a bullet-point summary. When it reaches the end of the text, the summary is done.

In spite of being completely unsupervised and requiring no additional training, this naive summarization algorithm performs reasonably well right out of the gate.
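The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Primer's implementation: the `TitleModel` interface (a function returning a title plus the number of characters it read) and the toy `first_sentence_model` stand-in are assumptions made for the example, since the real title model is not shown here.

```python
from typing import Callable, List, Tuple

# Assumed interface: a title model returns (title, n_chars_read), i.e. the
# generated title plus how far into the text it read before stopping.
TitleModel = Callable[[str], Tuple[str, int]]


def first_sentence_model(text: str) -> Tuple[str, int]:
    """Toy stand-in for the real title model (hypothetical, for illustration).

    It "reads" up to the end of the first sentence and returns that
    sentence as the title.
    """
    end = text.find(". ")
    if end == -1:
        # No sentence boundary left: consume the rest of the text.
        return text.strip().rstrip("."), len(text)
    return text[:end].strip(), end + 2


def summarize(text: str, title_model: TitleModel) -> List[str]:
    """Repeatedly generate a title for the unread remainder of the text."""
    bullets: List[str] = []
    remaining = text.strip()
    while remaining:
        title, n_read = title_model(remaining)
        if title:
            bullets.append(title)
        # Skip past what the model read; always advance at least one
        # character so a model that reports no progress cannot loop forever.
        remaining = remaining[max(n_read, 1):].lstrip()
    return bullets


if __name__ == "__main__":
    doc = "Maverick flies an F-14. He meets Charlie. He graduates."
    for bullet in summarize(doc, first_sentence_model):
        print("  •", bullet)
```

Swapping `first_sentence_model` for a real title generator that reports how deeply it read would give the bullet-point behavior described above, with no additional training step.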

Here are some human vs. machine-generated summaries, drawing from the CNN/Daily Mail data set:

Police: Teen wanted for killing at Seattle bar hosting Seahawks, fans

human
  • Suspect is 19, police say
  • CNN affiliate KING reports that several NFL players were in the bar at the time
  • Reality star tweets that she saw body on the floor and a lot of blood

machine
  • Teen shot, killed in Seattle-area bar where Seahawks players and other celebrities were partying
  • Bellevue Police still searching for suspect in murder of ‘Tuba Man’
  • Seahawks ‘ Greg Scruggs tweets The death of the popular musician who played his tuba
  • I saw the guy laying on the floor and blood everywhere

Man, 37, arrested in probe of ricin-laced letters found in Washington state

human
  • NEW: Matthew Buquet will stay in custody until at least next Tuesday, FBI says
  • The 37-year-old threatened to “injure and kill” a federal judge, an indictment alleges
  • Tests showed ricin was found with the threatening letter, the FBI says
  • He was arrested Wednesday afternoon in Washington state

machine
  • Man arrested in ricin (CNN) threat investigation
  • Matthew Ryan Buquet arrested by FBI
  • Ricin suspected in letters sent to Spokane Post Office
  • Ricin is no known antidote

Deep-voiced bosses bring in the big bucks, says study

human
  • A new study shows that the lower a CEO’s voice, the larger his company and paychecks tend to be
  • It is still unclear why lower-voiced CEOs are more successful
  • Image consultants say executives can learn to alter voice pitch
  • Leaders can also convey power by how they dress

machine
  • Study shows that CEOs tend to be taller
  • Why labor market is matching real CEOs
  • Voice pitch could also be related to other physical characteristics that suggest dominance
  • Mayew: ‘ There are certain bounds that we’re just born with ‘
  • The power of words
  • How to dress in the workplace, executives should dress in one color, which adds height, and wear plain fabric rather than patterned styles

Remarkably, the model can summarize non-news documents, even though it has never seen examples of them. We need this capability at Primer because our customers have their own documents that they need to skim rapidly (financial analyst reports, scientific papers, intelligence briefings) before choosing which ones to read in full.

Wikipedia movie plot overviews are a good open-source proxy. This is how our prototype model performs:

The Dark Knight

  • Joker and Harvey Dent form alliance for Gotham organized crime
  • The Joker offers to kill Dent for half of their money
  • Joker kills Gambol and taking over his gang
  • Batman brings the Joker back to Gotham
  • The Joker threatens to keep killing people unless Batman reveals identity
  • Dent kidnaps Joker
  • Joker attacks convoy
  • Batman apprehends Joker who killed Rachel and Dent

Top Gun

  • Maverick and Cougar fly F-14A Tomcat on USS Enterprise
  • Maverick shepherds Cougar back to the carrier
  • Maverick and Topgun
  • Maverick defeats Topgun instructor During first training sortie
  • Maverick becomes rival to Iceman
  • Maverick, Goose begin romantic relationship
  • Maverick considers retiring
  • Iceman seeks advice from Viper, who reveals that he served with Maverick’s father Duke Mitchell on the USS Oriskany

The errors that the model makes involve entities. It correctly identifies key events that should be included in the summary, but it incorrectly guesses who is involved. What’s needed is a fact-aware text generation system: a knowledge base that can read and write. By pre-annotating the input text with entity embeddings from Quicksilver, our self-updating knowledge base, our model can learn to keep better track of who’s who.

In Top Gun, Maverick begins a romantic relationship with his flight instructor Charlie, not Goose, his copilot. But perhaps we should treat this as a deeper truth and a glimpse at the future. As machine understanding becomes ever more sophisticated, we humans will come to rely on machines that reveal truths deeper than we can see.