
_Technology

_Intelligence Engines

Our products are built on top of a core set of computational engines.

Their architecture is modular by design, allowing for continuous development of our analytic pipeline. These engines let our customers process diverse document types across multiple languages. They extract information, identify key insights, perform analysis at scale, and generate output as human-readable text and graphics.

_Engine 01

Structure

The first step in knowledge extraction is to identify all of the entities and structural data within a set of documents: the people, places, concepts, numbers, sentiment, and quotes. A series of custom classifiers extracts and resolves those entities and stores them in a knowledge base. We then identify relationships between pairs of entities using unsupervised methods. Every piece of data that we capture retains its provenance, giving us full transparency into the decisions made by downstream algorithms.

EXAMPLE

Almost

modifier

57,000

number

Model S Vehicles

units
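
The example above can be approximated with a small pattern-based parser. This is a minimal sketch, not the classifiers described in this section: the modifier list, the `Quantity` type, and `parse_quantity` are hypothetical names introduced for illustration, and a regular expression stands in for a trained model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical modifier vocabulary; a production system would learn this.
MODIFIERS = ("almost", "nearly", "about", "over", "at least", "more than")


@dataclass
class Quantity:
    modifier: Optional[str]  # hedging word, if any
    number: int              # numeric value with separators removed
    units: str               # what is being counted
    source: str              # provenance: the original text span


_PATTERN = re.compile(
    r"(?:\b(?P<modifier>" + "|".join(MODIFIERS) + r")\s+)?"
    r"(?P<number>\d{1,3}(?:,\d{3})+|\d+)\s+"
    r"(?P<units>.+)",
    re.IGNORECASE,
)


def parse_quantity(text: str) -> Optional[Quantity]:
    """Split a phrase into modifier, number, and units, keeping the source."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return Quantity(
        modifier=match.group("modifier"),
        number=int(match.group("number").replace(",", "")),
        units=match.group("units").strip(),
        source=text,
    )


quantity = parse_quantity("Almost 57,000 Model S Vehicles")
print(quantity)
```

Keeping the original phrase in `source` mirrors the provenance requirement: each extracted value can be traced back to the text it came from.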

_Engine 02

Ensemble

We construct models of reality based on streams of millions of documents. By de-duplicating and reconciling statements made by multiple observers, we create an ensemble version of the corpus. For any given event, there can be thousands of varying descriptions, from the people involved to the tiniest details. Taking a multi-document approach allows us to capture this variation as signal rather than noise. It also improves the performance of the structuring engine compared with single-document approaches.

EXAMPLE

doc_367

The tally of Rohingya who fled Myanmar into Bangladesh soared to over 300,000 refugees

doc_612

At least 313,000 Rohingya have flooded into Bangladesh since August 25

doc_149

Al-Hussein said that more than 270,000 Rohingya refugees had fled to Bangladesh
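
One simple way to reconcile the three figures above is to de-duplicate claims by document and summarize the remaining values. This is a sketch under stated assumptions: the `Claim` type and `reconcile` function are hypothetical, and the median is used only as an illustrative consensus estimate, not as the method this engine actually applies.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Claim:
    doc_id: str  # provenance of the statement
    value: int   # the number reported by that document


def reconcile(claims):
    """De-duplicate claims by document id and summarize the reported values."""
    unique = {claim.doc_id: claim for claim in claims}  # last claim per doc wins
    values = sorted(claim.value for claim in unique.values())
    return {
        "consensus": median(values),  # illustrative choice of estimator
        "low": values[0],
        "high": values[-1],
        "sources": sorted(unique),
    }


claims = [
    Claim("doc_367", 300_000),
    Claim("doc_612", 313_000),
    Claim("doc_149", 270_000),
    Claim("doc_612", 313_000),  # duplicate of the same document
]
print(reconcile(claims))
```

Reporting the low and high values alongside the consensus keeps the variation between observers visible instead of discarding it.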

_Engine 03

Event

This engine looks for evidence of real-world events in a set of documents. It analyzes the structured data extracted from those documents and clusters entity relationships as a function of time. The result is a time-directed graph of inferred real-world events for any given corpus.

EXAMPLE

Apple teams up with China's WeChat to accept payments

Date: August 29, 2017

Geo: Beijing, China

Volume: 64 documents
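
Clustering entity relationships over time can be sketched with a gap-based grouping: mentions of the same entity pair are merged into one event while consecutive publication dates stay within a window. The `Mention` type, the `cluster_events` function, the two-day window, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Mention:
    doc_id: str
    entities: frozenset  # the pair of related entities
    published: date


def _summarize(pair, group):
    """Collapse a run of mentions into one event record."""
    return {
        "entities": sorted(pair),
        "date": group[0].published,  # earliest mention in the run
        "volume": len({m.doc_id for m in group}),
    }


def cluster_events(mentions, max_gap=timedelta(days=2)):
    """Group mentions of the same entity pair that are close in time."""
    by_pair = defaultdict(list)
    for mention in mentions:
        by_pair[mention.entities].append(mention)

    events = []
    for pair, group in by_pair.items():
        group.sort(key=lambda m: m.published)
        current = [group[0]]
        for mention in group[1:]:
            if mention.published - current[-1].published <= max_gap:
                current.append(mention)
            else:  # gap too large: close the event and start a new one
                events.append(_summarize(pair, current))
                current = [mention]
        events.append(_summarize(pair, current))
    return sorted(events, key=lambda e: e["date"])


pair = frozenset({"Apple", "WeChat"})
mentions = [  # hypothetical documents
    Mention("doc_a", pair, date(2017, 8, 29)),
    Mention("doc_b", pair, date(2017, 8, 30)),
    Mention("doc_c", pair, date(2017, 9, 15)),
]
for event in cluster_events(mentions):
    print(event)
```

Sorting the resulting events by date gives the time ordering; linking them into a graph would be a further step not shown here.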

_Engine 04

Context

Information is best understood in the context of the information around it. The context engine can analyze any claim, fact, or assertion, identify supporting evidence or contradictions, and return these to the user to better contextualize the information. On a larger scale, the context engine lets us connect events through an inferred chain of probable causality. This shows how a set of events is connected and evolves through time, and lets us identify the origin and spread of information.

EXAMPLE

9 Sep 2017

Google Play removes Iranian apps from its store

25 Aug 2017

Authorities say Apple shuts down Iranian apps

25 Jul 2017

House passes sanctions bill against Russia, Iran, and North Korea

_Engine 05

Difference

Differences between sets of information can be meaningful. These differences can be detected at multiple levels of resolution: sentence, document, and corpus. At the sentence level, a change in a single key word in a regulatory filing can be surfaced. At the document level, by diffing on structural data such as entities and factual claims, the engine can detect consensus and contradiction. Applying the diff engine across languages allows us to see events that are being covered by one country and not another.

EXAMPLE

Russian only

Russian & English

Unmanned vehicles will drive on roads without intersections

Medvedev promised to increase subsidies to developers of unmanned vehicles

Yandex Unveils Self-Driving Car Project
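
At the corpus level, a cross-language diff can be sketched as a grouping of events by the set of languages that cover them. The `coverage_diff` function and the event identifiers below are hypothetical placeholders; the sketch assumes events have already been resolved to shared identifiers across languages.

```python
from collections import defaultdict


def coverage_diff(events_by_language):
    """Group event ids by the exact set of languages that report them."""
    languages_by_event = defaultdict(set)
    for language, events in events_by_language.items():
        for event in events:
            languages_by_event[event].add(language)

    diff = defaultdict(set)
    for event, languages in languages_by_event.items():
        diff[frozenset(languages)].add(event)
    return dict(diff)


# Hypothetical event ids, standing in for resolved real-world events.
coverage = {
    "russian": {"event_1", "event_2", "event_3"},
    "english": {"event_3"},
}
diff = coverage_diff(coverage)
print(sorted(diff[frozenset({"russian"})]))             # covered in Russian only
print(sorted(diff[frozenset({"russian", "english"})]))  # covered in both
```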

_Engine 06

Story

The most efficient means of communicating a complex analysis is through the combination of natural language narrative and graphics: a story. Our engines generate millions of statistical observations about entities and their relationships. We use a Bayesian model of surprise to rank these observations. The story emerges from a massive reduction in the dimensionality of these data and text generation via extractive and abstractive summarization. We are able to handle English, Chinese, and Russian as both input and output, with more languages on the way.
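
Bayesian surprise is commonly defined as the Kullback-Leibler divergence between the posterior and the prior beliefs after seeing an observation. The sketch below ranks observations by that quantity over a small discrete hypothesis space. The hypotheses, probabilities, and function names are illustrative assumptions, not the model used by this engine.

```python
import math


def posterior(prior, likelihood):
    """Bayes update over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())  # assumes at least one non-zero term
    return {h: p / total for h, p in unnormalized.items()}


def bayesian_surprise(prior, likelihood):
    """KL divergence (in bits) from the prior to the posterior."""
    post = posterior(prior, likelihood)
    return sum(p * math.log2(p / prior[h]) for h, p in post.items() if p > 0)


# Hypothetical prior belief about an entity's behavior.
prior = {"routine": 0.9, "unusual": 0.1}

# Hypothetical observations: P(observation | hypothesis).
observations = {
    "A": {"routine": 0.80, "unusual": 0.20},
    "B": {"routine": 0.05, "unusual": 0.95},
}

ranked = sorted(
    observations,
    key=lambda name: bayesian_surprise(prior, observations[name]),
    reverse=True,
)
print(ranked)
```

An observation that leaves beliefs unchanged scores zero, so ranking by this value pushes the observations that most shift the model to the top.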


_Careers

Check out our current openings. We would love to meet you!
