Expecting the Unexpected


The @Gnip data science team recently drafted a new white paper on modeling events in social data.

People flock to social media for many different reasons. Thanks to the ease of sharing and potential for broad distribution, one of the strongest reasons people use Twitter is to discuss significant events while they’re occurring.

After observing the data from many cultural events on the Twitter platform over the years, @jrmontag and @DrSkippy set out to categorize them into a few distinct types. In particular, they looked at how each type of event is shaped by the circumstances surrounding it.

Expected Events
First, they observed that people come to the platform to discuss events they expect ahead of time. For example, they noted how Winter Storm Jonas was discussed in the days leading up to, during, and after the storm. In particular, they saw that hourly counts of Tweets containing the term “jonas” peaked near the time of the storm’s landfall.
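As a rough illustration of what "hourly counts of Tweets that included a term" means in practice, here is a minimal sketch. It is not the code from the white paper: the `hourly_counts` helper, the `(timestamp, text)` record layout, and the sample records are all made up for this example.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(tweets, term):
    """Count Tweets mentioning `term`, bucketed by the hour they were posted.

    `tweets` is an iterable of (created_at, text) pairs. This record layout
    is a hypothetical simplification, not a real Twitter API payload.
    """
    counts = Counter()
    for created_at, text in tweets:
        if term.lower() in text.lower():
            # Truncate the timestamp to the start of its hour.
            bucket = created_at.replace(minute=0, second=0, microsecond=0)
            counts[bucket] += 1
    return dict(sorted(counts.items()))


# Made-up sample records, for illustration only.
sample = [
    (datetime(2016, 1, 22, 14, 5), "Stocking up before Jonas hits"),
    (datetime(2016, 1, 22, 14, 40), "jonas is coming"),
    (datetime(2016, 1, 22, 15, 10), "Snow day!"),
    (datetime(2016, 1, 22, 15, 30), "Jonas snowfall totals are wild"),
]

counts = hourly_counts(sample, "jonas")
peak_hour = max(counts, key=counts.get)  # hour bucket with the most matches
```

Plotting `counts` over several days would give the kind of time series described above, with `peak_hour` marking where discussion was busiest.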

Unexpected Network Spread Events
Next, they noticed what they’ve called “unexpected network spread”. This type of event occurs when an announcement is initially shared by one source (or a small number of sources) and then spreads virally throughout the social network. An example could be a rumor or announcement about a celebrity. In this case, they saw the daily counts of Tweets mentioning an event start from nearly zero and then surpass one million within days.

Unexpected Social Media Pulse Events
Finally, they observed situations where many people witnessed the same event live. For example, imagine an earthquake experienced by a number of first-hand observers and shared on Twitter. This leads to an immediate surge in corresponding social media data, which they’ve named the “Social Media Pulse.” Using some simplifying mathematical assumptions, @jrmontag and @DrSkippy built a quantitative model that describes the shape of the Social Media Pulse.

The figure below illustrates an example Social Media Pulse. Shown are per-minute counts of Tweets mentioning “meteor” during a 2014 meteor shower. Also shown is the Social Media Pulse model fit to the observed data.

Tweets about meteor shower over time

With a Social Media Pulse model applied to observed data, one can calculate relevant metrics such as the estimated time to the Pulse peak or the total expected Tweet volume. The Social Media Pulse model can take on a range of similar shapes, as shown in the figure below, which compares three different earthquakes. The resulting fits can then be compared across multiple events.
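To make those two metrics concrete, here is a minimal sketch. This post does not give the model's functional form, so the code assumes a gamma-shaped curve purely for illustration; the function names and the parameters `total`, `alpha`, and `beta` are hypothetical, and the white paper remains the reference for the actual model.

```python
import math


def pulse_rate(t, total, alpha, beta):
    """Tweets per unit time at time t, using an ASSUMED gamma-shaped curve.

    total: area under the curve (total expected Tweet volume)
    alpha: shape parameter, beta: rate parameter (both illustrative)
    """
    if t <= 0:
        return 0.0
    return (total * beta**alpha * t ** (alpha - 1)
            * math.exp(-beta * t) / math.gamma(alpha))


def time_to_peak(alpha, beta):
    """Time at which the assumed gamma-shaped curve is highest (its mode)."""
    if alpha <= 1:
        raise ValueError("for alpha <= 1 the curve has no interior peak")
    return (alpha - 1) / beta


def total_volume(total, alpha, beta, t_max=200.0, dt=0.01):
    """Numerically integrate the rate with the trapezoid rule up to t_max."""
    steps = int(t_max / dt)
    area = 0.0
    prev = pulse_rate(0.0, total, alpha, beta)
    for i in range(1, steps + 1):
        cur = pulse_rate(i * dt, total, alpha, beta)
        area += 0.5 * (prev + cur) * dt
        prev = cur
    return area


# Illustrative parameter values, not fitted to any real event.
peak = time_to_peak(2.0, 0.5)
volume = total_volume(5000.0, 2.0, 0.5)
```

Under this assumed form, the time to peak follows directly from the shape and rate parameters, and integrating the curve recovers the `total` parameter, which is why a fitted curve can yield both metrics from a small set of numbers.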

Tweets about three earthquakes compared

This model is just a start, but with it we have an opportunity to look for and compare patterns observed on the platform. Analyses like these can give you quantitative insight into real-time, real-world events and put the conclusions drawn from social data streams on firmer footing.

When you notice patterns over time, you have a suspicion that there’s an underlying reason why they happen — and maybe we can encode that in a model so that with just a little bit of data we can predict what will happen.

Josh Montague (@jrmontag)

For more examples and details of the Social Media Pulse model, download the white paper. We’ve also included sample code in the GitHub repository that implements some of the algorithms discussed in the paper to help you get started today.