We annotated nearly 200k Tweets from 2014 with their language, covering 68 languages, and selected them carefully so that precision and recall can be measured reliably. We use this data to evaluate and improve our language identification performance. You can download all the annotated Tweets.
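As a rough illustration of what measuring precision and recall per language could look like on an annotated set, here is a minimal sketch. The function name `per_language_precision_recall` and the toy labels are assumptions made for this example; they are not taken from the released dataset or from our evaluation code.

```python
from collections import Counter

def per_language_precision_recall(gold, predicted):
    """Return {language: (precision, recall)} for paired label lists.

    Hypothetical helper for illustration only.
    """
    gold_count = Counter(gold)       # Tweets annotated with each language
    pred_count = Counter(predicted)  # Tweets the classifier assigned to each language
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)

    scores = {}
    for lang in set(gold_count) | set(pred_count):
        precision = true_pos[lang] / pred_count[lang] if pred_count[lang] else 0.0
        recall = true_pos[lang] / gold_count[lang] if gold_count[lang] else 0.0
        scores[lang] = (precision, recall)
    return scores

# Toy labels, made up for this example.
gold = ["en", "en", "ja", "es", "es", "es"]
predicted = ["en", "es", "ja", "es", "es", "en"]
scores = per_language_precision_recall(gold, predicted)
```

Reporting the two numbers per language, rather than a single overall accuracy, keeps errors on rarely seen languages from being hidden by the most common ones.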
We just shipped a new version of the Twitter app with a brand new search experience that blends the most relevant content - Tweets, user accounts, images, news, related searches, and more - into a single stream of results. This is a major shift from how we have previously partitioned results by type (for instance, Tweet search vs. people search). We think this simplified experience makes it easier to find great content on Twitter using your mobile device.
MapReduce is a programming model for processing large data sets, typically used for distributed computing on clusters of commodity computers. With a large amount of processing power at hand, it’s very tempting to solve problems by brute force. However, we often combine clever sampling techniques with the power of MapReduce to extend its utility.
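To make the idea of pairing sampling with map and reduce steps concrete, here is a minimal sketch of one well-known approach: tag every record with a random key, keep the k smallest keys in each partition, then merge the partial results and keep the k smallest overall. This is not necessarily the technique we use; the names `map_sample` and `reduce_sample` are hypothetical, and the partitions are simulated in-process rather than run on a cluster.

```python
import heapq
import random

def map_sample(records, k, rng):
    """Map step: tag each record with a random key and keep the k smallest keys."""
    keyed = ((rng.random(), record) for record in records)
    return heapq.nsmallest(k, keyed, key=lambda pair: pair[0])

def reduce_sample(partials, k):
    """Reduce step: merge per-partition candidates and keep the k smallest keys overall."""
    merged = (pair for partial in partials for pair in partial)
    smallest = heapq.nsmallest(k, merged, key=lambda pair: pair[0])
    return [record for _, record in smallest]

# Three simulated partitions of 1,000 records each (illustrative data only).
rng = random.Random(0)
partitions = [range(0, 1000), range(1000, 2000), range(2000, 3000)]
partials = [map_sample(partition, 5, rng) for partition in partitions]
sample = reduce_sample(partials, 5)
```

Because each mapper forwards at most k records, the reducer handles a small, bounded input no matter how large the partitions are, and the result is a uniform sample without replacement.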
Twitter is an amazing real-time information dissemination platform. We’ve seen events of historical importance such as the Arab Spring unfold via Tweets. We even know that Twitter is faster than earthquakes! However, can we more scientifically characterize the real-time nature of Twitter?