Cassandra at Twitter Today

Saturday, 10 July 2010

In the past year, we’ve been working with the Apache Cassandra open source distributed database. Much of our work there has been out in the open, since we’re big proponents of open source software. Unfortunately, we’ve been less involved in the community lately because of more pressing concerns, and that has created some misunderstandings.

We’re using Cassandra in production for a bunch of things at Twitter. A few examples: Our geo team uses it to store and query their database of places of interest. The research team uses it to store the results of data mining done over our entire user base. Those results then feed into things like @toptweets and local trends. Our analytics, operations and infrastructure teams are working on a system that uses Cassandra for large-scale real-time analytics, for use both internally and externally.

For now, we’re not working on using Cassandra as a store for Tweets. This is a change in strategy. Instead, we’re going to continue to maintain our existing MySQL-based storage. We believe that this isn’t the time to make a large-scale migration to a new technology. We will focus our Cassandra work on new projects that we wouldn’t be able to ship without a large-scale data store.

We’re investing in Cassandra every day. It’ll be with us for a long time and our usage of it will only grow.