The Twitter Engineering Blog

Information from Twitter's engineering team about our technology, tools and events.

Results from Engineering for: April 2010

Memcached SPOF Mystery

At Twitter, we use memcached to speed up page loads and alleviate database load. We have many memcached hosts. To make our system robust, our memcached clients use consistent hashing and enable the auto_eject_hosts option. With this many hosts and this kind of configuration, one would assume that it won’t be noticeable if one memcached host goes down, right?
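To make the assumption in that question concrete, here is a minimal sketch of a consistent-hash ring with host ejection. It is not the client we run in production: the `HashRing` class, its `eject` method, the `replicas` parameter and the host names are all hypothetical, chosen only to illustrate why losing one host should, in theory, remap only the keys that lived on it.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not a real memcached client API."""

    def __init__(self, hosts, replicas=100):
        self.replicas = replicas  # virtual points per host
        self._ring = []           # sorted list of (point, host)
        for host in hosts:
            self.add(host)

    def add(self, host):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_point(f"{host}#{i}"), host))

    def eject(self, host):
        # Drop every virtual point owned by the failed host.
        self._ring = [(p, h) for p, h in self._ring if h != host]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("no hosts left on the ring")
        # First ring point at or after the key's position, wrapping around.
        i = bisect.bisect_right(self._ring, (_point(key), ""))
        return self._ring[i % len(self._ring)][1]


hosts = [f"cache{n}" for n in range(1, 6)]   # assumed host names
ring = HashRing(hosts)
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.eject("cache3")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, every key in `moved` was previously served by `cache3`; keys on the other hosts keep their placement, which is the behavior the configuration above is meant to provide.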


Hadoop at Twitter

My name is Kevin Weil and I’m a member of the analytics team at Twitter. We’re collectively responsible for Twitter’s data warehousing, for building out an analysis platform that lets us easily and efficiently run large calculations over the Twitter dataset, and ultimately for turning that data into something actionable that helps the business. We’re fortunate to work with great people from teams across the organization for the latter.


Introducing Gizzard, a framework for creating distributed datastores

Many modern web sites need fast access to an amount of information so large that it cannot be efficiently stored on a single computer. A good way to deal with this problem is to “shard” that information; that is, store it across multiple computers instead of on just one.
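As a rough illustration of the idea, the sketch below spreads keys across several in-memory dictionaries that stand in for separate computers. It is not Gizzard's API or its partitioning scheme: `ShardedStore`, the hash-modulo placement rule and the shard count are assumptions made purely for this example.

```python
import hashlib


class ShardedStore:
    """Hypothetical key-value store split across several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for the storage on one computer.
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Hash the key and pick a shard by modulo (an assumed, simple rule).
        digest = hashlib.sha1(key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


store = ShardedStore(shard_count=4)
for n in range(100):
    store.put(f"tweet:{n}", f"body {n}")
```

Every read and write touches exactly one shard, so no single machine has to hold the whole dataset. Modulo placement is the simplest possible rule; it reshuffles most keys whenever the shard count changes, which is one reason real systems use more flexible mappings.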


Timeboxing

When you build a service that talks to external components, you have to worry about the amount of time that a network call will take. The standard technique for protecting yourself is to use timeouts.

Most network libraries let you set timeouts to protect yourself, but for local computation there are few tools to help you.

Timeboxing is the name we’ve given to a technique for setting timeouts on local computation. The name is borrowed from the handy organizational technique.
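One way to picture a timeout on local computation is a cooperative time budget that the computation checks as it goes. The sketch below is an assumption about how that could look, not the implementation described in the full post: `Timebox`, `TimeboxExceeded` and `sum_of_squares` are hypothetical names, and the approach only works for code that calls `check()` often enough.

```python
import time


class TimeboxExceeded(Exception):
    """Raised when a computation runs past its time budget."""


class Timebox:
    """Hypothetical cooperative deadline for local computation."""

    def __init__(self, seconds, clock=time.monotonic):
        self._clock = clock                      # injectable for testing
        self._deadline = clock() + seconds

    def check(self):
        # Called from inside the computation; aborts once the budget is spent.
        if self._clock() > self._deadline:
            raise TimeboxExceeded("time budget exhausted")


def sum_of_squares(numbers, box):
    """Example workload that checks its timebox on every iteration."""
    total = 0
    for n in numbers:
        box.check()
        total += n * n
    return total


try:
    result = sum_of_squares(range(1000), Timebox(seconds=0.5))
except TimeboxExceeded:
    result = None  # fall back to a default, cached, or partial answer
```

The caller decides what to do when the budget runs out, just as it would with a network timeout. Because the check is cooperative, a loop that never calls `check()` cannot be interrupted this way.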
