Murder: Fast datacenter code deploys using BitTorrent

Thursday, 15 July 2010

Twitter has thousands of servers. What makes having boatloads of servers particularly annoying is that we need to get new iterations of code and binaries onto all of them quickly and regularly. We used to have a git-based deploy system where we’d just instruct our front-ends to download the latest code from our main git machine and serve that. Unfortunately, once we got past a few hundred servers, things got ugly. We recognized that this problem was not unlike many of the scaling problems we’ve dealt with in the past: we were suffering the symptoms of a centralized system.

Slow deploys

By sitting beside a particularly vocal Release Engineer, I got first-hand experience of the frustration caused by slow deploys. We needed a way to dramatically speed things up. I thought of some quick hacks to fix this: maybe replicate the git repo, or maybe shard it so everyone isn’t hitting the same machine at once. Most of these quasi-centralized solutions would still require re-replicating or re-sharding in the near future though (especially at our rate of growth).

It was time for something completely different, something decentralized, something more like…BitTorrent…running inside of our datacenter to quickly copy files around. Using the file-sharing protocol, we launched a side project called Murder, and after a few days (and especially nights) of nervous full-site tinkering, it turned a 40-minute deploy process into one that lasted just 12 seconds!
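To get a feel for why decentralizing helps so much, here is a toy model in Python. It is not Murder's code and it is not how BitTorrent actually schedules pieces; it is a sketch under some loud assumptions: time is counted in "rounds", one round is the time one host needs to upload the whole file to one other host, and in the swarm case every host that already has the file uploads to exactly one new host per round. The function names and the `uploads_per_round` parameter are made up for illustration.

```python
import math


def centralized_rounds(num_servers: int, uploads_per_round: int = 1) -> int:
    """Rounds needed when only the single origin host can serve the file.

    uploads_per_round is a hypothetical knob for how many full copies
    the origin can push out in one round.
    """
    return math.ceil(num_servers / uploads_per_round)


def swarm_rounds(num_servers: int) -> int:
    """Rounds needed when every host that has the file serves one peer per round."""
    have = 1                 # hosts holding the file; starts with the origin only
    remaining = num_servers  # servers still waiting for the file
    rounds = 0
    while remaining > 0:
        newly_served = min(have, remaining)  # each holder uploads to one waiting host
        remaining -= newly_served
        have += newly_served
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, centralized_rounds(n), swarm_rounds(n))
```

In this model the centralized cost grows linearly with the number of servers, while the swarm cost grows roughly with its base-2 logarithm, because the number of hosts able to upload doubles every round. Real BitTorrent splits the file into pieces so peers can start sharing before they have the whole thing, so treat this as intuition only, not as an explanation of the exact numbers above.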

To the rescue

Murder (which, by the way, is the name for a flock of crows) is a combination of scripts written in Python and Ruby to easily deploy large binaries throughout your company’s datacenter(s). It takes advantage of the fact that a datacenter environment is quite different from regular internet connections: low-latency access to servers, high bandwidth, no NAT/firewall issues, no ISP traffic shaping, only trusted peers, etc. This let us come up with a list of optimizations on top of BitTornado to make BitTorrent not only reasonable, but also effective on our internal network.

Since we were using Capistrano at the time to signal our servers to perform tasks, Murder also includes a Capistrano deploy strategy to make it easy for existing Capistrano users to switch their file distribution to a decentralized one. The final component is the work Matt Freels (@mf) did in bundling everything into an easy-to-install Ruby gem. This made Murder usable for more deploy tasks at Twitter.

Where to get it

Murder, like many internal Twitter systems, is fully open sourced for your contributions and usage at: http://github.com/lg/murder. I recently gave a talk (see video below) at CUSEC 2010 in Montreal, Canada, which explains many of the internals. If you have questions about how to use it, feel free to contact me or Matt on Twitter.

We’re always looking for talented Systems and Infrastructure engineers to help grow and scale our website. Murder is one of the many projects that really highlights how thinking about decentralized and distributed systems can make huge improvements. If Murder or these kinds of engineering challenges interest you, please visit our jobs page and apply. We’ve got loads of similar projects waiting for staffing. Thanks!

Twitter - Murder Bittorrent Deploy System from Larry Gadea on Vimeo.