Have you found yourself waiting in line at a supermarket and the guy in front decides to pay by check? One person can slow everyone down. We had a similar problem at Twitter.
Every server has a fixed number of workers (cashiers) that handle incoming requests. During peak hours, we may get more simultaneous requests than available workers. We respond by putting those requests in a queue.
This is unnoticeable to users when the queue is short and we handle requests quickly, but large systems have outliers. Every so often a request will take unusually long, and everyone waiting behind that request suffers. Worse, if an individual worker’s line gets too long, we have to drop requests. You may be presented with an adorable whale just because you landed in the wrong queue at the wrong time.
A solution is to stop being a supermarket and start being Fry’s. When you check out at Fry’s, you wait in one long line. At the front are thirty cash registers, each handling one person at a time. When a cashier finishes with a customer, they turn on a light above the register to signal they’re ready for the next one. It’s counterintuitive, but one long line can be more efficient than many short lines.
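The effect is easy to see in a toy simulation (illustrative numbers, not Twitter data — written in Ruby since that’s our stack). With a shared line, a slow request ties up one register while everyone else flows past; with fixed per-worker lines, everyone assigned behind the slow request is stuck:

```ruby
# Toy queueing sketch: 100 requests, 4 workers, an occasional slow
# request. Compare one shared line against fixed per-worker lines.
def shared_queue_avg_wait(jobs, workers)
  free_at = Array.new(workers, 0)   # when each worker next becomes free
  waits = jobs.map do |dur|
    i = free_at.index(free_at.min)  # the next free worker pulls the job
    wait = free_at[i]
    free_at[i] += dur
    wait
  end
  waits.sum.to_f / waits.size
end

def per_worker_avg_wait(jobs, workers)
  free_at = Array.new(workers, 0)
  waits = jobs.each_with_index.map do |dur, n|
    i = n % workers                 # assigned to a fixed line up front
    wait = free_at[i]
    free_at[i] += dur
    wait
  end
  waits.sum.to_f / waits.size
end

jobs = Array.new(100) { |n| n % 25 == 0 ? 20 : 1 }  # every 25th is slow
puts "one shared line:  #{shared_queue_avg_wait(jobs, 4)}"
puts "per-worker lines: #{per_worker_avg_wait(jobs, 4)}"
```

Running this, the shared line wins on average wait even though total work is identical — no request can get trapped behind a slow one it didn’t have to.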
For a long time, twitter.com ran on top of Apache and Mongrel, using mod_proxy_balancer. As in the supermarket, Apache would “push” requests to waiting mongrels for processing, using the ‘bybusyness’ method: the mongrel with the fewest queued requests received the next one. Unfortunately, when an incoming request was queued, the balancer process had no way of knowing how far along each mongrel was in its current work. When mongrels were equally loaded, Apache would assign requests to them at random, placing fast requests behind slow ones and increasing the latency of every request.
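For readers unfamiliar with that setup, it looked roughly like this (ports and member count are illustrative, not our production config):

```apache
# A pool of Mongrel backends, each with its own private request queue
<Proxy balancer://mongrels>
    BalancerMember http://127.0.0.1:8000
    BalancerMember http://127.0.0.1:8001
    BalancerMember http://127.0.0.1:8002
</Proxy>

# 'bybusyness' pushes each request to the member with the fewest
# pending requests; ties are broken effectively at random.
ProxyPass / balancer://mongrels/ lbmethod=bybusyness
```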
In November we started testing Unicorn, an exciting new Ruby server that takes the Mongrel core and adds Fry’s “pull” model. Mobile.twitter.com was our first app to run Unicorn behind Apache, and in January @netik ran the gauntlet to integrate Unicorn into the main Twitter.com infrastructure.
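A Unicorn setup is configured from a small Ruby file; a minimal sketch (paths and counts here are illustrative, not our production values) shows how the Fry’s model maps onto it — one shared listen socket that idle workers pull from:

```ruby
# unicorn.rb — minimal sketch
worker_processes 8                        # the "cash registers"
listen "/tmp/unicorn.sock", :backlog => 64  # the single shared line
timeout 30                                # kill workers stuck on a request
preload_app true                          # load the app once in the master

before_fork do |server, worker|
  # disconnect anything inherited from the master that shouldn't be
  # shared across processes, e.g. database connections
end
```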
During initial tests, we predicted CPU usage would hold steady and request latency would drop only 10-15%. Unicorn surprised us by cutting request latency 30% and significantly lowering CPU usage.
For automatic recovery and monitoring, we’d relied on monit for mongrel. Monit couldn’t introspect the memory and CPU within the Unicorn process tree, so we developed a new monitoring script, called Stormcloud, to kill Unicorns when they ran out of control. Monit would still monitor the master Unicorn process, but Stormcloud would watch over the Unicorns.
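Stormcloud itself is internal, but the idea is simple enough to sketch (hypothetical names and limits; the `ps` flags are Linux procps): list the master’s children, and kill any worker whose resident memory has run away — the master forks a fresh replacement immediately.

```ruby
# Toy Stormcloud-style watchdog sketch (not the real script).
# Parse `ps` output ("PID RSS" pairs) and pick out oversized workers.
def oversized(ps_output, max_rss_kb)
  ps_output.lines.map { |line|
    pid, rss = line.split.map(&:to_i)
    rss > max_rss_kb ? pid : nil
  }.compact
end

def reap_runaway_unicorns!(master_pid, max_rss_kb = 300_000)
  children = `ps -o pid=,rss= --ppid #{master_pid}`
  oversized(children, max_rss_kb).each do |pid|
    Process.kill(:KILL, pid)  # the master forks a fresh worker for each
  end
end
```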
With monit, child death during request processing (due to process resource limits or abnormal termination) would cause that request, and every request queued in that mongrel, to fail with 500 “robot” errors until monit restarted the mongrel. Unicorn’s pull model prevents these cascading errors, as the remaining children can still process the incoming workload.
Since each Unicorn child is only working on one request at a time, a single error is thrown, allowing us to isolate a failure to an individual request (or sequence of requests). Recovery is fast, as Unicorn immediately restarts children that have died, unlike monit which would wait until the next check cycle.
We also took advantage of Unicorn’s zero-downtime deploys by writing a deploy script that transitions Twitter onto a new version of our code base while still accepting requests, verifying that the new code is up and running before switching over to it. It’s a bit like changing the tires on your car while still driving, and it works beautifully.
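This leans on Unicorn’s signal handling: USR2 makes the master re-execute itself with the new code while the old workers keep serving, and QUIT gracefully retires a master. A simplified version of such a script (pid path and health-check URL are illustrative, not ours) might look like:

```ruby
require "net/http"

# Decide which master to retire: the old one if the new code checks
# out, the new one (i.e. roll back) if it doesn't.
def master_to_retire(old_master, new_master, healthy)
  healthy ? old_master : new_master
end

def deploy!(pid_file: "/tmp/unicorn.pid",
            health_url: "http://127.0.0.1:8080/health")
  old_master = File.read(pid_file).to_i
  Process.kill(:USR2, old_master)       # spawn a new master from new code
  sleep 5                               # give the new master time to boot
  new_master = File.read(pid_file).to_i # Unicorn rewrites the pid file

  healthy = begin
    Net::HTTP.get_response(URI(health_url)).is_a?(Net::HTTPSuccess)
  rescue StandardError
    false
  end

  Process.kill(:QUIT, master_to_retire(old_master, new_master, healthy))
end
```

Either way, requests never stop being served: one master or the other is always accepting connections.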
Stay tuned for the implications.