As much as we’d love to blame yesterday’s outage on scaling issues, hardware, exponential growth, rogue bots, aberrant behavior, or even our brief stint on Japanese television last night, we can’t. Having achieved a strong position with regard to scaling Twitter, we felt comfortable enough to begin optimizing lots of smaller parts of our application for maximum efficiency. We were so focused on those smaller details that we lost track of the bigger picture and the site was unresponsive for lots of folks throughout the day.
What went wrong? We checked in code to provide more accurate pagination and to better distribute and optimize our messaging system. Basically, we just kept tweaking when we should have called it a day. Details are great, but getting too caught up in them is a mistake. I’ve been CEO of Twitter for two months now and this is an awesome lesson learned. We’re seeing the bigger picture and Twitter is back. Please contact us if something isn’t working right (with Twitter that is).