Twitter was an early adopter of the microservices approach to application design, and engineering investments into projects like Aurora, Finagle, and Finatra have let these microservices flourish while delivering the reliability and flexibility we depend on. However, bringing up a new service is always uncharted territory. Even after it's been designed and implemented, it's hard to predict how a new system, or a major change to an existing system, will behave at scale. Problems can arise with the application itself, or in its relationship to other applications and shared resources.
This is where Iago comes in—functional testing at scale—and we're excited to announce a major update to the open-source Iago repository. It's been a while since we've written about Iago, so I'll spend a minute on what it is and how it can help you.
Taste varies, but I prefer my technical blog posts with a little flavor. To that end, I wanted a suitable metaphor to describe the concept of functional testing at scale. After talking to a close friend, who is a passionate chef, we came to the conclusion that spinning up a new service and testing a new recipe for a restaurant menu are not terribly different. A number of challenges face any new dish: can the ingredients be reliably sourced? How much will it cost en masse? How much prep work is involved? What's the pick-up time, from ingredients to a diner's mouth?
Similarly, a new service may be well designed and thoroughly tested, with unit tests, integration tests, and performance testing or profiling, but running at scale is often very different. Network effects with other services and backends, hot-key or banding issues, and long-tail behavior are just a few examples. Iago provides Twitter with a flexible and configurable way to generate load at scale, even with complex behavior, and the tools to measure the results.
Iago is a framework for running distributed, functional load tests with minimal friction. Event sources and clients for your service are defined in a few lines of code, and custom transports can be dropped in if the existing clients (including HTTP, Kafka, Thrift, and any Finagle-based transport) aren't enough. Rich metrics and logging are available out of the box thanks to Finagle and TwitterServer, and clusters can scale from a single instance up to generating multiples of the entire Twitter front-end traffic.
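To give a rough sense of what "a few lines of code" looks like, here is a sketch of a record processor in the spirit of the examples from the Iago repository. The package paths, the `ParrotRequest` constructor arguments, and the `EchoLoadTest` name are illustrative assumptions, not a definitive rendering of the current API; consult the repository's own examples for the real signatures.

```scala
// Sketch only: type and package names follow the style of Iago's published
// examples, but the exact API may differ in the current release.
import com.twitter.parrot.processor.RecordProcessor      // assumed package path
import com.twitter.parrot.server.{ParrotRequest, ParrotService}

class EchoLoadTest(service: ParrotService[ParrotRequest, Array[Byte]])
    extends RecordProcessor {

  // Each line of the input event source becomes one request against the
  // service under test; Iago handles scheduling, rate, and distribution.
  def processLines(lines: Seq[String]): Unit = {
    for (line <- lines) {
      // Hypothetical constructor usage: wrap the raw log line as a request.
      service(new ParrotRequest(rawLine = line))
    }
  }
}
```

The key idea is that the processor only maps input records to requests; everything else (transport, rate control, metrics) comes from the framework.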
The first open source version can be traced back to 2012, and its philosophy and high-level design have remained mostly intact since then. Our site-wide load testing system still uses Iago, though in a fully automated fashion these days. Making that happen required some major updates under the hood over the years.
The Iago launcher, which deploys Iago to various runtime environments, was rewritten. The configuration for launching a load test is now fully specified as command-line flags, meaning the same test binary can be built once and deployed across many different configurations and scenarios. Iago now runs on TwitterServer with all its goodies, such as rich metrics, runtime tunables, and an extensible HTTP admin interface. Blocking code has been removed and the main event loop has been rewritten, making Iago more performant while also being easier to build and deploy.
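To make the flag-driven workflow concrete, a launch might look something like the fragment below. The flag names here are hypothetical placeholders chosen for illustration (only `-admin.port` is a standard TwitterServer flag); check the actual binary's `-help` output for the real set.

```shell
# Build the test binary once, then vary only the flags per scenario.
# Flag names below are illustrative, not Iago's actual flags.
./iago-loadtest \
  -admin.port=:9990 \
  -victim.hostPort=my-service.local:8080 \
  -requestRate=5000 \
  -duration=10.minutes
```

Because the test is just a binary plus flags, the same artifact can be promoted from a single-instance smoke test to a full-scale run by changing configuration alone.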
That sounds well and good, but the proof is in the pudding, as they say. How does Twitter actually use Iago?
As mentioned above, we run automated functional stress tests against our site on a regular basis. These generate many millions of requests per second against the various site surfaces (front-end, REST, and GraphQL APIs), and are reliable enough that there is no human element involved the vast majority of the time.
The World Cup is something Twitter looks forward to every four years—both as fans, and as a significant source of load on our site’s infrastructure. This year, we were able to enjoy every match with confidence knowing that we had thoroughly tested the entire site in advance with Iago. Automated Iago executions on a regular schedule meant we were able to iterate at a rapid pace through the debugging and troubleshooting process, making changes and immediately testing them at a site-wide scale. As a result, we rolled through World Cup 2018 with no notable incidents.
We also use Iago in many smaller instances, with compact test suites for individual services. Many of these are run as part of the standard CI/CD workflow. Functional load testing is especially useful for new services that don't yet have production traffic. In order to achieve robust compliance with GDPR, a new set of data protection and privacy laws for EU citizens, we needed to spin up many new services in a short amount of time. Iago was critical in validating that these new services would be able to perform their duties reliably, well before any production traffic was available.
For this post my friend volunteered the story of trying to bring a recipe she was very familiar with, a traditional Southern drop biscuit, to the brunch menu at a modern restaurant on the Upper East Side. Though simple to prepare and delicious at home, it needed some major tweaks to survive in a restaurant kitchen. Shortening is verboten in NYC restaurants, and so a seemingly simple butter substitution was necessary. Unfortunately, this significantly altered the shape and texture of these classic biscuits. In the end, a totally new cooking technique was needed to get them in the right shape while maintaining their soft, creamy insides and slightly crispy exterior. The final results were, in her words, "Perfect, fluffy little clouds of butter that sold out every weekend."
We see the same story play out with new services, or refactors of existing services, all the time at Twitter. Don't wait until you need your service to deliver production traffic to test it at scale. Give Iago a shot and let us know how we can improve it!
Iago has a long history and as such there is still plenty of room for improvement. Some planned changes include decoupling the discovery layer from Zookeeper, support for a wider variety of input event sources out of the box, and further performance profiling. In addition, we'd love to add more transport protocols depending on open-source users' interest. Please give us feedback in the GitHub issues section.
The Twitter testing team is also planning future work around tighter integration with some other open-source testing tools including Diffy. Keep an eye on that space if automated regression testing whets your appetite.
We’d like to thank all the contributors mentioned in the original engineering blog post, as well as the long-term maintainers over the years including James Waldrop, Tom Howland, and Kyle Laplante. The updates featured in this post were primarily authored by Adam Crane. Thanks to Ying Ni, Amol Patil, and Remy DeCausemaker for helping to draft this blog and move these changes into the open-source domain.