Infrastructure

How we scaled Reads On the Twitter Users Database

By and
Thursday, 23 February 2023

Introduction

Twitter operates one of the world’s largest User Reservation Systems, referred to as URS. The User Reservation System was initially built using Gizzard, an old MySQL framework, which was quite popular and performant during the time it was built, with bespoke features like quorum reads, which were unique to URS. With time, progress in technology and increase in scale, it was a challenge to adhere to the strict SLOs on QPS (Queries Per Second), latency, success rates and cross data center consistency while supporting main features of the application. With the increased scale, reducing maintenance costs was yet another challenge. This motivated us to explore other ways to address the problem. MySQL being widely used at Twitter for other applications, seemed to be the obvious choice because of its simplicity and performance.

These MySQL servers at Twitter are mostly on-premise, commodity hardware, customized and optimized for our specific use-cases. While we could add several replicas for supporting redundancy, it was not a feasible solution to spin up several replica servers to scale the reads to millions of queries per second. So we used Vitess, which is an open source database solution for scaling MySql. While Vitess is mostly used for sharding and scaling writes, we also leveraged the Vtgate component to scale the reads.

What is Vtgate and how did we scale it

Vtgate is a Vitess component. It is a stateless proxy used to route traffic to the correct Vttablet and return consolidated results to the application. Each MySQL instance is paired with a Vttablet process, which provides features like connection pooling, query rewriting, and query deduplication. 

The Vtgates can run on the same machine as MySQL instances but we moved them out of the MySQL servers mainly for 2 reasons: 

  • It would use resources which could be freed up for MySql
  • We wanted to scale them up to hundreds to achieve the read scalability


Twitter uses Apache Mesos which provides a scalable platform for running containerized applications. Since the Vtgates are stateless proxies, we spinned up the Vtgates on Aurora Mesos and tuned the number of Mesos instances, resources like CPU, number of OS threads and GOGC values to achieve the high rate of millions of queries per second. The number of connections on each Vtgate were tested to scale upto a couple of thousand connections. This met all our stringent requirements for the read scalability of the database.

This post is unavailable
This post is unavailable.

Why we chose Vitess

We chose Vitess because it is open source! Also, it integrates well with MySql. It also comes with a topology service to store all configuration data. At Twitter we have highly available Zookeeper clusters which we used for the topology service. It can be easily integrated with Orchestrator (VTORC in later releases), the MySQL replication topology manager which removes a lot of cluster maintenance overhead and provides everything we need for a highly available MySQL cluster. Plus, if required, we can shard the clusters to scale the writes as well.

Security

Encryption-in-Transit is a strict requirement at Twitter. Vitess allows encryption between the application and Vtgate, and between all its components. To avoid downtime during enabling TLS, we pushed an optional TLS feature to Open source Vitess which was used while enabling TLS. We found that Vitess was using cert only for client verification, so we updated the open source Vitess to use chain and ensured Vitess used full chain for client verification.

Conclusion 

The URS cluster is a tier 1 application running in production for several months now with an extremely high availability for both writes and reads. While we use several vanilla MySQl clusters to serve many critical applications, we found Vitess to be good for scaling and will recommend this to the industry.

Acknowledgements 

We would like to thank Alex Lamaison, Daksh Anand, Dmitry Borodaenko, Evgenii Seliavka, Gary Edgar, Gurkan Oluc, Jojo Antonio, Mikhail Bezoyan, Sai Gopal and Sargurunathan Murugesan for their valued contributions and guidance.

This post is unavailable
This post is unavailable.