Infrastructure

Surviving the Brazilian Reality TV Surge that Pushed Twitter to the Edge

There are many reasons for Twitter’s significant traffic increase in 2020. The most notable is the conversation around COVID-19. One of the biggest surprises?

Big Brother Brasil


The Brazilian reality TV show is so popular that it recently set a Guinness World Record for the most online votes for a television program. On Twitter, Big Brother Brasil (BBB) is a phenomenon. In 2019, BBB Season 19 generated more than 23.2 million Tweets globally. This year, Season 20 generated 271.4 million Tweets, more than a 10x increase in just one year.

At one point in the season, however, South American traffic peaked at over 3x the available capacity for the region, nearly taking Twitter down across the entire continent.

With broadcasts seven days a week, including major spikes on “voting days” (Sundays and Tuesdays), BBB put a severe strain on Twitter’s infrastructure. To survive, we had to tap into a combination of traffic steering, network ingenuity, and the public cloud.

A key part of how we support our global traffic is Twitter’s edge network. In 2014, we added a point of presence (POP) in South America, capable of supporting regional traffic. In 2020, as we prepared for a site refresh that would increase our capacity, the plans were put on hold due to COVID-19. And then came the Big Brother Brasil series.

To support this 3x surge in the region, with no user downtime or experience degradation, we transparently distributed traffic to other locations on our network and to the public cloud. We utilized North American POPs to handle some of the additional load, at the cost of introducing latency.
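The fill-local-first, spill-the-excess approach described above can be sketched as a small capacity-planning function. This is a hypothetical illustration, not Twitter's actual steering logic; the POP names and capacity figures are invented for the example.

```python
# Hypothetical sketch of capacity-aware traffic spillover: fill the local
# POP first, then route the excess to remote POPs (accepting added latency).
# All names and numbers here are illustrative only.

def plan_spillover(regional_rps: int, local_capacity_rps: int,
                   remote_pops: dict[str, int]) -> dict[str, int]:
    """Return the RPS to send to each POP, filling local capacity first."""
    plan = {"local": min(regional_rps, local_capacity_rps)}
    excess = regional_rps - plan["local"]
    for pop, capacity in remote_pops.items():
        if excess <= 0:
            break
        plan[pop] = min(excess, capacity)
        excess -= plan[pop]
    if excess > 0:
        # Traffic that no POP can absorb; would need further offload,
        # e.g., to a public-cloud edge.
        plan["unserved"] = excess
    return plan

# Example: 300k RPS against a 100k-RPS local POP and two remote POPs.
print(plan_spillover(300_000, 100_000,
                     {"na-east": 120_000, "na-west": 120_000}))
```

In this toy run the local POP absorbs 100k RPS, the two remote POPs take 120k and 80k, and nothing is left unserved; at the real BBB peak, the remaining excess is what drove the move to public-cloud edges described below.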


This graph shows, in requests per second (RPS), the approximate South American traffic during the biggest BBB peak, on April 25th, 2020. Actual South American user traffic (the purple line) exceeds both the local South American capacity (the lower horizontal line) and the augmented capacity achieved by routing some traffic into North America (the upper horizontal line).

The combined traffic peaked at more than 3x what our regional POP supported. We needed more capacity, which is why we moved a large amount of regional traffic to Google (Cloud Load Balancing) and Amazon (CloudFront). These public cloud edges proxy the traffic back to our data centers. We used data generated by our Control Tower infrastructure to determine how best to map users and request types to these edges.
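One way to picture mapping request types to edges is a precomputed routing table consulted per request. This is a hedged sketch only: the request classes, edge names, and fallback behavior below are invented for illustration, and the real mapping would be derived from capacity and latency data such as the Control Tower output mentioned above.

```python
# Hypothetical sketch of request-type steering: map each request class to an
# ordered preference list of edges (regional POP, remote POP, or public-cloud
# proxy), then pick the first healthy one. All names are illustrative.

ROUTING_TABLE = {
    "vote": ["gcp-clb", "aws-cloudfront"],  # bursty; offload to cloud edges
    "timeline": ["sa-pop", "na-pop"],       # latency-sensitive; prefer local
    "media-upload": ["na-pop"],             # bandwidth-heavy; remote POP
}

def pick_edge(request_class: str, healthy_edges: set[str]) -> str:
    """Return the first healthy preferred edge, else the regional POP."""
    for edge in ROUTING_TABLE.get(request_class, []):
        if edge in healthy_edges:
            return edge
    return "sa-pop"  # default: fall back to the regional POP

# Example: with the GCP edge unavailable, votes fall through to CloudFront.
print(pick_edge("vote", {"aws-cloudfront", "sa-pop", "na-pop"}))
```

Keeping the table precomputed means the per-request decision is a cheap lookup, with the expensive capacity analysis done offline.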

At peak BBB voting time, we served more traffic from Google and Amazon than our entire South American infrastructure could handle on its own.

Even without BBB, we see similar traffic patterns worldwide, particularly in EMEA. Amazon CloudFront and Google Cloud Load Balancing have been essential for keeping Twitter available worldwide; steering clients to connect first to these third-party providers before reaching Twitter’s own infrastructure is key. These partnerships have brought us elastic capacity during surges and allow us to run Twitter more reliably at scale.

Thank you to all the Twitter engineers who have helped to make this possible, and to our partners at Amazon and Google.

Daniel Schonberg (@dschonbe), Senior Manager, Software Engineering

Todd Segal (@Todd_Segal), Senior Staff Software Engineer