Infrastructure

Building Twitter’s ad platform architecture for the future

By Revenue Platform

A great advantage of software systems is that they are extremely adaptable. There comes a point in the evolution of complex software systems, however, where that malleability hinders growth instead of facilitating it. Eventually, the software no longer serves its purpose: helping people.

This was the case for the Twitter AdServer in early 2019. After 10 years of iterative development, the system was too inefficient to evolve further with the organization. When it started, we were an extremely small team of engineers, serving a single ad format (Promoted Tweets) and generating around $28M of revenue. Today, Twitter’s Revenue organization consists of 10x more engineers and ~$3B of revenue, supporting multiple ad formats: Brand, Video, and Cards.

Low velocity in launching new products, tightly coupled cross-team dependencies, and high overhead costs added to the organization’s growing complexity. This demanded a fundamental change in order for us to scale any further.


How did we serve ads?

The AdServer funnel consisted primarily of Admixer and Adshard. Admixer served as the interface between the upstream clients and the AdServer pipeline. When Admixer received an ad request, it hydrated the request with additional user information before fanning it out to the Adshards. Adshard operated under a sharded architecture, with each Adshard responsible for a subset of the ads. Each Adshard ran the request through the 3 main stages of the AdServer funnel:

  • Candidate Selection: Selected a set of active line items (i.e. a group of ads with the same targeting criteria; refer to the line item definition for more details) for the user. In this phase, we (1) applied all standard campaign targeting (e.g. geo-location, age, interests) to arrive at the ads eligible to serve to the user; (2) eliminated non-relevant ads that users may not like; and (3) ensured we served only active ads (e.g. only campaigns that were not paused).
  • Product Logic: Expanded each line item from the candidate selection phase into a set of creatives based on business rules, and enriched these creatives with additional product features. (The creative is the actual ad that gets shown to the user, e.g. a Promoted Tweet; refer to the creative definition for more details.)
  • Candidate Ranking: Ranked the ads that made it through the above stages. Every ad is assigned a score indicating the likelihood of the user engaging with it, and the ads are ranked by that score. We use real-time-trained machine learning models and advertiser data to compute the scores, which are then used in our auction pipelines.
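As a rough sketch, the three stages above compose into a simple pipeline. All names and data shapes below are hypothetical illustrations, not Twitter's actual code:

```python
# Illustrative sketch of the three-stage AdServer funnel described above.
# All names and data shapes are hypothetical, not Twitter's actual code.

def candidate_selection(user, line_items):
    # Keep only active line items whose targeting matches the user.
    return [li for li in line_items
            if li["active"] and li["geo"] == user["geo"]]

def product_logic(selected_line_items):
    # Expand each line item into its creatives (the ads actually shown).
    return [creative for li in selected_line_items
            for creative in li["creatives"]]

def candidate_ranking(creatives):
    # Rank creatives by a (stubbed) predicted-engagement score.
    return sorted(creatives, key=lambda c: c["score"], reverse=True)

def serve(user, line_items):
    # The full funnel: selection -> product logic -> ranking.
    return candidate_ranking(product_logic(candidate_selection(user, line_items)))
```

In the monolith, all three functions ran inside each Adshard; the decomposition described later moves each stage behind its own service boundary.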

Throughout this funnel, different components also attach metadata associated with an ad request and ad candidates, which is then written to our underlying key-value stores in AdMixer. This data is later consumed in our feedback loop by the analytics pipeline for billing, fraud detection and other analysis.

Historically, we optimized the system for minimal latency and operational overhead by reducing network hops wherever possible. This led to a single service, Adshard, doing most of the heavy lifting, resulting in a monolithic model.

The Entropy

The monolithic platform worked well when Twitter had only two ad products: Promoted Tweets and Promoted Accounts. However, as we scaled our business, the monolith model posed more challenges than solutions.

Adding a new ad product

[Figure: adding a new ad product (Promoted Trend) in the old AdServer]

In the old AdServer, due to the challenges and complexity of the legacy code, reusing the existing patterns and practices became the norm. The figure above is an example of adding a new ad product, e.g. Promoted Trend, in the old AdServer. Promoted Trend ads have the following characteristics:

1. Should always be selected by Geo == Country.

2. Should not require auction and can therefore skip the ranking stage.

Adding a new ad product often involved patchy work. Due to the nature of the existing framework and other legacy constraints, skipping the ranking stage was not a viable option, which led to a hacky workaround: adding product-based conditional logic, "if (product_type == 'PROMOTED_TREND') {...} else {...}", to the code in the ranking pipeline. Such product-based logic also existed in the selection pipeline, tightly coupling these stages and adding to the complexity of the growing spaghetti code.
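Conceptually, the branching described above looked like the following. This is a hypothetical illustration of the anti-pattern, not the actual code:

```python
# Hypothetical illustration of product-specific branching leaking into a
# shared pipeline stage -- the anti-pattern described above.

def rank_candidates(product_type, candidates):
    if product_type == "PROMOTED_TREND":
        # Promoted Trends have no auction, so instead of being skipped
        # cleanly, the shared ranking stage grows a special case.
        return candidates
    return sorted(candidates, key=lambda c: c["score"], reverse=True)
```

Every such special case couples one product to a platform stage, and the conditionals accumulate as new products are added.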

Development velocity

The following challenges were shared across all the teams working on the monstrous legacy code.

  • Large, bloated data structures: The request and response objects grew quickly with the evolving business logic. Since the request/response objects were shared across the 3 stages, immutability guarantees were hard to provide. Adding a new feature in the candidate ranking stage required understanding where and how the needed fields were set up both upstream (the selection and creative stages) and downstream (Admixer). Making changes required understanding almost the entire serving pipeline, a daunting process, especially for new engineers.
  • Data Access Challenges: Historically, mostly for latency and resource optimizations, Admixer has been the service responsible for fetching user-related data. (Fetching the same user data in Adshard would require 25x the RPCs due to the sharded architecture.) Therefore, to use a new attribute in Adshard, we needed to add a corresponding user data fetcher in Admixer and send the data across to Adshard. This process was time-consuming and, depending on the type of user attribute, could impact the performance of the AdServer. It also made decoupling platforms from products very challenging.
  • Technical Debt: The complex legacy code added to our technical debt. Deprecating old fields and cleaning up unused code became increasingly risky. Oftentimes this led to unintended changes in functionality, introducing bugs and dragging down overall productivity.

Solution: How we came up with these services

These chronic engineering problems and the loss of developer productivity created a need for a change of paradigm in the design of our systems. We were lacking clear separation of concerns in our architecture, and had high coupling between different product areas. 

These problems are fairly common in the software industry, and breaking up monoliths into microservices is a popular approach to solve them. However, it comes with inherent trade-offs, and if designed hastily, can instead lead to reduced productivity. As an example, let’s conduct a thought exercise about one possible approach we could have taken with the decomposition of our service.

Thought exercise for Decomposition: An AdServer per Product

Since each product team was bottlenecked on a monolithic AdServer, and different products can require varying architectural requirements, we could choose to break up the single AdServer into N different AdServers, one for each product or group of similar products.

[Figure: splitting the monolith into separate Video, Takeover, and Performance AdServers]

In the architecture above, we have three different AdServers: one for Video Ad Products, one for Takeover Ad Products, and one for Performance Ad Products. These are owned by the respective product engineering teams, and each can have its own codebase and deployment pipeline. This seems to provide autonomy and to help with both separation of concerns and decoupling between product areas; in reality, however, such a split would likely make things worse.

Each product engineering team now has to staff up to maintain an entire AdServer. Each team has to maintain and run its own candidate generation and candidate ranking pipelines, even though they rarely modify them (these are often modified by machine learning domain experts). For those domain areas, things become even worse: shipping a new feature to be used in ad prediction would now require modifying the code in three different services instead of one! Finally, it’s hard to ensure that analytics data and logs from all AdServers can be joined with each other so that downstream systems continue to work (analytics is a cross-cutting concern across products).

Learnings

We learned that decomposition alone is not enough. The per-product AdServers architecture above lacks both cohesion (each AdServer still does too many things) and re-usability (ad candidate ranking, for example, runs in all three services). This leads to an epiphany: while we need to provide autonomy to product engineering teams, we must support them with horizontal platform components that are re-usable across products! Having plug-and-play services for cross-cutting concerns can create a multiplier effect for engineering teams.

What we built: Horizontal Platform Components

As a result, we identified “general purpose ad tech functions” that could be used by most ad products directly. These were:

  • Candidate Selection: Given user attributes, identify the ad candidates interested in bidding for the user request.
  • Candidate Ranking: Given user attributes and ad candidates, score the ad candidates based on their relevance to the user.
  • Callback and Analytics: Enforce contracts for standardized collection of analytics data from all services responsible for ad serving.
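A minimal sketch of the contracts these components expose might look like the following. The interface names and signatures are assumptions for illustration; the real service definitions are not public:

```python
# Hypothetical contracts for the three horizontal platform components.
# Names and signatures are illustrative assumptions, not the real APIs.
from typing import Protocol, runtime_checkable

@runtime_checkable
class CandidateSelection(Protocol):
    def select(self, user_attributes: dict) -> list:
        """Return the ad candidates interested in bidding for this request."""

@runtime_checkable
class CandidateRanking(Protocol):
    def rank(self, user_attributes: dict, candidates: list) -> list:
        """Score and order candidates by their relevance to the user."""

@runtime_checkable
class Analytics(Protocol):
    def record(self, request_metadata: dict) -> None:
        """Collect standardized analytics data for downstream systems."""
```

Any product AdServer that satisfies these contracts can plug into the shared services without the platform teams knowing about the product's internals.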

We built services around each of these functions and reorganized ourselves into platform teams that each own one of them. The product AdServers from the previous architecture have now become much leaner components that rely on the horizontal platform services, with product-specific logic built on top of them.

[Figure: lean product AdServers built on top of the horizontal platform services]

Reaped Benefits


Adding a new product after decomposition 

Let's re-examine the Promoted Trend problem presented above and how the new architecture tackles it. By building separate services for ad candidate selection and ad candidate ranking, we achieve better separation of concerns, breaking the pattern of forcing every ad product through the 3-stage paradigm of the AdServer pipeline. Spotlight ads now have the flexibility to integrate with only the selection service, allowing these ads to skip the ranking stage. This lets us get rid of hacky ways of bypassing ranking for Promoted Trend ads and achieve a cleaner, more robust codebase.

As we continue to grow our ad business, adding a new product will be as easy as plugging in these horizontal platform services as needed.
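As a sketch of this plug-and-play idea (all names here are hypothetical), a product server that needs no auction simply never wires in the ranking service:

```python
# Hypothetical sketch: two lean product servers composing the horizontal
# platform services differently. Names are illustrative only.

def performance_ad_server(user, selection, ranking):
    # Performance ads go through the full selection + ranking path.
    candidates = selection.select(user)
    return ranking.rank(user, candidates)

def trend_ad_server(user, selection):
    # Trend/spotlight ads are selected by country and need no auction,
    # so the ranking service is simply never called.
    return selection.select(user)
```

The product server owns only its composition logic; selection and ranking remain the platform teams' responsibility.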

Increased velocity

With well-defined APIs, we could establish a separation of responsibilities across teams. Modifying the candidate ranking pipeline does not require understanding the selection or creative stages. This is a win-win situation: each team only has to understand and maintain its own code, allowing everyone to move faster. It also makes troubleshooting easier, because issues can be isolated within a service and tested independently.

Risks and tradeoffs

Such a paradigm shift in the way we serve ads at Twitter is bound to come with inherent risks and trade-offs. We list some of them here as a word of caution for readers: such trade-offs must be identified and acknowledged before deciding on a large refactor of existing systems.

  • Increased hardware cost: Splitting a single service into many certainly meant increased compute cost to run those systems. To keep the increase within an acceptable limit, we set ourselves a concrete goal: the cost of operating the ad serving systems must stay within 5% of the revenue they generate. This helped us prioritize efficiency where needed and made design decisions easier. The new architecture is approximately twice as expensive as the previous one in terms of compute resources, but this is within our acceptable limit.
  • Increased operational cost for product dev teams: Having several new services means engineering cost to maintain and operate them, and some of this new burden fell on product development teams (as opposed to falling mostly on the platform teams, as before). Product development teams therefore need to grow appropriately to support their newly owned systems in addition to developing new features faster.
  • Temporary slowdown in development of new features: This effort required over 40 engineers, along with engineering and product managers, for 1.5 years. During this time, we estimate that it slowed down new feature development by about 15% (mostly in ad serving). This was a trade-off that the org leadership was willing to make in order to fund the project.
  • Increased complexity in ranking and auction dynamics: This was a technical consideration for the new architecture. With each product bidder serving its own request, we partially lost the ability to make globally optimal decisions for ad ranking and auction at lower granularity. We compensated for this to some extent by moving this logic into centralized platform services at higher granularity.

We evaluated these risks and decided that the benefits of the new architecture outweighed their impact. An overall increase in development velocity and more predictable delivery of feature improvements are vital for the ambitious business goals we have set for ourselves. The new architecture delivers that: a modular system that allows faster experimentation and reduces coupling.

We have already started seeing the benefits of this decision:

  • No single team is the bottleneck for most planned projects, and more than 90% of the projects planned in ads serving can be executed.
  • Experimentation is much faster - online ranking experiments in the ads serving space are now being run 50% quicker.

Migrations at Twitter scale

A daily deploy cadence, with multiple teams pushing new code every day, combined with a large scale on the order of hundreds of thousands of QPS, made decomposing the AdServer challenging.

To start the migration, we adopted an in-memory, API-first approach that allowed logical separation of the code. It also enabled us to run initial system performance analysis to ensure the CPU and memory footprint deltas between the old and new systems were on par. This laid the foundation for the horizontal platform services, which were essentially services created by refactoring code and re-arranging the packaging structure of the in-memory version.

To ensure feature parity between the old and new services, we developed a custom correctness evaluation framework. It replayed requests to both the legacy and new AdServer and compared metrics between the two systems within an acceptable threshold. This was used in offline testing and gave us visibility into the performance of the new system, helping us catch issues early and preventing bugs from making it into production.
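In spirit, such a replay check works like the following simplified sketch. The real framework compared many more metrics and handled sampling, timeouts, and so on; everything here is an illustrative assumption:

```python
# Simplified sketch of replay-based correctness checking: send each
# captured request to both systems and flag metric drift beyond a
# relative threshold. All names are illustrative assumptions.

def within_threshold(old_metrics, new_metrics, threshold=0.05):
    # Compare each metric; allow at most `threshold` relative drift.
    for name, old_val in old_metrics.items():
        new_val = new_metrics.get(name, 0.0)
        if old_val == 0:
            if new_val != 0:
                return False
        elif abs(new_val - old_val) / abs(old_val) > threshold:
            return False
    return True

def replay(requests, legacy_server, new_server, threshold=0.05):
    # Return the requests whose metrics diverged between the two systems.
    return [req for req in requests
            if not within_threshold(legacy_server(req),
                                    new_server(req), threshold)]
```

Diverging requests can then be inspected individually, turning a vague "are the systems equivalent?" question into a concrete list of reproducible cases.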

After the code was shipped to production, we used an experiment framework that provides insight into the overall revenue metrics in production. Many prediction- and auction-related metrics require a longer feedback loop to remove noise and assess the true impact of a change. Thus, for a true end-to-end verification of the migration, we relied on this framework to guarantee on-par revenue metrics.

Conclusion

The Ad Server decomposition improved the state of our systems and reinforced the foundation of Twitter's ad business. It lets us focus our time and resources on solving real engineering problems instead of battling the woes of legacy infrastructure. With the evolution of the ad business and technology, more challenges are to come but we are positive and excited to build solutions that make our systems efficient. 

If you're interested in solving such challenges, consider joining the flock.

Acknowledgements

This project took commitment, cross-functional alignment and work from many teams. We would like to thank project leads who contributed to this blog: Girish Viswanathan, James Gao, Kavita Kanetkar, Rachna Chettri, Siddharth Rao, Tanooj Parekh and Yudian Zheng.

And others who worked on this project: Andrew Taeoalii, Andrew Wilcox, Annie Lin, Alicia Vartanian, Arun Viswanathan, Babatunde Fashola, Bhavdeep Sethi, Brian Kahrs, Corbin Betheldo, Catia Goncalves, Daniel Erenrich, Daniel Kang, Eddie Xie, Eitan Adler, Eric Chen, Hani Dawoud, Heather Chen, Ilho Ye, Irina Sch, Jean-Pascal Billaud, Julio Ng, Jiyuan Qian, Joe Xie, Juan Serrano, Kai Chen, Kai Zhu, Kevin Xing, Keji Ren, Mohammad Saiyad, Paul Burstein, Prabhanjan Belagodu, Praneeth Yenugutala, Ranjan Banerjee, Rembert Browne, Radhika Kasthuri, Ratheesh Vijayan, Rui Zhang, Sandy Strong, Siva Gurusamy, Smita Wadhwa, Siyao Zhu, Su Chang, Scott Chen, Srinivas Vadrevu, Tianshu Cheng, Troy Howard, Tushar Singh, Udit Chitalia, Vishal Ganwani, Vipul Marlecha, Wei Su, Wolf Arnold, Yanning Chen, Yong Wang, Yuemin Li, and Yuanjun Yang.  

We would also like to thank the Revenue SRE team, Revenue Product and Engineering leadership team, Revenue Strategy and Operations team, and Revenue QE team for their constant support.
