Rebuilding Twitter’s public API

Today we launched the new Twitter API v2. We first launched a public API in 2006 and, shortly after, began building API access to new features with the intention of opening our platform and inviting developers to build the future with us. Six years later, in 2012, we released the v1.1 API, which introduced new requirements and stricter policies needed to curb abuse and protect the Twitter platform.

Today’s launch marks the most significant rebuild of our public API since 2012. It’s built to deliver new features, faster, and to better serve the diverse set of developers who build on Twitter. It’s also built to incorporate many of our experiences and lessons learned over the past fourteen years of operating public APIs. We’d like to show you how we thought about designing and building this from the ground up.

Establishing goals

The public Twitter API v1.1 endpoints are currently implemented by a large set of HTTP microservices, a decision we made as part of our re-architecture away from a Ruby monolith. While the microservices approach increased development speed at first, it also resulted in a scattered and disjointed Twitter API, as independent teams designed and built endpoints for their specific use cases with little coordination. For the new Twitter API v2, we knew we needed an architecture that could scale more easily with the large number of API endpoints required to serve our planned and future functionality. As part of this design process, we drafted the following goals:

  • Abstraction: Enable Twitter engineers building the Twitter API to focus on querying, mutating, or subscribing to only the data they care about, without needing to worry about the infrastructure and operations of running a production HTTP service.
  • Ownership: Contain core and common API logic in a single place, owned by a single team.
  • Consistency: Provide a consistent experience for external developers by relying on our API design principles to reinforce uniformity.

With these goals in mind, we’ve built a common platform to host all of our new Twitter API endpoints. To operate this multi-tenant platform at scale, we had to minimize endpoint-specific business logic; otherwise, the system would quickly become unmaintainable. A powerful data access layer that emphasizes declarative queries over imperative code was crucial to this strategy.

Unified data access layer

Around this same time, representatives from teams building Twitter for web, iOS, and Android began migrating from individual internal REST endpoints to a unified GraphQL service. Our team followed suit as we realized that the data querying needs of the public Twitter API are similar to the needs of our Twitter mobile and desktop clients. Put another way, Twitter clients query for data and render UIs, while the public Twitter APIs query for data and render JSON responses. 

A bonus of consolidating our data querying through a single interface is that the Twitter API can now easily deliver new Twitter features by querying GraphQL data already used directly by our consumer apps. We considered exposing GraphQL directly to external developers, but opted for a REST API, a design familiar to a broad set of developers. This model also makes it easier to protect against unexpected query complexity so we can ensure a reliable service for all developers.
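
One way to picture the complexity argument: a REST request can only name fields from a fixed allow-list, so any GraphQL query generated on its behalf has a bounded shape. The sketch below is purely illustrative. The `fields` query parameter, the allow-list, the defaults, and the `parse_requested_fields` helper are all assumptions made for this example, not the platform's actual code.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list; the names mirror resource fields mentioned in this post.
ALLOWED_TWEET_FIELDS = ("id", "text", "author_id")
# Hypothetical defaults used when the caller requests nothing specific.
DEFAULT_TWEET_FIELDS = ("id", "text")


def parse_requested_fields(url: str, param: str = "fields") -> list[str]:
    """Return the validated, de-duplicated field list for a request URL.

    Unknown fields are rejected, so a caller can never grow the
    underlying query beyond what the allow-list permits.
    """
    query = parse_qs(urlsplit(url).query)
    requested: list[str] = []
    for value in query.get(param, []):
        for name in value.split(","):
            name = name.strip()
            if not name:
                continue  # tolerate stray commas
            if name not in ALLOWED_TWEET_FIELDS:
                raise ValueError(f"unknown field: {name}")
            if name not in requested:
                requested.append(name)  # keep first-seen order
    return requested or list(DEFAULT_TWEET_FIELDS)
```

Because the set of accepted names is closed, the cost of any single request can be reasoned about ahead of time, which is much harder when callers submit arbitrary nested queries.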

Componentizing the API platform

With the platform approach decided, we needed a way for different teams to build and contribute to the overall API. To facilitate this, we designed the following three components:

  1. Routes to represent the external HTTP endpoints, e.g. /2/tweets.
  2. Selections to represent the ways to find resources, e.g. "Tweet lookup by id". To implement a selection, create a GraphQL query that returns one or more resources.
  3. Resources to represent the core resources in our system, e.g. Tweets and users. To implement a resource, create a directory for every resource field; each directory contains a GraphQL query to fetch the data for that specific field, e.g. Tweet/text.

Using these three components to construct a directory structure, teams can independently own and contribute different parts of the overall Twitter API while still returning uniform representations in responses. For example, here's a subset of our selections and resources directories:

├── selections
│   └── tweet
│       ├── id
│       │   ├── Selection.scala
│       │   ├── selection.graphql
│       ├── multi_ids
│       │   ├── Selection.scala
│       │   ├── selection.graphql
│       ├── search
│       │   ├── Selection.scala
│       │   ├── selection.graphql
├── resources
│   ├── tweet
│   │   ├── id
│   │   │   ├── Field.scala
│   │   │   └── fragment.graphql
│   │   ├── author_id
│   │   │   ├── Field.scala
│   │   │   └── fragment.graphql
│   │   ├── text
│   │   │   ├── Field.scala
│   │   │   └── fragment.graphql
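
To make the relationship between the three components concrete, here is a minimal sketch of how a route could be resolved to files in a directory layout like the one above. The `Route` class and both helper functions are hypothetical names invented for this illustration; only the directory and file names come from the listing.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Route:
    """Hypothetical binding of an external endpoint to a selection and a resource."""
    path: str       # external HTTP path, e.g. "/2/tweets"
    resource: str   # resource directory name, e.g. "tweet"
    selection: str  # selection directory name, e.g. "multi_ids"


def selection_query_path(route: Route) -> PurePosixPath:
    """Locate the GraphQL query that finds the resources for this route."""
    return PurePosixPath("selections", route.resource, route.selection, "selection.graphql")


def field_fragment_paths(route: Route, fields: list[str]) -> list[PurePosixPath]:
    """Locate the per-field GraphQL files for the requested resource fields."""
    return [
        PurePosixPath("resources", route.resource, field, "fragment.graphql")
        for field in fields
    ]
```

For example, `selection_query_path(Route("/2/tweets", "tweet", "multi_ids"))` resolves to `selections/tweet/multi_ids/selection.graphql`. Under this assumption, a team can add a new field by adding one directory under the resource, without touching any route.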

GraphQL plays a key role in this architecture. We use GraphQL fragments as the unit of rendering reuse (similar to Relay in React applications). For example, the GraphQL queries below all use a "platform_tweet" fragment, which is created by combining all the customer-requested fields in the /resources/tweet directory:

https://api.twitter.com/2/tweets/20
Selection: /selections/tweet/id/selection.graphql

query TweetById($id: String!) {
   tweet_by_rest_id(rest_id: $id) {
       ...platform_tweet
   }
}

https://api.twitter.com/2/tweets?ids=20,21
Selection: /selections/tweet/multi_ids/selection.graphql

query TweetsByIds($ids: [String!]!) {
   tweets_by_rest_ids(rest_ids: $ids) {
       ...platform_tweet
   }
}

https://api.twitter.com/2/tweets/search/recent?query=%23DogsofTwitter
Selection: /selections/tweet/search/selection.graphql

query TweetsBySearch($query: String!, $start_time: String, $end_time: String, ...) {
   search_query(query: $query) {
       matched_tweets(from_date: $start_time, to_date: $end_time, ...) {
           tweets {
               ...platform_tweet
           }
           next_token
       }
   }
}
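
The fragment composition described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's implementation: the `FIELD_SELECTIONS` mapping stands in for the contents of the per-field fragment.graphql files, the GraphQL field names inside it and the `Tweet` type name are guesses, and both function names are hypothetical.

```python
# Hypothetical per-field selections, standing in for the contents of
# resources/tweet/<field>/fragment.graphql. The GraphQL field names are assumptions.
FIELD_SELECTIONS = {
    "id": "rest_id",
    "text": "text",
    "author_id": "author { rest_id }",
}


def build_platform_fragment(resource: str, type_name: str, fields: list[str]) -> str:
    """Combine the requested fields into a single named GraphQL fragment."""
    lines = [f"fragment platform_{resource} on {type_name} {{"]
    for field in fields:
        lines.append(f"    {FIELD_SELECTIONS[field]}")
    lines.append("}")
    return "\n".join(lines)


def build_document(selection_query: str, fragment: str) -> str:
    """Append the fragment definition to the selection query that spreads it."""
    return f"{selection_query.strip()}\n\n{fragment}\n"
```

Every selection spreads the same fragment, so the same requested fields produce the same Tweet representation whether the Tweet was found by id, by a list of ids, or by search.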

Putting it all together

At this point in the story, you may be curious where endpoint-specific business logic actually lives. We offer two options:

  1. When an endpoint’s business logic can be represented in StratoQL (the language used by Strato, Twitter’s data catalog system, which powers the GraphQL schema), we only need to write a function in StratoQL, with no separate service required.
  2. Otherwise, the business logic is contained in a Finatra Thrift microservice written in Scala, exposed by a Thrift Strato Column.

With the platform providing the common needs for all HTTP endpoints, new routes and resources can be released without spinning up any new HTTP services. We can ensure uniformity through the platform by standardizing how a Tweet is rendered or how a set of Tweets is paginated, regardless of the actual endpoint used for retrieval. Additionally, if an endpoint can be constructed from queries for data that already exists in the GraphQL schema, or if the owning team can implement its logic in StratoQL, then we can not only bypass almost all "service owning" responsibilities but also deliver faster access to new Twitter features!
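
As a rough illustration of what standardized rendering and pagination could look like, the sketch below renders any list of records through one shared code path. The response envelope (`data`, `meta`, `result_count`), the function names, and the sample record are assumptions made for this example; only the `next_token` name appears in the search query earlier in this post.

```python
from typing import Any, Optional


def render_resource(record: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Render one resource with only the requested fields, in request order."""
    return {name: record[name] for name in fields if name in record}


def render_page(
    records: list[dict[str, Any]],
    fields: list[str],
    next_token: Optional[str] = None,
) -> dict[str, Any]:
    """Wrap rendered resources in one envelope shared by every endpoint."""
    body: dict[str, Any] = {
        "data": [render_resource(record, fields) for record in records],
        "meta": {"result_count": len(records)},
    }
    if next_token is not None:
        # Only paginated results carry a token for fetching the next page.
        body["meta"]["next_token"] = next_token
    return body
```

Since lookup and search endpoints would both call `render_page`, a change to how a Tweet is rendered or how pages are linked happens in one place rather than once per endpoint.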

One aspect of the platform that has been top of mind since the beginning is the importance of serving the health of the public conversation and protecting the personal data of people using Twitter. The new platform takes a strong stance on where related business logic should live by pushing all security- and privacy-related logic to backend services. The result is that the API layer is agnostic to this logic, and privacy decisions are applied uniformly across all of the Twitter clients and the API. By isolating where these decisions are made, we can limit inconsistent data exposure so that what you see in the iOS app will be the same as what you get from programmatic querying through the API.

This is the start of our journey and our work is far from done. We have many more existing v1.1 endpoints to migrate and improve, and entirely new public endpoints to build. We know developers want the ability to interact with all of the different features in the Twitter app and we’re excited for you to see how we’ve leveraged this platform approach to do just that.

We can’t wait to bring more features to the new Twitter API! To see more about our plans, check out our Guide to the future of the new API.

 

</fin>

 

Jenny Qiu Hylbert

@jqiu

Senior Engineering Manager

Steve Cosenza

@scosenza

Senior Staff Engineer