Tracing traffic through our stack

Wednesday, 12 May 2010

The @twitterapi has had two authentication mechanisms for quite a while now: HTTP Basic Authentication and OAuth. Basic authentication has gotten us so far! You can even use curl from the command line to interact with our API and simply pass a username and a password as a -u command line parameter when calling statuses/update to tweet, for example. However, as times have changed, so have our requirements around authentication, and developers will need to take action. Basic Auth support is going away on June 30, 2010. OAuth has always been part of Twitter’s blood, and soon, we’re going to be using it exclusively. OAuth has many benefits for end users (e.g. protection of their passwords and fine-grained control over applications), but what does it mean for Twitter on the engineering front? Quite a lot.

Our authentication stack for basic auth currently looks like this:

decode the Authorization header that comes in via the HTTP request;
check any rate limits that apply for the user or the IP address that request came from (a memcache hit);
see if the Authorization header is in memcache and, if it is, use it to find the user in cache and verify that the password is correct. If the header is not in cache, the user is not in cache, or the password does not match (for example, because the user has changed his or her password), then keep going;
pull the user out of storage;
verify the user hasn’t been locked out of the system; and
verify the user’s credentials.
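
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production code: plain dicts stand in for memcache and the user store, PBKDF2 stands in for whatever password hashing is really used, and every function and field name here (decode_basic_header, authenticate_basic, within_rate_limit, "locked", and so on) is hypothetical.

```python
import base64
import binascii
import hashlib
import hmac


def decode_basic_header(header):
    """Return (username, password) from a 'Basic ...' header value, or None."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None


def hash_password(password, salt):
    # Assumption: PBKDF2 is a stand-in for the real password hashing scheme.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1000)


def password_matches(user, password):
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(candidate, user["password_hash"])


def authenticate_basic(header, ip, cache, user_store, within_rate_limit):
    """Return the user record on success, or None if the request is rejected."""
    # 1. Decode the Authorization header.
    credentials = decode_basic_header(header)
    if credentials is None:
        return None
    username, password = credentials

    # 2. Check rate limits for the user or the IP address.
    if not within_rate_limit(username, ip):
        return None

    # 3. Fast path: look the header up in cache, then the user, then verify.
    cache_key = "auth:" + hashlib.sha256(header.encode("utf-8")).hexdigest()
    cached_name = cache.get(cache_key)
    cached_user = cache.get("user:" + cached_name) if cached_name else None
    if cached_user is not None and password_matches(cached_user, password):
        return cached_user

    # 4-6. Slow path: pull the user from storage, check lockout, verify.
    user = user_store.get(username)
    if user is None or user["locked"]:
        return None
    if not password_matches(user, password):
        return None

    # Populate the cache so the next request can take the fast path.
    cache[cache_key] = username
    cache["user:" + username] = user
    return user
```

Note that nothing in this flow identifies the calling application: the only inputs are a username, a password, and an IP address.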

Our stack then also logs a lot of information to scribe about that user and login to help us counter harmful activities (whether malicious or simply buggy). The one thing that we don’t have any visibility into when using basic authentication, however, is which application is doing all this.

To verify an OAuth-signed request, we go through a much more intensive process (both computationally and on our storage systems):

decode the Authorization header;
validate that the oauth_nonce and oauth_timestamp pair that was passed in is not already present in memcache. If it is, then this may be a replay attack, and we deny the user access;
using the oauth_consumer_key and the oauth_token from the header, look up both the Twitter application and the user’s access token object from cache, and fall back to the database if necessary. If, for some reason, neither can be retrieved, then something has gone wrong and we proactively deny access;
with the application and the access token, verify the oauth_signature. If it doesn’t match, then reject the request; and
check any rate limits that may apply for the user at this stage.
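
A rough Python sketch of that verification flow follows. It assumes the HMAC-SHA1 signature method from OAuth 1.0, uses an in-memory set and dicts in place of memcache and the database, and treats a request as unverifiable if either the application or the access token is missing. All names (verify_oauth_request, signature_for, apps, tokens, within_rate_limit) are hypothetical, and the sketch ignores details such as repeated parameter names and timestamp freshness.

```python
import base64
import hashlib
import hmac
import re
from urllib.parse import quote, unquote


def pct(value):
    # Percent-encode everything except RFC 3986 unreserved characters.
    return quote(str(value), safe="~")


def parse_oauth_header(header):
    """Return the key="value" pairs of an 'OAuth ...' header as a dict, or None."""
    scheme, _, rest = header.partition(" ")
    if scheme.lower() != "oauth":
        return None
    return {k: unquote(v) for k, v in re.findall(r'(\w+)="([^"]*)"', rest)}


def signature_for(method, url, params, consumer_secret, token_secret):
    """Compute an HMAC-SHA1 signature over the OAuth signature base string."""
    pairs = sorted((pct(k), pct(v)) for k, v in params.items()
                   if k not in ("oauth_signature", "realm"))
    normalized = "&".join(k + "=" + v for k, v in pairs)
    base_string = "&".join([method.upper(), pct(url), pct(normalized)])
    key = (pct(consumer_secret) + "&" + pct(token_secret)).encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_oauth_request(method, url, header, request_params,
                         seen_nonces, apps, tokens, within_rate_limit):
    """Return (application name, user) on success, or None if rejected."""
    # 1. Decode the Authorization header.
    oauth = parse_oauth_header(header)
    if oauth is None:
        return None
    required = ("oauth_consumer_key", "oauth_token", "oauth_nonce",
                "oauth_timestamp", "oauth_signature")
    if any(name not in oauth for name in required):
        return None

    # 2. A nonce/timestamp pair we have already seen may be a replay.
    replay_key = (oauth["oauth_nonce"], oauth["oauth_timestamp"])
    if replay_key in seen_nonces:
        return None

    # 3. Look up the application and the user's access token.
    app = apps.get(oauth["oauth_consumer_key"])
    token = tokens.get(oauth["oauth_token"])
    if app is None or token is None:
        return None

    # 4. Recompute the signature and compare in constant time.
    params = dict(request_params)
    params.update(oauth)
    expected = signature_for(method, url, params, app["secret"], token["secret"])
    supplied = oauth["oauth_signature"]
    if not hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8")):
        return None
    seen_nonces.add(replay_key)

    # 5. Check rate limits for the user.
    if not within_rate_limit(token["user"]):
        return None
    return app["name"], token["user"]
```

Unlike the basic auth flow, a successful result here carries both the application and the user, which is what lets each authenticated call be attributed to a specific application.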

Of course, for all the reject paths above, we log information; that’s invaluable data for us to turn over to our Trust & Safety team. If the user manages to authenticate, however, then we have a wealth of information too! We can, at this point, for every authenticated call, tie a user and an application to a specific action on our platform.

For us, and the entire Twitter ecosystem, it’s really important to be able to identify, and get visibility into, our users’ traffic. We want to be able to help developers if their software is malfunctioning, and we want to be able to make educated guesses as to whether traffic is malicious or not. And, if everything is functioning normally, then we can use this data to help us provision and plan for growth better and deliver better reliability. But, if all applications are simply using usernames and passwords as their identifiers, then we have no way to distinguish who is sending what traffic on behalf of which users.

Phase one of our plan is to remove basic authentication for calls that require authentication — those calls will migrate to a three-legged OAuth scheme. After that, we’ll start migrating all calls to at least begin to use a two-legged OAuth scheme. We also have OAuth 2 in the works. Start firing up dev.twitter.com and creating Twitter applications!

As always, the @twitterapi team is here to help out. Just make sure to join the Twitter Development Talk group to ask questions, follow @twitterapi for announcements, and skim through our docs on dev.twitter.com to help you through this transition.

—@raffi