Changes to following caps and message ordering for Site and User Streams

By @kurrik

Our Site Streams feed was introduced as a limited beta in August 2010. It is a powerful service which allows an application to read a stream of Tweets and social events for a set of authenticating users. With the inclusion of an optional with=followings parameter, Tweets and social events from all the users the connected user is following are also streamed.

We’re working to bring the Site Streams API closer to its public release, and to get there we will be making two changes. In practice, neither change will affect more than a small portion of our connections. Going forward, handling both should be considered a best practice, and both are now covered in the streaming documentation.

First, the following graph size will be capped at 10,000 accounts for each connected user. This change will apply to both Site Streams and User Streams. If your application connects on behalf of a user who follows more than 10,000 accounts, the followings list for the connected user will be truncated and this message will be sent over the stream:

{
  "warning": {
    "code": "FOLLOWS_OVER_LIMIT",
    "message": "The requested user follows more accounts than the maximum supported by this streaming endpoint. Only a subset of 10000 followed accounts are included in this stream.",
    "user_id": <user_id>
  }
}

The connected user’s Tweets, @replies, and social events for favorites and retweets will always be streamed. However, the 10,000 accounts that are included will be a random subset of the accounts the connected user follows. Any with=followings connection will only stream content from users in the truncated list. The IDs delivered via the Control Streams friends/ids.json endpoint will also only include IDs from users in the truncated list. If your application requires a full list of followings, please use the REST API instead.
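As a rough illustration of how a consumer might notice the warning shown above, here is a minimal Python sketch. It assumes each stream message arrives as one JSON object per line and that the warning has exactly the shape printed in this post. The function name follows_over_limit_user is hypothetical and not part of any Twitter library.

```python
import json


def follows_over_limit_user(raw_line):
    """Return the user_id from a FOLLOWS_OVER_LIMIT warning, or None.

    Assumption: raw_line is a single line read from the stream and,
    when non-empty, holds one JSON object.
    """
    line = raw_line.strip()
    if not line:
        return None  # nothing to parse on an empty line
    try:
        message = json.loads(line)
    except ValueError:
        return None  # not valid JSON; leave it to other handling
    if not isinstance(message, dict):
        return None
    warning = message.get("warning")
    if isinstance(warning, dict) and warning.get("code") == "FOLLOWS_OVER_LIMIT":
        return warning.get("user_id")
    return None
```

An application could call this on every incoming line and, when it returns a user ID, mark that user's followings list as truncated and fetch the full list from the REST API if it needs one.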

The second change affects our Control Streams implementation: consumers of Site Streams should start processing the stream right away instead of waiting for a Control Stream message to be sent. This change is necessary because of the way the service scales, particularly across datacenters. In the past, the Control Stream message was the first message delivered when connecting to a Site Stream, and it indicated an ID which could be used to change the configuration of the stream. It’s no longer feasible to guarantee that it is the first message delivered in every case, so applications should not block on it and should instead pick it up whenever it appears in the stream.
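One way to structure a consumer so that it does not depend on message order is sketched below in Python. It dispatches every message as soon as it arrives and simply records the control information whenever it shows up. The "control" / "control_uri" field names, the one-JSON-object-per-line framing, and the process_stream and handle_message names are assumptions for illustration, so check the streaming documentation for the exact message shape.

```python
import json


def process_stream(lines, handle_message):
    """Handle messages immediately; remember the control URI when seen.

    lines: any iterable of raw text lines from the stream.
    handle_message: callback invoked with each non-control message.
    Returns the last control URI observed, or None if none arrived.
    """
    control_uri = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip empty lines
        message = json.loads(raw)
        control = message.get("control") if isinstance(message, dict) else None
        if isinstance(control, dict):
            # Assumed shape: {"control": {"control_uri": "..."}}
            control_uri = control.get("control_uri")
            continue
        # Everything else is processed right away, whether or not
        # the control message has been seen yet.
        handle_message(message)
    return control_uri
```

The point of this layout is that nothing is buffered or dropped while waiting: a Tweet that arrives before the control message is handled exactly like one that arrives after it.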

We will enable the following cap on March 22, 2013, one week from today. Out-of-order Control Stream messages will be rare, but they may already occur today. As always, please use this discussion thread if you have any questions or comments about the change.

~Arne