Twitter text library and t.co links wrapping update

Monday, 19 September 2011

We’re about to release a new version of the Twitter text processing library that we use for auto-linking and for extracting usernames, lists and hashtags. The new version will also extract URLs that have no protocol specified.

Concretely, it will add http:// to the beginning of a protocol-less URL if one of the following holds:

  • The host name ends with a gTLD (e.g. twitter.com)
  • The host name has 2 sub-domains followed by a ccTLD (e.g. yahoo.co.jp, google.co.uk)
  • The host name consists of 1 sub-domain and a ccTLD, and is followed by / (e.g. t.co/, bit.ly/)

Here is a highly simplified version of the regex, based on the one in twitter-text-java:

  (?: SUBDOMAIN+ DOMAIN ccTLD) | (?: SUBDOMAIN* DOMAIN gTLD) | (?: DOMAIN ccTLD (?=/) )
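
To make the three alternatives concrete, here is a rough Python sketch of the same idea. It is not the twitter-text implementation: the TLD lists are tiny illustrative subsets, and the names LABEL, HOST and add_protocol are made up for this example. The real library handles many more cases (trailing punctuation, international characters, full TLD lists).

```python
import re

# Illustrative subsets only; the real library uses far longer TLD lists.
GTLD = r"(?:com|org|net|edu|gov)"
CCTLD = r"(?:jp|uk|co|ly|de|fr)"

# One host name label followed by a dot, e.g. "yahoo." or "co."
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
SUBDOMAIN = rf"(?:{LABEL}\.)"
DOMAIN = rf"(?:{LABEL}\.)"

HOST = re.compile(
    rf"""
    (?<![a-z0-9@./-])                 # not inside a word, an email address or an existing URL
    (?:
        {SUBDOMAIN}+{DOMAIN}{CCTLD}   # yahoo.co.jp
      | {SUBDOMAIN}*{DOMAIN}{GTLD}    # twitter.com
      | {DOMAIN}{CCTLD}(?=/)          # t.co/ (only when a slash follows)
    )
    (?![a-z0-9-])                     # the TLD must end here
    (?:/\S*)?                         # optional path
    """,
    re.IGNORECASE | re.VERBOSE,
)


def add_protocol(text):
    """Prefix every protocol-less URL found in text with http://."""
    return HOST.sub(lambda m: "http://" + m.group(0), text)


print(add_protocol("see twitter.com and t.co/abc but not t.co"))
# prints: see http://twitter.com and http://t.co/abc but not t.co
```

Note how the bare t.co at the end is left alone: a single sub-domain plus ccTLD is only treated as a URL when a slash follows it.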

The new twitter-text version will be published on GitHub in a couple of days:

As previously indicated, all URLs, regardless of length, will be wrapped by t.co on October 10, 2011. On that date, we’ll also begin wrapping URLs without specified protocols. To help you prepare for this change, we’re considering adding this new linking strategy to the two opt-in developer features we introduced a month ago:

  • Per-tweet basis: Using the wrap_links=true parameter with the POST statuses/update and POST direct_messages/new methods.
  • Application basis: Visiting your application settings and configuring this option globally for your application.
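
As a rough illustration of the per-tweet option, the sketch below builds the form-encoded body for a POST statuses/update call with wrap_links=true. The endpoint URL and the helper name build_update_request are assumptions made for this example, and OAuth signing and the actual HTTP call are left out; check the API documentation for the exact details.

```python
from urllib.parse import urlencode

# Assumed endpoint for POST statuses/update; verify it against the API documentation.
UPDATE_URL = "https://api.twitter.com/1/statuses/update.json"


def build_update_request(status, wrap_links=True):
    """Return the (url, form-encoded body) pair for a status update.

    Hypothetical helper: a real request must also be signed with OAuth.
    """
    params = {"status": status}
    if wrap_links:
        # Opt in to t.co wrapping for this tweet only.
        params["wrap_links"] = "true"
    return UPDATE_URL, urlencode(params)


url, body = build_update_request("Reading t.co/ docs")
print(body)
# prints: status=Reading+t.co%2F+docs&wrap_links=true
```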

That way you could simulate what protocol-less URL linking and t.co URL wrapping will look like on October 10. Please send us your feedback on this idea through this discussion thread.

As always, if you have questions about t.co links wrapping or the twitter-text update, please let us know on our Developers Discussions board.