We’re about to release a new version of the Twitter text processing library that we use for auto-linking and for extracting usernames, lists, and hashtags. The new version will also extract URLs that have no specified protocol.
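Concretely, it will add http:// to the beginning of a no-protocol URL if the URL has a gTLD (with or without subdomains), if it has a ccTLD and at least one subdomain, or if it has a ccTLD and no subdomain but is followed by a slash / (e.g. t.co/, bit.ly/).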
Here is a heavily simplified version of the regex, based on the one in twitter-text-java:
(?: SUBDOMAIN+ DOMAIN ccTLD) | (?: SUBDOMAIN* DOMAIN gTLD) | (?: DOMAIN ccTLD (?=/) )
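To make the three alternatives above concrete, here is a rough Python sketch of the same idea. It is not the twitter-text implementation: the TLD lists are tiny illustrative samples, the label and path patterns are crude, and the names (NO_PROTOCOL_URL, extract_urls_without_protocol) are hypothetical.

```python
import re

# Assumption: tiny sample TLD lists for illustration only; a real
# implementation would need the complete gTLD and ccTLD lists.
GTLD = r"(?:com|net|org|edu|gov)"
CCTLD = r"(?:co|ly|jp|uk|de|fr)"

# One DNS label followed by a dot, e.g. "www." or "example."
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
SUBDOMAIN = rf"(?:{LABEL}\.)"
DOMAIN = rf"(?:{LABEL}\.)"

NO_PROTOCOL_URL = re.compile(
    rf"""
    (?<![a-z0-9@./-])                 # do not start mid-word, after "@", or after "//"
    (?:
        {SUBDOMAIN}+ {DOMAIN} {CCTLD}   # ccTLD with at least one subdomain
      | {SUBDOMAIN}* {DOMAIN} {GTLD}    # gTLD, subdomains optional
      | {DOMAIN} {CCTLD} (?=/)          # bare ccTLD domain, only if a slash follows
    )
    (?![a-z0-9-])                     # the TLD must end here
    (?:/\S*)?                         # crude path: everything up to whitespace
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_urls_without_protocol(text):
    """Return the protocol-less URLs found in text, each prefixed with http://."""
    return ["http://" + m.group(0) for m in NO_PROTOCOL_URL.finditer(text)]


if __name__ == "__main__":
    sample = "see t.co/abc and www.example.co.jp or example.com but not example.co"
    print(extract_urls_without_protocol(sample))
```

Note how the third alternative is what keeps short domains such as t.co/ and bit.ly/ linkable, while a bare ccTLD domain without a trailing slash is left alone.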
The new twitter-text version will be published on GitHub in a couple of days.
As previously announced, all URLs, regardless of length, will be wrapped by t.co on October 10, 2011. On that date, we’ll also begin wrapping URLs without specified protocols. To help you prepare for this change, we’re considering adding this new linking strategy to the two opt-in developer features we introduced a month ago:
the wrap_links=true parameter to POST statuses/update and POST direct_messages/new.
That way you could simulate what the linking of URLs without protocols and t.co URL wrapping will look like on October 10. Please send us your feedback on this idea through this discussion thread.
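As a rough illustration of opting in with the wrap_links=true parameter, the sketch below builds the form-encoded body for a POST statuses/update request. The helper name build_status_update_body is hypothetical, and the API host, version prefix, and OAuth signing are deliberately left out; only the status and wrap_links parameters are shown.

```python
from urllib.parse import urlencode


def build_status_update_body(status, wrap_links=True):
    """Build a form-encoded body for POST statuses/update (hypothetical helper).

    Assumption: the request is signed and sent elsewhere; this only
    assembles the parameters.
    """
    params = {"status": status}
    if wrap_links:
        # Opt in to t.co wrapping for the links in this status.
        params["wrap_links"] = "true"
    return urlencode(params)


if __name__ == "__main__":
    print(build_status_update_body("reading t.co/abc"))
```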
As always, if you have questions about t.co link wrapping or the twitter-text update, please let us know on our Developers Discussions board.