Introducing new metadata for Tweets

Wednesday, 13 February 2013

We’ll soon be adding new fields to Tweet structures returned by the API, helping developers more easily work with targeted subsets of Tweet collections.

The new lang attribute specifies the language the Tweet was written in, as identified by Twitter’s machine language detection algorithms. The values will be valid BCP 47 language identifiers, and may represent any of the languages listed on Twitter’s advanced search page, or “und” if no language could be detected. This field enables consumers of Tweet data, such as analytics services or real-time search streams, to offer language-specific curation, aggregation and analysis of Tweet content.
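As a rough illustration of language-specific aggregation, the sketch below counts Tweets per lang value. It assumes Tweets arrive as raw JSON strings, for example one per line as read from a stream. The function name count_by_language is hypothetical. The "missing" bucket is our own placeholder for Tweets delivered before the attribute is enabled. It is not an API value, unlike "und".

```python
import json
from collections import Counter


def count_by_language(raw_tweets):
    """Count Tweets per BCP 47 `lang` value.

    "und" is the documented value for Tweets whose language could not be
    detected. Tweets with no `lang` field at all (hypothetically, ones
    delivered before the attribute is turned on) are counted under the
    made-up key "missing".
    """
    counts = Counter()
    for raw in raw_tweets:
        tweet = json.loads(raw)
        counts[tweet.get("lang", "missing")] += 1
    return counts


sample = [
    '{"id_str": "1", "text": "Hello", "lang": "en"}',
    '{"id_str": "2", "text": "Bonjour", "lang": "fr"}',
    '{"id_str": "3", "text": ":)", "lang": "und"}',
    '{"id_str": "4", "text": "Hi again", "lang": "en"}',
]
counts = count_by_language(sample)
print(counts.most_common())
```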

The new streaming-only filter_level attribute is intended for applications that display a selection of Tweets from a stream. Its value will be one of “none”, “low”, or “medium”, with “high” reserved for future use. Tweets classified as “medium” (and eventually “high”) will roughly correspond to the “Top Tweets” results for searches on twitter.com. This will make it easier for applications to surface the most relevant content from otherwise noisy or high-volume feeds.
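A client might also apply its own threshold to filter_level values as Tweets arrive. The sketch below assumes the levels are ordered from least to most filtered as none < low < medium < high, which is our reading of the description above. It also treats a missing field as "none" and skips any unrecognized value. The names FILTER_LEVELS and meets_filter_level are hypothetical.

```python
# Assumed ordering, least to most filtered; "high" is reserved for future use.
FILTER_LEVELS = ("none", "low", "medium", "high")


def meets_filter_level(tweet, minimum):
    """Return True if the Tweet's filter_level is at or above `minimum`."""
    if minimum not in FILTER_LEVELS:
        raise ValueError(f"unknown filter level: {minimum!r}")
    level = tweet.get("filter_level", "none")  # assumption: missing means unfiltered
    if level not in FILTER_LEVELS:
        return False  # unrecognized value: skip it conservatively
    return FILTER_LEVELS.index(level) >= FILTER_LEVELS.index(minimum)


stream = [
    {"id_str": "1", "filter_level": "none"},
    {"id_str": "2", "filter_level": "low"},
    {"id_str": "3", "filter_level": "medium"},
]
highlights = [t["id_str"] for t in stream if meets_filter_level(t, "low")]
print(highlights)  # ['2', '3']
```

Filtering on the client like this is only a fallback. The filter_level stream parameter described below pushes the same work to Twitter's side.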

Both attributes will be available in Streaming API responses, and lang will also be available in the REST API. We expect to turn on the filter_level attribute first, on Wednesday, February 20, with the lang attribute following shortly after. Please watch the calendar of API changes for updates on when these attributes become available.

Once deployed, these new attributes will appear at the top level of Tweet status objects:

"status": {
    "created_at": "Tue Oct 30 21:12:37 +0000 2012",
    "id": 263387958047027200,
    "id_str": "263387958047027200",
    "text": "Better late than never, statuses/retweets_of_me is joining the API v1.1 method roster: https://t.co/jYz3MJnb ^TS",
    "geo": null,
    "coordinates": null,
    "place": null,
    "filter_level": "medium",
    "lang": "en",
    ...
}
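The two attributes are rolled out at different times, so consuming code should not assume both are present on every status. The sketch below parses a trimmed, valid-JSON version of the example status. The "status" wrapper and the "..." placeholder for the remaining fields are dropped. It reads both attributes with a fallback of None. The helper name new_attributes is hypothetical.

```python
import json

# Trimmed from the example above so that it is valid JSON.
RAW_STATUS = """
{
  "created_at": "Tue Oct 30 21:12:37 +0000 2012",
  "id": 263387958047027200,
  "id_str": "263387958047027200",
  "text": "Better late than never, statuses/retweets_of_me is joining the API v1.1 method roster: https://t.co/jYz3MJnb ^TS",
  "geo": null,
  "coordinates": null,
  "place": null,
  "filter_level": "medium",
  "lang": "en"
}
"""


def new_attributes(status):
    """Return (lang, filter_level), using None for whichever is absent."""
    return status.get("lang"), status.get("filter_level")


status = json.loads(RAW_STATUS)
print(new_attributes(status))  # ('en', 'medium')
```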

Additionally, public streaming endpoints will support two new parameters that filter streamed data on Twitter's side based on these attributes.

Connecting to a public stream with the language parameter set to a comma-separated list of languages will return only Tweets detected as written in one of those languages. For example, connecting with language=en will stream only Tweets detected as English.
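The sketch below shows one way to assemble the comma-separated language parameter as a form-encoded request body. The helper name stream_params is hypothetical. So is the track value in the usage line, which stands in for whatever other stream parameters an application sends. OAuth signing and the HTTP request itself are out of scope here. urlencode percent-encodes the comma as %2C, and that decodes back to a comma on the server side.

```python
from urllib.parse import urlencode


def stream_params(languages, **other):
    """Return form-encoded public-stream parameters with a language filter.

    `languages` is an iterable of BCP 47 codes such as "en" or "fr"; any
    extra keyword arguments are passed through unchanged.
    """
    codes = [code.strip() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    params = dict(other)
    params["language"] = ",".join(codes)
    return urlencode(params)


body = stream_params(["en", "fr"], track="twitterapi")
print(body)  # track=twitterapi&language=en%2Cfr
```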

Connecting to a public stream with the filter_level parameter set to “none”, “low”, or “medium” determines how much filtering is applied to the stream. “none” applies no filtering and “medium” applies the most. If filter_level is not specified, it defaults to “none”.
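A small guard can catch invalid filter_level values before a connection is opened. The sketch below accepts only the three values listed above and rejects the reserved “high”. Like the server, it defaults to “none”. The name filter_level_params is hypothetical, and the language pass-through in the usage line only illustrates combining the two parameters.

```python
from urllib.parse import urlencode

# Values the filter_level parameter accepts; "high" is reserved, so it is rejected.
ACCEPTED_FILTER_LEVELS = {"none", "low", "medium"}


def filter_level_params(level="none", **other):
    """Return form-encoded stream parameters including filter_level."""
    if level not in ACCEPTED_FILTER_LEVELS:
        raise ValueError(
            f"filter_level must be one of {sorted(ACCEPTED_FILTER_LEVELS)}"
        )
    params = dict(other)
    params["filter_level"] = level
    return urlencode(params)


print(filter_level_params("medium", language="en"))  # language=en&filter_level=medium
```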

We hope these additions will help you provide great and appropriate Twitter content to your users. As always, if you have any questions or comments please follow up in this discussion thread.