Developing with Gnip’s Full-Archive Search API for Twitter

Tuesday, 10 November 2015

Gnip is Twitter’s enterprise API platform that provides access to the full set of public historical and real-time Tweets. Hundreds of companies have built products (and in fact entire businesses) on the Gnip APIs that help their clients perform market research, evaluate the viability of advertising campaigns and make better business decisions that go beyond traditional marketing approaches.

In this post, we’ll talk about one specific product from Gnip that was released earlier this year: the Full-Archive Search API. With this historical search API, you can query the entire corpus of public Tweets, going back as far as nine years, for any keyword, hashtag or user mention. You can also retrieve Tweet volumes over time. We cover both features below; the screenshots and code snippets come from our open source sample app.

The Gnip query syntax: PowerTrack

Imagine you’re a product manager at a social listening company looking to build a tool to help a prospective client better understand conversations on Twitter around various content management solutions. You might search for #Wordpress, #Joomla, and #Drupal separately to get a sense for conversation volume. Using these inputs, our sample code would show a chart that looks like this:

The volume of conversations indicates that there’s more activity for WordPress, so that would be a good place for the client to then focus their attention. From here, you might want to find and engage with influencers on Twitter. The Gnip Full-Archive Search API offers a powerful filtering language called PowerTrack that lets you write a query that returns Tweets from verified users who have Tweeted about WordPress:

#Wordpress is:verified -is:retweet

Alternatively, if you wanted to find customers who were close to your client’s offices in San Francisco, the Gnip API allows you to target Tweets based on geolocation:

#Wordpress point_radius:[-122.4167 37.7833 25.0mi] -is:retweet
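If you build these rules in code rather than by hand, a small helper keeps the operators consistent. The following is a minimal sketch, not part of the sample app: the function name `hashtag_rule` and its arguments are our own invention for illustration, and it only covers the operators shown above.

```python
def hashtag_rule(tag, verified_only=False, exclude_retweets=True, near=None):
    """Compose a PowerTrack rule string for a single hashtag.

    `near` is an optional (longitude, latitude, radius_miles) tuple.
    This helper is hypothetical and only handles the operators used in this post.
    """
    clauses = ["#" + tag.lstrip("#")]
    if verified_only:
        clauses.append("is:verified")
    if near is not None:
        lon, lat, miles = near
        # point_radius lists longitude first, then latitude, then the radius
        clauses.append("point_radius:[{} {} {}mi]".format(lon, lat, miles))
    if exclude_retweets:
        clauses.append("-is:retweet")
    return " ".join(clauses)


print(hashtag_rule("Wordpress", verified_only=True))
print(hashtag_rule("Wordpress", near=(-122.4167, 37.7833, 25.0)))
```

The two calls reproduce the rules shown above; swapping in "Joomla" or "Drupal" gives the equivalent rules for the other platforms.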

Paginating Tweets via the API

The Gnip Full-Archive Search API can also return Tweets over a given timeframe, paginated in batches of up to 500 Tweets (the batch size is set with the maxResults parameter). To return Tweets for #Wordpress in batches of 200, the endpoint looks like this:

https://search.gnip.com/accounts/YOUR_ACCOUNT/search/YOUR_SERVICE.json?query=%23Wordpress&maxResults=200&publisher=twitter
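One detail worth calling out: a literal # in a URL starts the fragment, so the hashtag has to be percent-encoded as %23 before it goes into the query string. Here is a small sketch of building the request URL with Python's standard library; `build_search_url` is a name we made up for illustration, and YOUR_ACCOUNT and YOUR_SERVICE are the same placeholders as above.

```python
from urllib.parse import urlencode

# YOUR_ACCOUNT and YOUR_SERVICE are placeholders for your own Gnip details.
BASE_URL = "https://search.gnip.com/accounts/YOUR_ACCOUNT/search/YOUR_SERVICE.json"


def build_search_url(query, max_results=200, publisher="twitter"):
    """Return the request URL with every parameter percent-encoded."""
    params = [
        ("query", query),
        ("maxResults", max_results),
        ("publisher", publisher),
    ]
    # urlencode escapes '#' as %23, ':' as %3A and spaces as '+'
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("#Wordpress"))
```

If you use an HTTP library such as requests and pass the parameters as a dictionary, this encoding is normally done for you.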

This returns a JSON response that looks like this (edited for brevity):

{  
   "next":"dPLQKDqRdqFYFqMfz7Xo0Vyzx6jBaN3z/sR2hCDbpBFR6eLXGwiRFzhO2F/l8SwifUrOA9f2Oy7IA6ax2eDXUMod5UPQoh/1+qpq2WSM+G5noVp1MTe3NIYorc6b+5RJCVleR4BfG7qqyGT9t+YxYHBMEpV5py7L5BU4rMT8mNcTJuTcZ7BkJjs7BbE31BtscirM28ofT5dSVlM93lSKyQ5eB/EwXKu7uStHXVNhV6NHTxFc5hyfnxh5knRfgDfX4QGwJHLtx8dfty/uWeg1+k+U9iv88Cesv6bFqxJ+EszmHW266n36D5K6ylod4rx1xNKfazBS3mO95zfufgFuMrNaG9TL6/YShM5DvUstg4hgpmAysW0hwQ==",
   "results":[
      {
         "id":"tag:search.twitter.com,2005:428274432654598144",
         "objectType":"activity",
         "body":"Looking for a new web CMS? Download our buying guide to learn the 6 things to look for when making your selection.  http://t.co/IQ7GGBvAF7",
         "object":{
            ...
         }
         ...
      }
   ]
}

There are two things to note here:

A “next” token that allows pagination through the set of results
A “results” array of the Tweets about your topic
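To walk through every page, you keep requesting results until the response no longer contains a "next" token. The sketch below shows one way to structure that loop. It assumes the token is passed back as a `next` request parameter, and `fetch_page` is a hypothetical callable that performs the HTTP request and returns the decoded JSON; here a stand-in with made-up pages replaces the network call so the example runs offline.

```python
def iter_tweets(fetch_page, query, max_results=500):
    """Yield Tweets one at a time, following "next" tokens until none is left.

    `fetch_page` is a hypothetical callable: it takes the request parameters
    as a dict and returns the decoded JSON response as a dict.
    """
    params = {"publisher": "twitter", "query": query, "maxResults": max_results}
    while True:
        page = fetch_page(params)
        for tweet in page.get("results", []):
            yield tweet
        token = page.get("next")
        if not token:
            break  # last page reached
        # Assumption: the token is sent back as the `next` request parameter.
        params = dict(params, next=token)


# Stand-in for the real request, with invented pages, so the sketch runs offline.
_FAKE_PAGES = {
    None: {"results": [{"id": 1}, {"id": 2}], "next": "abc"},
    "abc": {"results": [{"id": 3}]},
}


def fake_fetch(params):
    return _FAKE_PAGES[params.get("next")]


print([tweet["id"] for tweet in iter_tweets(fake_fetch, "#Wordpress")])
```

In a real application, `fetch_page` could wrap a `requests.get` call against the search endpoint and return `r.json()`. Using a generator means callers can stop early without fetching pages they don't need.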

To make this even easier, you can use our standard Python library to help simplify the API calls.

Getting Tweet volume data

In the original example, we showed a chart of Tweet volume about #Wordpress. The Gnip Full-Archive Search API has an option to return Tweet count data for the last nine years. This can be returned over an interval of either a day, an hour or a minute. In the screenshot above, the search granularity is by hour over one week.

Calling the counts endpoint takes only a few lines of code:

import requests

payload = {
    'publisher': 'twitter',
    'query': '#Wordpress -(is:retweet)',
    'bucket': 'hour',  # one of 'day', 'hour' or 'minute'
    'fromDate': '201502120000',
    'toDate': '201502260000',
}

# Make the Gnip request (substitute your own account, service and credentials)
url = "https://search.gnip.com/accounts/YOUR_ACCOUNT/search/YOUR_SERVICE/counts.json"
r = requests.get(url, params=payload, auth=('YOUR_USERNAME', 'YOUR_PASSWORD'))

The return value is a simple JSON array with counts per interval:

[
	{'count': 64, 'timePeriod': '201502130000'}, 
	{'count': 136, 'timePeriod': '201502130100'}, 
	{'count': 118, 'timePeriod': '201502130200'}, 
	{'count': 91, 'timePeriod': '201502130300'}, 
	...
	{'count': 95, 'timePeriod': '201502260000'}, 
]
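Once you have the buckets, rolling them up is straightforward. As a rough sketch (the helper name `daily_totals` is ours, and the sample list contains only the rows shown above, not the elided ones), here is how hourly counts could be summed into one total per day:

```python
from datetime import datetime


def daily_totals(buckets):
    """Sum hourly (or minute) buckets into one total per calendar day."""
    totals = {}
    for bucket in buckets:
        # timePeriod is formatted as YYYYMMDDhhmm, e.g. '201502130100'
        day = datetime.strptime(bucket["timePeriod"], "%Y%m%d%H%M").date()
        totals[day] = totals.get(day, 0) + bucket["count"]
    return totals


# Only the rows visible in the response above; the elided rows are left out.
sample = [
    {"count": 64, "timePeriod": "201502130000"},
    {"count": 136, "timePeriod": "201502130100"},
    {"count": 118, "timePeriod": "201502130200"},
    {"count": 91, "timePeriod": "201502130300"},
    {"count": 95, "timePeriod": "201502260000"},
]

for day, total in daily_totals(sample).items():
    print(day.isoformat(), total)
```

The same per-day totals could feed a chart comparing several hashtags side by side.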

As seen earlier, Tweet counts are useful for quickly gauging conversation volume around a topic and deciding which opportunities to pursue further.

Through Gnip, developers have access to Twitter’s trove of public historical Tweets, conversations and media in what amounts to the world’s largest focus group. With this information, businesses can power their own industry research and outreach campaigns or build products that offer business insights to their own customers as in the example above. Contact our team to see if using Gnip makes sense for your use case, and then download the sample code to see the power and potential user experiences made possible by the Gnip Full-Archive Search API.