All About Data

A blog about data, stats and programming

Spatial analysis of tweets - collecting tweets from a region

I thought it might be interesting to look at Twitter data from a spatial point of view. It might be possible to visualise how tweets from one area change during a week, or to detect clusters of activity in certain areas. 
The first task is to collect tweets. Twitter's streaming API delivers tweets as they are generated. The public streaming endpoints provide at most about 1% of all tweets, while the full stream (the Firehose) requires special access. By applying a filter to the stream it is possible to restrict the tweets we get to a geographic region.

The geeky stuff
The easiest way to collect streaming tweets in Python is with a library called Twython. This module gives us Python objects through which the Twitter API can be accessed. Twitter's REST API is used by sending ordinary HTTP requests. The streaming API also works over HTTP, but it keeps a single connection open and pushes tweets down it as they arrive.
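To make "communicating through HTTP" concrete, here is a minimal sketch that builds (but does not send) the request a location-filtered stream boils down to. It assumes the retired v1.1 endpoint `https://stream.twitter.com/1.1/statuses/filter.json`, and the function name `build_filter_request` and the coordinates are my own illustrations. A real request also needs an OAuth `Authorization` header, which Twython computes for us and which is left out here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Streaming endpoint of the (now retired) Twitter API v1.1. The filter
# parameters travel in the form-encoded body of a POST request.
FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"


def build_filter_request(locations: str) -> Request:
    """Build, but do not send, the HTTP request for a location-filtered stream.

    Hypothetical helper: the OAuth 'Authorization' header that a real
    request needs is deliberately omitted.
    """
    body = urlencode({"locations": locations}).encode("ascii")
    return Request(
        FILTER_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Illustrative coordinates only (south-west corner, then north-east corner).
req = build_filter_request("-0.08,51.51,-0.06,51.53")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

The commas in the body are percent-encoded as `%2C`, which is what any HTTP client (Twython included) would send on the wire.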
For Twython to work it has to authenticate using OAuth. Twython handles most of this for us, but we still need to supply four credentials: the application key, application secret, OAuth token and OAuth token secret (in the code they are APP_KEY, APP_SECRET, OAUTH_TOKEN and OAUTH_TOKEN_SECRET). The easiest way to get them for our purposes is to register a Twitter app and copy the codes from that app's page.
To register an app, go here and sign in at the top right corner. You will be taken back to the same page, but where you signed in you will now see your avatar. Hover over the avatar, select 'My Applications' from the menu, and click 'Create a new application' on that screen. The form asks for a callback URL, but this is not required. Congratulations, you have just registered an application with Twitter through which you can call the API. The screen looks like this:
The tokens have to be pasted into the script for it to run.
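Twython does the OAuth signing for us, so none of the following is needed in practice. As a rough illustration of why all four credentials matter, here is a sketch of the OAuth 1.0a HMAC-SHA1 request signature described in RFC 5849, section 3.4. The function names, the nonce and the timestamp are hypothetical placeholders, and the token values are the placeholder names from the text, not real credentials.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"


def pct(value) -> str:
    """Percent-encode per RFC 3986: only A-Z a-z 0-9 - . _ ~ are left as-is."""
    return quote(str(value), safe="~")


def signature_base_string(method: str, url: str, params: dict) -> str:
    """METHOD & encoded URL & encoded, sorted parameter string."""
    # Encode every key and value first, then sort the encoded pairs.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(param_string)])


def oauth_signature(method, url, params, consumer_secret, token_secret) -> str:
    """Base64 HMAC-SHA1 of the base string, keyed by both secrets."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder values: the oauth_* parameters plus the request's own body parameter.
params = {
    "oauth_consumer_key": "APP_KEY",
    "oauth_token": "OAUTH_TOKEN",
    "oauth_nonce": "abc123",               # hypothetical; normally random per request
    "oauth_timestamp": "1375056000",       # hypothetical; normally the current time
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
    "locations": "-0.08,51.51,-0.06,51.53",
}
base = signature_base_string("POST", FILTER_URL, params)
sig = oauth_signature("POST", FILTER_URL, params, "APP_SECRET", "OAUTH_TOKEN_SECRET")
print(sig)
```

The application key and OAuth token appear in the signed parameters, while the two secrets only ever form the HMAC key. That is why both secrets must be kept private.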
Now that Twython is ready to go, a filter needs to be applied to the stream so that we only get tweets from a geographic region. This is done with the 'locations' filter. It takes one or more bounding boxes (up to 25), each given as a pair of coordinates: the south-west corner followed by the north-east corner, with longitude before latitude. Tweets inside the rectangle whose diagonal runs between these two points are sent down the stream.
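A minimal sketch of building the `locations` value from bounding boxes, plus a local check that a point lies inside a box. As I understand the documentation, the stream may also deliver tweets whose place (rather than exact coordinates) merely overlaps the box, so a local check can be useful. The helper names and the box around Brick Lane are my own illustrative choices, not the coordinates from the original script.

```python
from typing import Iterable, Tuple

# A bounding box as (sw_lon, sw_lat, ne_lon, ne_lat): longitude comes first.
Box = Tuple[float, float, float, float]


def make_locations(boxes: Iterable[Box]) -> str:
    """Flatten 1 to 25 boxes into the comma-separated 'locations' string."""
    boxes = list(boxes)
    if not 1 <= len(boxes) <= 25:
        raise ValueError("the locations filter accepts between 1 and 25 boxes")
    for sw_lon, sw_lat, ne_lon, ne_lat in boxes:
        # The south-west corner must come before the north-east corner.
        if not (sw_lon < ne_lon and sw_lat < ne_lat):
            raise ValueError(f"south-west corner must come first: {(sw_lon, sw_lat, ne_lon, ne_lat)}")
    return ",".join(str(c) for box in boxes for c in box)


def in_box(lon: float, lat: float, box: Box) -> bool:
    """True if the point lies inside the box (edges included)."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


brick_lane = (-0.08, 51.51, -0.06, 51.53)  # illustrative box, roughly around Brick Lane
print(make_locations([brick_lane]))
print(in_box(-0.0716, 51.5215, brick_lane))
```

Swapping latitude and longitude is an easy mistake, and it silently selects a box somewhere else entirely. That is why the check on corner order is worth keeping even in a quick script.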
This is how it looks:

Tools such as this one show the coordinates of a point selected on a map.
These coordinates correspond roughly to the market on Brick Lane.
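A map tool gives a single point, while the filter wants two corners. One way to turn a point into a box is sketched below, assuming a flat-Earth approximation that is fine over a few kilometres. One degree of latitude is roughly 111 km, and a degree of longitude shrinks by the cosine of the latitude. The function `box_around`, the 500 m half-width and the centre point are all hypothetical.

```python
import math

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def box_around(lat: float, lon: float, half_width_m: float):
    """Return (sw_lon, sw_lat, ne_lon, ne_lat) for a square box centred on a point.

    Uses a simple equirectangular approximation, so it is only suitable
    for small boxes away from the poles.
    """
    dlat = half_width_m / METRES_PER_DEGREE
    # Degrees of longitude get shorter towards the poles by cos(latitude).
    dlon = half_width_m / (METRES_PER_DEGREE * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


box = box_around(51.5215, -0.0716, 500)  # illustrative centre near Brick Lane
print(",".join(f"{c:.4f}" for c in box))
```

At London's latitude the box spans more degrees of longitude than of latitude, even though both sides are about a kilometre long.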
I developed this in IPython, and you can find the code here.
When the script runs it looks like this:
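With Twython, each message arriving on the stream is handed as a parsed dictionary to the `on_success(self, data)` method of a `TwythonStreamer` subclass. The sketch below shows one way such a method might turn a tweet into a JSON line for later spatial analysis. It assumes the v1.1 tweet fields `coordinates` (a GeoJSON point in longitude, latitude order, or null), `id_str`, `created_at` and `text`. The helper names and the sample tweet are made up for illustration.

```python
import json
from typing import Optional, Tuple


def extract_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) if the tweet carries an exact GeoJSON point, else None."""
    geo = tweet.get("coordinates")
    if geo and geo.get("type") == "Point":
        lon, lat = geo["coordinates"]  # GeoJSON order: longitude, then latitude
        return float(lon), float(lat)
    return None


def to_record(tweet: dict) -> Optional[str]:
    """One JSON line per geotagged tweet; skips non-tweet messages such as deletes."""
    point = extract_point(tweet)
    if point is None or "text" not in tweet:
        return None
    return json.dumps({
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "lon": point[0],
        "lat": point[1],
        "text": tweet["text"],
    })


sample = {  # made-up tweet, for illustration only
    "id_str": "1",
    "created_at": "Mon Jul 29 12:00:00 +0000 2013",
    "text": "Lunch on Brick Lane",
    "coordinates": {"type": "Point", "coordinates": [-0.0716, 51.5215]},
}
print(to_record(sample))
```

Inside `on_success`, each non-empty record could be appended to a file. Tweets that only carry a place rather than a point are skipped here, which is one possible design choice for point-based analysis.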
29 July 2013