Data Visualisation: Getting Your Untappd Checkins

It goes without saying that if you want to visualise data, you need some data. As I mentioned in my last post, I have an Untappd API key, so I have access to a data set that I’m quite interested in exploring. The following code isn’t an all-singing, all-dancing solution for getting hold of your Untappd checkins; it’s far too rough and ready for that. It does serve as a starting point, though: we need data, this Python script gets us that data, and we can come back later and improve it.

This isn’t the first Python script I’ve written, but it is the longest and most complicated, which gives you an idea of how little I’ve played with Python. To run it, you need to edit the script to add your Untappd API access keys (client ID and client secret) and the username of the Untappd user whose checkins you want. You’ll also need a MongoDB instance; if it’s not running on the default port (27017), you’ll need to modify the line that creates the MongoDB client so it knows which host and port to use.

from pymongo import MongoClient
import requests
# Your Untappd details...
untappd_user = ''
untappd_client_id = ''
untappd_client_secret = ''
# Connect to the local MongoDB instance (e.g. MongoClient('localhost', 27018) for a non-default port)...
client = MongoClient()
db = client[untappd_user]
# Does the user have any checkins already...?
if 'checkins' in db.list_collection_names():
    print('Dropping previously slurped checkins...')
    db.drop_collection('checkins')
# Create a new collection so we can slurp checkins into it...
checkins = db.create_collection('checkins')
# We don't have any checkin info at the moment, so don't set the checkin max_id
max_id = None
# Connect to Untappd and pull down some checkins...
while True:
    # These are the parameters we send every time...
    parameters = {'client_id': untappd_client_id,
                  'client_secret': untappd_client_secret,
                  'limit': 50}
    # Each time we go round the loop apply the max_id...
    if max_id is not None:
        parameters['max_id'] = max_id
    # Get some checkins (named 'data' so it doesn't shadow the json module)...
    r = requests.get('https://api.untappd.com/v4/user/checkins/' + untappd_user,
                     params=parameters, timeout=30)
    data = r.json()
    if data['meta']['code'] == 200:
        # Update the max_id...
        max_id = data['response']['pagination']['max_id']
        # Load the checkins into mongo (insert_many refuses an empty list)...
        items = data['response']['checkins']['items']
        if items:
            checkins.insert_many(items)
        # If we didn't get 50 checkins then we're done, so break out...
        count = data['response']['checkins']['count']
        print('Inserting %i checkins into mongo...' % count)
        if count < 50:
            break
    else:
        print(data['meta']['error_detail'])
        break
print('%s now has %i Untappd checkins in MongoDB...' % (untappd_user, checkins.count_documents({})))
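If you’d rather not edit the values at the top of the script every time, one option is to read them from environment variables instead. This is only a sketch: the variable names (UNTAPPD_USER, UNTAPPD_CLIENT_ID, UNTAPPD_CLIENT_SECRET, MONGO_HOST, MONGO_PORT) and the load_settings helper are hypothetical choices of mine, not anything Untappd or MongoDB define.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read Untappd and MongoDB settings from (hypothetically named) environment variables."""
    env = os.environ if env is None else env
    host = env.get('MONGO_HOST', 'localhost')
    # 27017 is MongoDB's default port; override it with MONGO_PORT if needed
    port = int(env.get('MONGO_PORT', '27017'))
    return {
        'untappd_user': env['UNTAPPD_USER'],
        'client_id': env['UNTAPPD_CLIENT_ID'],
        'client_secret': env['UNTAPPD_CLIENT_SECRET'],
        'mongo_uri': f'mongodb://{host}:{port}/',
    }
```

MongoClient accepts a connection URI as its first argument, so the script could then create its client with MongoClient(settings['mongo_uri']) rather than hard-coding a host and port.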

So what could we improve? The main thing would be to stop throwing away all the checkins we’ve already added to MongoDB every time the script runs; it should really just fetch the checkins the user has made since the last run. There is also very little error handling: if you run out of Untappd API calls (you’re limited to 100 per hour), the script just prints the error detail Untappd sends back and stops, rather than dealing with it properly.
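One way to tackle both points, sketched under some assumptions: keep the MongoDB collection between runs, look up the newest checkin_id already stored, and stop paging as soon as Untappd hands back a checkin we already have. This assumes checkins come back newest first (which the max_id paging suggests) and that each item carries a numeric checkin_id. The names fetch_new_checkins, fetch_page and UntappdError are hypothetical; the HTTP call is passed in as fetch_page so the paging logic can be tried without touching the API.

```python
from typing import Callable, List, Optional


class UntappdError(Exception):
    """Raised when Untappd answers with a non-200 meta code (e.g. hourly limit used up)."""


def fetch_new_checkins(fetch_page: Callable[[Optional[int]], dict],
                       newest_stored_id: Optional[int] = None,
                       page_size: int = 50) -> List[dict]:
    """Page back through a user's checkins, stopping at the first one already stored.

    fetch_page(max_id) is a hypothetical callable returning the decoded JSON body
    of one user/checkins request (max_id is None for the first, newest page).
    """
    new_items: List[dict] = []
    max_id: Optional[int] = None
    while True:
        data = fetch_page(max_id)
        meta = data['meta']
        if meta['code'] != 200:
            # Fail loudly instead of quietly looking like the run finished
            raise UntappdError(meta.get('error_detail', 'unknown Untappd error'))
        items = data['response']['checkins']['items']
        for item in items:
            if newest_stored_id is not None and item['checkin_id'] <= newest_stored_id:
                return new_items  # everything from here on is already in MongoDB
            new_items.append(item)
        if len(items) < page_size:
            return new_items  # a short page means we've reached the oldest checkin
        max_id = data['response']['pagination']['max_id']
```

In the script above, fetch_page would wrap the existing requests.get call (adding max_id when it isn’t None), newest_stored_id could be the checkin_id of the document returned by checkins.find_one(sort=[('checkin_id', -1)]), the collection would no longer be dropped at the start, and the returned list would go to checkins.insert_many() when it isn’t empty.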

You can find all the code for this series of blog posts in one of my GitHub repositories.

Data Visualisation

I’ve been meaning to write a bit about data visualisation for the last few months, but to be honest, brewing beer is far more fun to do and write about. Beer is something quite close to my heart: I love the stuff, and as far as I’m concerned it’s the best drink in the world. You might be wondering why I’m going on about beer when I’m supposed to be talking about data visualisation. It just so happens that I use a website and mobile app called Untappd to log what beer I drink, and where and when I drank it. Untappd also have a public API for interacting with their database, so I have a readily accessible dataset that I’m intimately familiar with.

I had a half-hearted fiddle with the dataset of my beer-drinking habits at the turn of the year, but I didn’t really do it properly, or to the extent I wanted to. I made a load of bubble graphs of various things, such as which breweries I had drunk the most beers from. There wasn’t any in-depth analysis of when I drink beer, though, or of how my beer-drinking habits have changed since I started using the service.
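The numbers behind a brewery bubble graph boil down to a tally. As a rough sketch, assuming each stored checkin has a brewery object with a brewery_name field (my reading of the checkin items, not something this post confirms), Python’s collections.Counter does most of the work. Note that it counts checkins per brewery, not distinct beers; top_breweries is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_breweries(checkins: Iterable[dict], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n breweries with the most checkins, most frequent first.

    Assumes each checkin looks like {'brewery': {'brewery_name': ...}, ...}.
    """
    counts = Counter(c['brewery']['brewery_name'] for c in checkins)
    return counts.most_common(n)
```

With the checkins already in MongoDB, something like top_breweries(db.checkins.find()) would feed straight into a chart.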

I’ve decided it’s about time I had a proper go at it, and learnt a bit of Python while I’m at it. There will be a number of posts after this one dealing with extracting the data with the Untappd API, mining the corpus to produce usable data sets and, finally, visualising those sets. The posts will come when they come; hopefully there won’t be too much of a gap between them.