Adding Pylint to IntelliJ

A switch back to programming in Python meant I wanted to be able to use Pylint from inside IntelliJ.

I’ve been programming in JavaScript for the last few months, and had IntelliJ IDEA using ESLint to give me context highlighting. It was really easy to set up, and I found it useful in helping me stay compliant with our coding style, which differs slightly from my personal style.

I’ve switched back to Python for another project and wanted to use Pylint from within IntelliJ IDEA for a similar purpose. It turns out that you can’t, or at least you can’t have editor context highlighting, which is a bit shit.

Yes, IntelliJ IDEA already has pretty good PEP8 context highlighting, but not everyone on the team uses IntelliJ IDEA, let alone an IDE (don’t ask). So I really wanted to get Pylint working in the IDE, so I didn’t have to keep dropping to a terminal every time I wanted to check it.

While IntelliJ IDEA won’t do editor context highlighting with Pylint, you can add it as an external tool. This allows it to execute from within the IDE and provide links to any lines that have issues.

I started by following the instructions for PyCharm in the Pylint documentation. The Python plugin for IntelliJ IDEA provides pretty much the same functionality as PyCharm, so I figured it should work. I couldn’t get it to work though.

Next I found this translation of a Russian blog, but again, I couldn’t quite get it to work. So I started fiddling with all the macros that are available until I found those that worked. So here you go, Pylint from within IntelliJ IDEA and picking up the correct virtualenv.

Start by opening the Preferences (⌘, on a Mac) and browsing to the Tools > External Tools menu. Select the little symbol from the bottom of the main panel (or ⌘N on a Mac). Fill out the form with values similar to those below:

IntelliJ Preferences Edit Tool

Name: Pylint – The name it will appear under in the list of external tools.
Description: A Python source code analyzer which looks for programming errors, helps enforcing a coding standard and sniffs for some code smells. – Not sure where this is actually used, but when you edit, you can at least be reminded of why you added it.
Program: $PyInterpreterDirectory$/pylint – Using the $PyInterpreterDirectory$ macro means that it’ll pick up the correct virtualenv.
Parameters: -rn -f parseable $FilePath$ – -rn means we only want the messages, not the other report gubbins. -f parseable means that it produces something that IntelliJ can parse. $FilePath$ is the full path to the current file.
Working directory: $PyInterpreterDirectory$ – Tells Pylint to run from the virtualenv bin folder, rather than the IntelliJ project folder.

I also unchecked the Main menu and Project views options in the Show in section. Mainly as I only wanted the ability to right click within the currently open file and lint it. To lint whole folders, or the entire project, you’ll need to modify both Parameters and Working directory fields with the correct macros.

This was enough to get it working, but without hyperlinking any lines with issues. To enable that, click on the Output filter… button and add a new filter:

IntelliJ Preferences Edit Tool Edit Filter

When you right click within a source file, you should now have a Pylint option under the External Tools sub menu, which should give you clickable links in the output.

IntelliJ Pylint Output


There is only one downside to the configuration outlined above. If you have a .pylintrc file in the root of your project, Pylint can’t find it. This is due to the working directory being set to $PyInterpreterDirectory$, rather than a macro that represents the source root.

Pylint has a specific search order for the .pylintrc file, so you can either use something like the PYLINTRC environment variable, or just add the location of the file in the Parameters field of the External Tools dialog:

-rn -f parseable --rcfile $ContentRoot$/.pylintrc $FilePath$

That did the trick for me. This means you can have your .pylintrc file under source code control, which is useful if different projects have different requirements.
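To make it clearer what the External Tool ends up running once the macros are expanded, here is a minimal Python sketch of the same invocation. The function names build_pylint_command and run_pylint are made up for illustration, and the directory and file arguments simply stand in for what $PyInterpreterDirectory$, $FilePath$ and $ContentRoot$/.pylintrc would expand to; the flags are the ones from the Parameters field above.

```python
import subprocess


def build_pylint_command(interpreter_dir, file_path, rcfile=None):
    """Return the argument list the External Tool settings would expand to."""
    # Mirrors the Program field: $PyInterpreterDirectory$/pylint
    command = [f"{interpreter_dir}/pylint", "-rn", "-f", "parseable"]
    if rcfile is not None:
        # Mirrors the optional --rcfile $ContentRoot$/.pylintrc parameter
        command += ["--rcfile", str(rcfile)]
    command.append(str(file_path))
    return command


def run_pylint(interpreter_dir, file_path, rcfile=None):
    """Run Pylint with the interpreter directory as the working directory."""
    command = build_pylint_command(interpreter_dir, file_path, rcfile)
    result = subprocess.run(
        command, cwd=str(interpreter_dir), capture_output=True, text=True
    )
    return result.stdout
```

Running the same command by hand from a terminal is a quick way to check whether a problem lies with Pylint itself or with the macros in the dialog.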

Making sure IntelliJ uses the correct virtualenv

There is one extra step that you need to be aware of if you want this to work with a virtualenv. By default IntelliJ IDEA knows nothing about your virtualenv, so you need to add it as a new SDK for the project. Open the Project Structure dialog (⌘; on a Mac), and add the virtualenv as a new Python SDK. Then you should be up and away.

Data Visualisation: Getting Your Untappd Checkins

It goes without saying really, that if you want to visualise data, you need some data. As I mentioned in my last post, I have an Untappd API key, so have access to a data set that I’m quite interested in exploring. The following code isn’t an all singing, all dancing solution to getting hold of your Untappd checkins; it’s far too rough and ready for that. It does serve as a starting point though: we need data, this Python script gets us that data, and we can come back later and improve it.

This isn’t the first Python script I’ve written, but it is the longest and most complicated, which gives an idea of just how much I’ve played with Python. To enable it to run you need to modify the script with your Untappd API access keys and the username of the Untappd user you want to get checkins for. You’ll also need a MongoDB instance. If it’s not running on the default port, then you’ll also need to modify the bit that creates the MongoDB client so it knows which host and port to use.

from pymongo import MongoClient
import requests

# Your Untappd details...
untappd_user = ''
untappd_client_id = ''
untappd_client_secret = ''

# Connect to the local MongoDB instance...
client = MongoClient()
db = client[untappd_user]

# Does the user have any checkins already...?
if 'checkins' in db.list_collection_names():
    print('Dropping previously slurped checkins...')
    db.drop_collection('checkins')

# Create a new collection so we can slurp checkins into it...
checkins = db.create_collection('checkins')

# We don't have any checkin info at the moment, so don't set the checkin max_id
max_id = None

# Connect to Untappd and pull down some checkins...
while True:
    # These are the parameters we send every time...
    parameters = {'client_id': untappd_client_id, 'client_secret': untappd_client_secret, 'limit': 50}
    # Each time we go round the loop apply the max_id...
    if max_id is not None:
        parameters['max_id'] = max_id
    # Get some checkins (the endpoint URL is missing from this copy of the script)...
    r = requests.get('' + untappd_user, params=parameters)
    json = r.json()
    if json['meta']['code'] == 200:
        # Update the max_id...
        max_id = json['response']['pagination']['max_id']
        # Load the checkins into mongo (the 'items' key is an assumption)...
        count = json['response']['checkins']['count']
        print("Inserting %i checkins into mongo..." % count)
        if count > 0:
            checkins.insert_many(json['response']['checkins']['items'])
        # If we didn't get 50 checkins then we're done, so break out...
        if count < 50:
            break
    else:
        # Something went wrong, so report it and stop...
        print(json['meta']['error_detail'])
        break

print("%s now has %i Untappd checkins in MongoDB..." % (untappd_user, checkins.count_documents({})))

So what could we improve on? The main thing would be to not throw away all the checkins we’ve already managed to add to MongoDB each time the script is run; it should really just get those checkins that the user has made since the last run of the script. There is also no error handling, so if you run out of Untappd API calls (you’re limited to 100 per hour), it doesn’t handle the error response and inform you.
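As a rough idea of what the error handling half of that could look like, here is a small sketch that separates the paging loop from the HTTP call, so an error response stops the loop with an exception instead of being ignored. Everything named here (UntappdError, slurp_checkins, fetch_page) is hypothetical, and the 'items' key inside the checkins object is an assumption about the response layout; the meta, pagination and count fields are the ones the script above already reads. It doesn’t tackle fetching only the checkins made since the last run.

```python
class UntappdError(Exception):
    """Raised when a response carries a meta code other than 200."""


def slurp_checkins(fetch_page, page_size=50):
    """Yield checkins page by page until a short page signals the end.

    fetch_page is a hypothetical callable that takes max_id (or None for
    the first page) and returns the decoded JSON body of one response.
    """
    max_id = None
    while True:
        body = fetch_page(max_id)
        meta = body['meta']
        if meta['code'] != 200:
            # Surface the problem (for example a used-up rate limit)
            raise UntappdError(meta.get('error_detail', 'unknown error'))
        checkins = body['response']['checkins']
        for item in checkins['items']:  # 'items' is an assumed key
            yield item
        # A page with fewer checkins than requested is the last one
        if checkins['count'] < page_size:
            return
        max_id = body['response']['pagination']['max_id']
```

Passing the fetching function in as a parameter means the loop can be tried out with canned responses, without spending any of the hourly API calls.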

You can find all the code of this series of blogs in one of my GitHub repositories.

Data Visualisation

I’ve been meaning to write a bit about data visualisation for the last few months, but to be honest, brewing beer is far more fun to do and write about. Beer is something that is quite close to my heart, I love the stuff, it’s the best drink in the world as far as I’m concerned. You might be wondering why I’m going on about beer, when I’m supposed to be talking about data visualisation though. It just happens that I use a website/mobile app called Untappd, to log what beer I drink and where and when I drank it. It also so happens that Untappd have a public API for interacting with their database, so I have a readily accessible dataset that I’m intimately familiar with.

I had a half-hearted fiddle with the dataset of my beer drinking habits at the turn of the year, but I didn’t really do it properly, or to the extent I wanted to. I made a load of bubble graphs of various things, like which breweries I had drunk the most beers from, that sort of thing. There wasn’t really any in-depth analysis of when I drink beer, or how my beer drinking habits have changed since I started using the service though.

I’ve decided it’s about time to have a proper go at it and to learn a bit of Python while I’m at it. There will be a number of posts after this dealing with extracting the data with the Untappd API, mining the corpus to produce usable data sets and finally how to visualise those sets. The posts will come when they come; hopefully there won’t be too much of a gap between them.


Evidently it’s a good idea to test your Hadoop MapReduce functions on a small subset of data with Hadoop running in standalone mode. If you are new to Hadoop and feeling your way, like I am, this makes perfect sense, as you get to practice with the map and reduce functions without having to worry about setting up a cluster of nodes. It also gives you the opportunity to send all sorts of stuff to stdout, so you can find out what’s in all the Hadoop API classes; ReflectionToStringBuilder is your friend in this case.

One thing you have to do before invoking Hadoop though, is to set the classpath so that it can find your newly compiled classes. This is pretty trivial if you don’t use any third party libraries:

# Assuming you are setting this from the same folder as you
# are building your code with Maven...
export HADOOP_CLASSPATH=./target/classes

When you start adding third party libraries however, it’s not as simple. If you choose wisely, then they may already be included in the Hadoop installation, for example Apache Commons Lang 2.x. If like me, you’ve moved onto Apache Commons Lang 3.x then you have to include the JAR on the HADOOP_CLASSPATH so that it can be picked up and used. If you are using a lot of third party libs, you would be a fool to try and manage this by hand.

If you are using Maven as your build tool, then you can use the Maven Dependency Plugin to copy all your third party JARs to a suitable location for inclusion on the classpath. Just make sure you have included and excluded the correct dependency scopes, otherwise you’ll have a bucket full of JARs that you don’t need in your chosen location.

Then it’s just a case of modifying the classpath to also point to the folder that contains all your third party JARs and away you go.

# Assuming you are setting this from the same folder as you
# are building your code with Maven and have put all your
# 3rd party JARs in target/libs...
export HADOOP_CLASSPATH=./target/classes:./target/libs/*

I have to confess that when I realised that I needed to create a classpath with all the third party JARs on it, I wondered if I could do some bash scripting to iterate over the folder and produce a classpath that way. Glad I did a google first, as I’d totally forgotten about using the wildcard on a classpath, as there’s not really much call for that kind of thing when writing webapps…
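For comparison, this is roughly what the "iterate over the folder" approach would have looked like, sketched in Python rather than bash. The build_classpath name is made up for illustration, and the folder arguments assume the target/classes and target/libs layout used above. The wildcard does the same job in one line, which is why it wins.

```python
import os
from pathlib import Path


def build_classpath(classes_dir, libs_dir):
    """Join the classes folder and every JAR in libs_dir into one classpath."""
    # Sort the JARs so the resulting classpath is stable between runs
    jars = sorted(str(jar) for jar in Path(libs_dir).glob('*.jar'))
    # os.pathsep is ':' on Linux and macOS, ';' on Windows
    return os.pathsep.join([str(classes_dir)] + jars)


if __name__ == '__main__':
    # Print a value suitable for: export HADOOP_CLASSPATH=...
    print(build_classpath('./target/classes', './target/libs'))
```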

Copying The Right Dependencies With The Maven Dependency Plugin

I’ve been playing with Hadoop recently and ran into an issue with the Maven dependency plugin copying all the JAR files from all the scopes into my lib folder. No problem I thought, you can exclude scoped dependencies with the excludeScope configuration parameter, so I set that to provided but this still left the test dependencies being copied. As you can’t set two excludeScope elements and the one element you can set only takes a single scope, this is a bit of an issue.

It turns out that if you want to exclude dependencies from both the test and provided scopes, you need to exclude the provided scope and include the runtime scope. So your plugin snippet becomes something like:


This means that your lib folder isn’t polluted with your test JARs like JUnit, Hamcrest and Mockito and, more importantly, doesn’t contain all of the Hadoop dependencies. All of which means that your Hadoop standalone mode classpath for testing out those MapReduce jobs isn’t full of unnecessary clutter.

Extracting Audio Tracks From MKV Files

I’ve been ripping all my DVDs to MKV files, so I can serve them up via XBMC running on a Raspberry Pi. I was up in the loft at the weekend, mainly trying to sort out the mess into something more logical. I suddenly remembered that I had a number of music DVDs that I bought with the express intention of ripping the audio from, so I dug them out.

My normal DVD ripping process is to use MakeMKV to get a raw MKV file and then use Handbrake to re-encode the contents into something slightly smaller. I want to be able to listen to these DVD tracks on a mobile player as well as at work, so I don’t just want them as MKV files with video and audio, I want a separate audio file as well. One of the features of Handbrake is the ability to add more audio tracks with different encodings, so it’s really easy to have your DD or DTS passed through and have it re-encoded into FLAC, Vorbis or MP3 on a separate track.

Armed with an MKV file that contains a FLAC/Vorbis/MP3 audio track, the only thing left is to extract this track into a separate file. If you are using Linux or Windows, then there are a couple of command line utilities that make doing this quite easy. They’re part of the MKVToolNix package, which you should be able to get out of your distribution’s repositories or download from here. I had four DVDs to extract audio tracks from, and one of the DVDs had two different live concerts, so you can imagine there were a fair few files to extract audio from once each song had been ripped to its own MKV file.

It would have taken a while to manually extract the audio track from each MKV file, so I needed a way to script it, so it could all be done automatically. Firstly, I needed to find out the track number that the audio was on, I used mkvmerge for this:

~/scrap/music/mkv/Franz Ferdinand/Live at Brixton$ mkvmerge -i 08\ -\ I\'m\ Your\ Villain.mkv
File '08 - I'm Your Villain.mkv': container: Matroska
Track ID 1: video (V_MPEG4/ISO/AVC)
Track ID 2: audio (A_AC3)
Track ID 3: audio (A_FLAC)

All the MKV files I’d created had the FLAC audio track on ID 3, which made life slightly easier, so then it was just a case of finding all MKV files in the current folder tree and extracting track three from them. I used mkvextract in the following script:

find . -type f -name '*.mkv' | while IFS= read -r filename; do mkvextract tracks "$filename" 3:"${filename%.*}.flac"; done

It would be cool to call the script with the output format you want to extract (A_FLAC in this instance) and parse the output of mkvmerge for each MKV file to find the desired track id to pass to mkvextract. Doing it this way, with a fixed track id, was all a bit quick and dirty, but it worked for me. Maybe I’ll modify it in the future, but as I’ve no more music DVDs needing audio ripped from them, I’m in no rush…
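A minimal sketch of that idea in Python might look like the following. It assumes the mkvmerge -i output has the same "Track ID N: audio (CODEC)" shape as the listing above, which may not hold for other MKVToolNix versions, and the function names find_audio_track and extract_audio are made up for illustration. The mkvextract call uses the same tracks syntax as the one-liner.

```python
import os
import re
import subprocess

# Matches lines such as: Track ID 3: audio (A_FLAC)
TRACK_LINE = re.compile(r'^Track ID (?P<id>\d+): audio \((?P<codec>[^)]+)\)')


def find_audio_track(identify_output, codec):
    """Return the ID of the first audio track using codec, or None."""
    for line in identify_output.splitlines():
        match = TRACK_LINE.match(line.strip())
        if match and match.group('codec') == codec:
            return int(match.group('id'))
    return None


def extract_audio(mkv_path, codec='A_FLAC', extension='flac'):
    """Look up the track with mkvmerge, then extract it with mkvextract."""
    listing = subprocess.run(
        ['mkvmerge', '-i', mkv_path],
        capture_output=True, text=True, check=True,
    ).stdout
    track_id = find_audio_track(listing, codec)
    if track_id is None:
        return None  # no track with the requested codec in this file
    target = os.path.splitext(mkv_path)[0] + '.' + extension
    subprocess.run(
        ['mkvextract', 'tracks', mkv_path, '%d:%s' % (track_id, target)],
        check=True,
    )
    return target
```

Keeping the parsing in its own function means it can be checked against a saved listing without needing any MKV files to hand.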

Binding Backbone Callbacks

There’s one thing about JavaScript that’s always been an issue for me and that’s remembering to bind callback methods to the correct scope. For some reason I always forget and then spend an age trying to figure out why things aren’t working correctly. I’m forever doing this kind of thing in a Backbone view:

events: {
  'click a.enable': 'enableHandler',
  'click a.disable': 'disableHandler'
},
initialize: function() {
  this.model = new Model();
  this.model.bind('change', this.render, this);
  // ...
},
enableHandler: function(e) {
  // The save call is reconstructed; that line is missing from this copy
  this.model.save({
    enabled: true
  }, {
    wait: true,
    success: function() {
      // ... (`this` is not the view in here)
    },
    error: function() {
      // ...
    }
  });
}
// ...

I’m then left wondering why all my enable/disable links, or similar, are being changed, instead of just the one that was clicked on. You’d think it would just fail; instead, the jQuery selector isn’t scoped to the Backbone view element and selects all matching elements in the document.

It’s really not that hard to wrap the callbacks with the Underscore bind function:

events: {
  'click a.enable': 'enableHandler',
  'click a.disable': 'disableHandler'
},
initialize: function() {
  this.model = new Model();
  this.model.bind('change', this.render, this);
  // ...
},
enableHandler: function(e) {
  // The save call is reconstructed; that line is missing from this copy
  this.model.save({
    enabled: true
  }, {
    wait: true,
    success: _.bind(function() {
      // ... (`this` is now the view)
    }, this),
    error: _.bind(function() {
      // ...
    }, this)
  });
}
// ...

It would be nice if I could remember to do this automatically when writing a callback function. Given that I’m still forgetting to do it after years of programming JavaScript, I don’t hold out much hope that I’ll suddenly start remembering…