Adding Pylint to IntelliJ

A switch back to programming in Python meant I wanted to be able to use Pylint from inside IntelliJ.

I’ve been programming in JavaScript for the last few months, and had IntelliJ IDEA using ESLint to give me context highlighting. It was really easy to set up and I found it useful in helping me stay compliant with our coding style, which differs slightly from my personal style.

I’ve switched back to Python for another project and wanted to use Pylint from within IntelliJ IDEA, for a similar purpose. It turns out that you can’t; at least, you can’t have editor context highlighting, which is a bit shit.

Yes, IntelliJ IDEA already has pretty good PEP8 context highlighting, but not everyone on the team uses IntelliJ IDEA, let alone an IDE (don’t ask). So I really wanted to get Pylint working in the IDE, so I didn’t have to keep dropping to a terminal every time I wanted to check it.

While IntelliJ IDEA won’t do editor context highlighting with Pylint, you can add it as an external tool. This allows it to execute from within the IDE and provide links to any lines that have issues.

I started by following the instructions for PyCharm in the Pylint documentation. The Python plugin for IntelliJ IDEA provides pretty much the same functionality as PyCharm, so I figured it should work. I couldn’t get it to work though.

Next I found this translation of a Russian blog, but again, I couldn’t quite get it to work. So I started fiddling with all the macros that are available until I found the ones that worked. So here you go: Pylint from within IntelliJ IDEA, picking up the correct virtualenv.

Start by opening the Preferences (⌘, on a Mac) and browsing to the Tools > External Tools menu. Click the little + symbol at the bottom of the main panel (or press ⌘N on a Mac). Fill out the form with similar values to what’s below:

IntelliJ Preferences Edit Tool

Name
Pylint – The name it will appear under in the list of external tools.
Description
A Python source code analyzer which looks for programming errors, helps enforcing a coding standard and sniffs for some code smells. – Not sure where this is actually used, but when you come back to edit the tool, it at least reminds you why you added it.
Program
$PyInterpreterDirectory$/pylint – Using the $PyInterpreterDirectory$ macro means that it’ll pick up the correct virtualenv.
Parameters
-rn -f parseable $FilePath$ – -rn means we only want the messages, not the other report gubbins. -f parseable means that it produces something that IntelliJ can parse. $FilePath$ is the full path to the current file.
Working directory
$PyInterpreterDirectory$ – Tells Pylint to run from the virtualenv bin folder, rather than the IntelliJ project folder.

I also unchecked the Main menu and Project views options in the Show in section, mainly because I only wanted the ability to right-click within the currently open file and lint it. To lint whole folders, or the entire project, you’ll need to modify both the Parameters and Working directory fields with the correct macros, along the lines of the sketch below.
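As a rough idea only (the exact macro depends on your project layout, and your top-level source folder needs to be a package for Pylint to accept it), a second external tool for linting the whole project might use something like:

Parameters: -rn -f parseable $ContentRoot$/src
Working directory: $PyInterpreterDirectory$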

This was enough to get it working, but without hyperlinking any lines with issues. To enable that, click on the Output filter… button and add a new filter:

IntelliJ Preferences Edit Tool Edit Filter
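The filter just needs a regular expression telling IntelliJ where the file path and line number sit in Pylint’s parseable output; if the screenshot isn’t clear, something along these lines should do the job:

$FILE_PATH$:$LINE$:.*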

When you right-click within a source file, you should now have a Pylint option under the External Tools sub-menu, which should give you clickable links in the output.

IntelliJ Pylint Output

.pylintrc

There is only one downside to the configuration outlined above: if you have a .pylintrc file in the root of your project, Pylint can’t find it. This is due to the working directory being set to $PyInterpreterDirectory$, rather than a macro that represents the source root.

Pylint has a specific search order for the .pylintrc file, so you can either use something like the PYLINTRC environment variable, or just add the location of the file to the Parameters field of the External Tools dialog:

-rn -f parseable --rcfile $ContentRoot$/.pylintrc $FilePath$

That did the trick for me. This means you can have your .pylintrc file under source code control, which is useful if different projects have different requirements.
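If you’d rather go down the PYLINTRC environment variable route instead, exporting it before launching IntelliJ from a terminal should do the same job; something like this, with the path purely illustrative:

export PYLINTRC=/path/to/your/project/.pylintrc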

Making sure IntelliJ uses the correct virtualenv

There is one extra step that you need to be aware of if you are using virtualenv. By default IntelliJ IDEA knows nothing about your virtualenv, so you need to add it as a new SDK for the project. Open the Project Structure dialog (⌘; on a Mac), and add the virtualenv as a new Python SDK. Then you should be up and away.
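If you’re wondering what to actually point the new SDK at, it’s the Python binary inside the virtualenv’s bin folder; the exact path depends on where you keep your virtualenvs, but it’ll be something along these lines (path purely illustrative):

~/.virtualenvs/myproject/bin/python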

Formatting a USB Flash Drive on Mac OS X

Fed up with Kubuntu on my laptop at home, I decided to try and install Mint via a USB Flash Drive.

Except I couldn’t get either KDE Partition Manager on my laptop, or Disk Utility on my work MacBook, to erase and format the flash drive. I wanted to do this so that dd would have a pristine flash drive to burn the Mint ISO to.

Disk Utility was giving errors of the form:

Name invalid.
Operation invalid...

Turns out this was something to do with the fact that the USB flash drive already had a bootable Linux partition on it from the last Kubuntu install I did. Luckily I found this page on the Apple website that contained the solution; you’ll need to scroll all the way to the Helpful answers section at the bottom to find it.

In short though, find the id of the USB flash drive using the Info button in Disk Utility, then open a Terminal and issue:

$ diskutil eraseVolume HFS+ NAME disk3
Started erase on disk3 Kubuntu 16.04 LT
Unmounting disk
Erasing
Initialized /dev/rdisk3 as a 7 GB case-insensitive HFS Plus volume
Mounting disk
Finished erase on disk3 NAME

Then pop back into Disk Utility, click the Erase button and format the disk to FAT32. You should now be able to use dd to copy the ISO to the flash drive.
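For completeness, the dd step is roughly the following, assuming the flash drive is still disk3 and the Mint ISO is sitting in your Downloads folder (both assumptions, adjust to taste); unmount the volume first, otherwise dd will complain that the resource is busy:

$ diskutil unmountDisk /dev/disk3
$ sudo dd if=~/Downloads/linuxmint.iso of=/dev/rdisk3 bs=1m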

Data Visualisation: Getting Your Untappd Checkins

It goes without saying really that if you want to visualise data, you need some data. As I mentioned in my last post, I have an Untappd API key, so I have access to a data set that I’m quite interested in exploring. The following code isn’t an all-singing, all-dancing solution to getting hold of your Untappd checkins; it’s far too rough and ready for that. It does serve as a starting point though: we need data, this Python script gets us that data, and we can come back later and improve it.

This isn’t the first Python script I’ve written, but it is the longest and most complicated, which gives an idea of just how much I’ve played with Python. To enable it to run you need to modify the script with your Untappd API access keys and the username of the Untappd user you want to get checkins for. You’ll also need a MongoDB instance; if it’s not running on the default port, you’ll also need to modify the bit that creates the MongoDB client so it knows which port to use.

from pymongo import MongoClient
import requests

# Your Untappd details...
untappd_user = ''
untappd_client_id = ''
untappd_client_secret = ''

# Connect to the local MongoDB instance...
client = MongoClient()
db = client[untappd_user]

# Does the user have any checkins already...?
if 'checkins' in db.collection_names():
    print 'Dropping previously slurped checkins...'
    db.drop_collection('checkins')

# Create a new collection so we can slurp checkins into it...
checkins = db.create_collection('checkins')

# We don't have any checkin info at the moment, so don't set the checkin max_id
max_id = None

# Connect to Untappd and pull down some checkins...
while True:
    # These are the parameters we send every time...
    parameters = {'client_id': untappd_client_id, 'client_secret': untappd_client_secret, 'limit': 50}
    # Each time we go round the loop apply the max_id...
    if max_id is not None:
        parameters['max_id'] = max_id
    # Get some checkins...
    r = requests.get('http://api.untappd.com/v4/user/checkins/' + untappd_user, params=parameters)
    json = r.json()
    if json['meta']['code'] == 200:
        # Update the max_id...
        max_id = json['response']['pagination']['max_id']
        # Load the checkins into mongo...
        checkins.insert(json['response']['checkins']['items'])
        # If we didn't get 50 checkins then we're done, so break out...
        count = json['response']['checkins']['count']
        print "Inserting %i checkins into mongo..." % count
        if count < 50:
            break
    else:
        print json['meta']['error_detail']
        break

print "%s now has %i Untappd checkins in MongoDB..." % (untappd_user, checkins.count())

So what could we improve on? The main thing would be not throwing away all the checkins we’ve already managed to add to MongoDB each time the script is run; it should really just get the checkins that the user has made since the last run of the script. There is also no real error handling, so if you run out of Untappd API calls (you’re limited to 100 per hour), it doesn’t handle the error response and inform you.
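As a rough sketch of the incremental approach, assuming we stop dropping the collection at the start of each run and that each checkin item carries a checkin_id field (the Untappd responses appear to include one, but treat that as an assumption), we could look up the newest checkin we’ve already stored and stop paging once we reach it:

# Find the newest checkin we already have; None if the collection is empty...
latest = checkins.find_one(sort=[('checkin_id', -1)])
latest_id = latest['checkin_id'] if latest else None

# ...then, inside the paging loop, only keep the items we haven't seen before...
items = json['response']['checkins']['items']
new_items = [item for item in items if latest_id is None or item['checkin_id'] > latest_id]
if new_items:
    checkins.insert(new_items)
# If anything was filtered out, we've caught up with the previous run, so stop...
if len(new_items) < len(items):
    break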

You can find all the code of this series of blogs in one of my GitHub repositories.

Data Visualisation

I’ve been meaning to write a bit about data visualisation for the last few months, but to be honest, brewing beer is far more fun to do and write about. Beer is something that is quite close to my heart; I love the stuff, and it’s the best drink in the world as far as I’m concerned. You might be wondering why I’m going on about beer when I’m supposed to be talking about data visualisation though. It just happens that I use a website/mobile app called Untappd to log what beer I drink, and where and when I drank it. It also so happens that Untappd have a public API for interacting with their database, so I have a readily accessible dataset that I’m intimately familiar with.

I had a half-hearted fiddle with the dataset of my beer drinking habits at the turn of the year, but I didn’t really do it properly, or to the extent I wanted to. I made a load of bubble graphs of various things, like which breweries I had drunk the most beers from, that sort of thing. There wasn’t really any in-depth analysis of when I drink beer, or how my beer drinking habits have changed since I started using the service though.

I’ve decided it’s about time to have a proper go at it and to learn a bit of Python while we’re at it. There will be a number of posts after this dealing with extracting the data with the Untappd API, mining the corpus to produce usable data sets and finally visualising those sets. The posts will come when they come; hopefully there won’t be too much of a gap between them.

XBMC the Thecus N3200 NAS and NFS Redux

I’ve been meaning to post about this for ages, but for some reason I’ve always had something better to do, like washing my hair. As I detailed in my previous post about XBMC the Thecus N3200 NAS and NFS, I couldn’t get it working, as XBMC couldn’t mount the NFS shares and I couldn’t adjust the mounts via the Thecus N3200 OS. I gave up in the end and switched my XBMC distro from OpenELEC to Raspbmc mainly so I could get access to the underlying Debian OS and thus edit the /etc/fstab file.

Installing Raspbmc is a breeze: just download the install script, stuff your SD card into your computer, run the script and then go and enjoy a beer while it does its stuff. Then all that was left to do was ssh into the Raspberry Pi, edit its copy of /etc/fstab and create some local folders to mount the shares to. As I’d previously done all of that on my laptop, it was just a case of copying the relevant bits from one /etc/fstab to the other. A reboot later and XBMC was happily adding all the media from what it thought was a local folder.

Just in case you find yourself in a similar position, here are the entries in the /etc/fstab on my laptop:

# Thecus N3200 mount points, assuming your NAS has a fixed IP of 192.168.0.100
192.168.0.100:/raid/Bob /home/boba/nas/bob nfs rw
192.168.0.100:/raid/Photos /home/boba/nas/photos nfs rw
192.168.0.100:/raid/Music /home/boba/nas/music nfs rw
192.168.0.100:/raid/Video /home/boba/nas/video nfs rw
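On the Raspberry Pi side the only extra work is creating the local folders to mount the shares into and then remounting everything, which is something along these lines (folder names and locations to taste):

$ mkdir -p ~/nas/bob ~/nas/photos ~/nas/music ~/nas/video
$ sudo mount -a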

Setting The HADOOP_CLASSPATH

Evidently it’s a good idea to test your Hadoop MapReduce functions on a small subset of data with Hadoop running in standalone mode. If you are new to Hadoop and feeling your way, like I am, this makes perfect sense, as you get to practice with the map and reduce functions without having to worry about setting up a cluster of nodes. It also gives you the opportunity to send all sorts of stuff to stdout, so you can find out what’s in all the Hadoop API classes; ReflectionToStringBuilder is your friend in this case.

One thing you have to do before invoking Hadoop though, is to set the classpath so that it can find your newly compiled classes. This is pretty trivial if you don’t use any third party libraries:

# Assuming you are setting this from the same folder as you
# are building your code with Maven...
export HADOOP_CLASSPATH=./target/classes

When you start adding third party libraries however, it’s not as simple. If you choose wisely, then they may already be included in the Hadoop installation, for example Apache Commons Lang 2.x. If, like me, you’ve moved on to Apache Commons Lang 3.x, then you have to include the JAR on the HADOOP_CLASSPATH so that it can be picked up and used. If you are using a lot of third party libs, you would be a fool to try and manage this by hand.
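For a single extra library it’s manageable enough by hand; assuming the Commons Lang 3 JAR is sitting in your local Maven repository (version number purely illustrative), it’s just a case of tacking it onto the end:

export HADOOP_CLASSPATH=./target/classes:$HOME/.m2/repository/org/apache/commons/commons-lang3/3.1/commons-lang3-3.1.jar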

If you are using Maven as your build tool, then you can use the Maven Dependency Plugin to copy all your third party JARs to a suitable location for inclusion on the classpath. Just make sure you have included and excluded the correct dependency scopes, otherwise you’ll have a bucket full of JARs that you don’t need in your chosen location.
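If you just want to try the plugin out without touching your POM, it can also be run straight from the command line; something like this should drop the runtime dependencies into target/libs:

mvn dependency:copy-dependencies -DincludeScope=runtime -DoutputDirectory=target/libs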

Then it’s just a case of modifying the classpath to also point to the folder that contains all your third party JARs and away you go.

# Assuming you are setting this from the same folder as you
# are building your code with Maven and have put all your
# 3rd party JARs in target/libs...
export HADOOP_CLASSPATH=./target/classes:./target/libs/*

I have to confess that when I realised that I needed to create a classpath with all the third party JARs on it, I wondered if I could do some bash scripting to iterate over the folder and produce a classpath that way. Glad I did a google first, as I’d totally forgotten about using the wildcard on a classpath, as there’s not really much call for that kind of thing when writing webapps…

Copying The Right Dependencies With The Maven Dependency Plugin

I’ve been playing with Hadoop recently and ran into an issue with the Maven dependency plugin copying all the JAR files from all the scopes into my lib folder. No problem, I thought: you can exclude scoped dependencies with the excludeScope configuration parameter, so I set that to provided, but this still left the test dependencies being copied. As you can’t set two excludeScope elements, and the one element you can set only takes a single scope, this is a bit of an issue.

It turns out that if you want to exclude dependencies from both the test and provided scopes, you need to exclude the provided scope and include the runtime scope. So your plugin snippet becomes something like:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>2.6</version>
  <executions>
    <execution>
      <id>copy-dependencies</id>
      <phase>prepare-package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
        <overWriteReleases>false</overWriteReleases>
        <overWriteSnapshots>false</overWriteSnapshots>
        <overWriteIfNewer>true</overWriteIfNewer>
        <includeScope>runtime</includeScope>
        <excludeScope>provided</excludeScope>
      </configuration>
    </execution>
  </executions>
</plugin>

This means that your lib folder isn’t polluted with your test JARs like JUnit, Hamcrest and Mockito, and more importantly isn’t full of all the Hadoop dependencies. Which all means that your Hadoop standalone mode classpath for testing out those MapReduce jobs isn’t full of unnecessary clutter.
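As a quick sanity check, given the snippet above binds the goal to the prepare-package phase, you can run that phase and have a poke around in the lib folder to see exactly what got copied:

mvn prepare-package
ls target/lib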