Creating Reproducible, Publication-Quality Plots With Matplotlib and Seaborn

Update: this post was created from a Jupyter notebook, which you can access here.

How should you create a plot for inclusion in a publication? A common workflow for Matlab or Python users (and one that I used to use myself) is to create a figure using the defaults, export it as SVG, and open it in Inkscape or Illustrator to make it look nice.

This works fine if you only need to edit how a figure looks once. However, this is almost never the case. As you iterate further on the paper, your advisor may ask you to generate the plot a slightly different way. Or perhaps you find an off-by-one error in your code and need to regenerate the figure with the correct results. Re-editing your figures in a vector graphics program every time takes a lot of time, and that added cost may discourage you from regenerating figures (even when you really should).

Fortunately, there is another option, albeit with a higher startup cost. If you use Python, Matplotlib exposes almost all the controls you need to make instantly reproducible, beautiful figures. The startup cost is learning how to use those controls, which can take a lot of effort. However, I’d argue that this cost is entirely worth it. Having used Matplotlib exclusively for my figures since starting graduate school, I can now create a fully reproducible, publication-quality figure in about 10 minutes. In this blog post, I’ll walk you through the steps needed to go from Matplotlib’s defaults to something usable in a publication.
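As a minimal sketch of the idea (assuming Matplotlib is installed; the `PUBLICATION_STYLE` values and the `make_figure` function are illustrative choices, not the settings this post arrives at), the look of a figure can live entirely in code via `plt.rc_context`, so regenerating it after a bug fix is a single function call:

```python
import math

import matplotlib

matplotlib.use("Agg")  # off-screen backend, so the script runs without a display

import matplotlib.pyplot as plt

# Hypothetical style choices; adjust them to the target journal's requirements.
PUBLICATION_STYLE = {
    "font.size": 9,
    "axes.labelsize": 9,
    "xtick.labelsize": 8,
    "ytick.labelsize": 8,
    "legend.fontsize": 8,
    "axes.spines.top": False,    # drop the top and right frame lines
    "axes.spines.right": False,
    "savefig.dpi": 300,          # only matters for raster outputs such as PNG
}


def make_figure(path):
    """Rebuild the figure from scratch and save it to `path` (e.g. "figure.pdf")."""
    xs = [2 * math.pi * i / 199 for i in range(200)]
    with plt.rc_context(PUBLICATION_STYLE):
        fig, ax = plt.subplots(figsize=(3.5, 2.5))  # inches; roughly one journal column
        ax.plot(xs, [math.sin(x) for x in xs], label="sin(x)")
        ax.plot(xs, [math.cos(x) for x in xs], label="cos(x)")
        ax.set_xlabel("x (radians)")
        ax.set_ylabel("y")
        ax.legend(frameon=False)
        fig.tight_layout()
        fig.savefig(path)  # the output format is inferred from the file extension
    plt.close(fig)


if __name__ == "__main__":
    make_figure("figure.pdf")
```

Because all of the styling lives in the script, fixing a bug in the data and re-running it reproduces the same look with no manual editing. Seaborn’s `set_context("paper")` provides a similar preset for scaling fonts and line widths.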

Passing Quals!

A few weeks ago, on January 25th, 2016, I passed my qualifying exams! This means that I am now qualified to write my thesis and get my PhD. Passing quals involved a lot of reading; during January I was reading pretty much five papers per day:

Hundreds of pages of readings.

Different people prepare for quals in different ways. One of my labmates prepared by writing notes by hand in many small notebooks and then scanning them. Another labmate made a handful of slides per paper. I decided to write a blog post on each paper. This approach worked well for me because it forced me to digest each paper by writing a summary, and then to think critically about it by writing a “takeaways” section. For some of the papers, I even wrote a quick demo in a Jupyter notebook to help me understand the model or algorithm better.

Quals was a really transformative experience for me. I got a ton out of reading all the papers on my reading lists, even if it did mean I did nothing else for a month! While preparing for quals I came up with several concrete project ideas that I’m really excited to implement, and I feel like I have a much deeper and more nuanced understanding of the research areas that I’m interested in (i.e., mental simulation and how it relates to the rest of cognition and to algorithms in computer science and artificial intelligence).

If there are any other grad students reading this in the future, my advice is to treat quals as an experience for yourself. It is not just a requirement you need to fulfill, but an opportunity to read deeply about the topics you are interested in. Reading so many papers in such a short amount of time will allow you to make connections you otherwise wouldn’t, simply because you remember more details when only a few hours, rather than a few days or weeks, separate two papers! I won’t lie: it is really tough to find the time to prepare for quals. I am very guilty of procrastinating on my own; students in my program are supposed to take them at the end of their 2nd year, or their 3rd year at the latest, but I am in my 4th year! However, I think the experience is really worth it, and I do wish that I had just bitten the bullet and gotten mine done sooner.

Next up: actually writing my thesis!

Deploying JupyterHub for Education

Over the last few months, I’ve been busily working on converting a class that I TA from Matlab to Python. Actually, not just Python, but IPython/Jupyter notebooks! Part of this involved setting up a server where the students could log in and complete their assignments. I wrote a post about it over on the Rackspace developer blog:

https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/

I’ve also been busy developing a tool called nbgrader for grading IPython notebooks. More on this in a future post, hopefully!

How I Learned to Stop Worrying and Love PyCon

Ok, I’ll admit it. I was pretty nervous about going to PyCon. I was giving a talk, I was only going to know a couple of people there, I was going for an entire week, and it was going to be in a city where they speak a language of which I only know about two words. Also, I was a bit unsure of what to expect in terms of the social climate (especially given that the only other non-academic tech conference I’ve been to has been DEFCON, which is not exactly known for being low on sexism).

I shouldn’t have worried, though. PyCon was phenomenally awesome! I met a million amazing people and the talks were exceptionally well done. Everyone was incredibly friendly and outgoing, so much so that I rarely found time not to talk to people! PyCon is easily the best conference I’ve ever been to (even better than last year’s CogSci, which was also pretty great). Here’s a recap.

Installing 64-bit Panda3D for Python 2.7 on OS X

I use the Panda3D video game engine to develop experiments for my research. I needed to install a development version that included some bugfixes, but installing Panda3D on OS X is not the easiest task. The development builds they provide are only 32-bit, and I needed to run my Panda3D code alongside libraries like NumPy, which I had installed as 64-bit (the default on OS X). For a while I tried to get NumPy/SciPy/etc. installed for 32-bit, but failed; ultimately, I was able to compile Panda3D for 64-bit Python 2.7. Here are the steps that I took to compile it; hopefully they will be useful to others (and, at the very least, a reference for myself going forward)!
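Since the whole problem comes down to mixing 32-bit and 64-bit builds, it is worth checking which one the interpreter actually is before compiling anything. A small sketch follows; the helper names are hypothetical, and the `sys.maxsize` test is the check the Python `platform` documentation recommends on macOS, where universal binaries can make `platform.architecture()` misleading. It should run under both Python 2.7 and Python 3:

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits."""
    # "P" is the native void* format code: 4 bytes on 32-bit, 8 bytes on 64-bit.
    return struct.calcsize("P") * 8


def is_64bit_python():
    """Return True if the running interpreter is a 64-bit build."""
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print("%d-bit Python %s" % (pointer_bits(), sys.version.split()[0]))
```

Running this with the same interpreter that NumPy was installed into shows which architecture Panda3D needs to be built for; an extension module compiled for the other width cannot be loaded into that process.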

Rewriting Python Docstrings With a Metaclass

Update: I gave a talk based on this post at the November 2013 San Francisco Python meetup! Here are the slides and a video of the talk.

Today, I found myself in a situation where I had a few different classes inheriting from each other, e.g.:

Inheritance structure
class A(object):
    def foo(self):
        """Check that foo works."""

class B(A):
    pass

class C(B):
    pass

Specifically, each of these classes was a test class that I was running using nose. B and C had different setup methods than A, but otherwise ran the same tests. However, nosetests -v doesn’t print the name of the method’s class, only its docstring, which is of course the same for all three classes. This made it very difficult to tell which class’s test was actually failing.

To resolve this, I wrote a metaclass to intercept each class at creation time and rewrite its docstrings to be prefixed with the name of the class. This was probably overkill, but I’d been itching to play around with metaclasses for a while and decided this was a semi-valid excuse.
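The original code isn’t reproduced here, but a minimal Python 3 sketch of the idea looks like the following; `PrefixDocstrings` and `_prefixed` are hypothetical names. One subtlety: B and C inherit `foo` rather than defining it, and all three classes would otherwise share one function object (and so one docstring). The metaclass therefore attaches a fresh wrapper to each class instead of editing the inherited function in place:

```python
import functools
import inspect


def _prefixed(func, prefix):
    """Wrap func in a new function whose docstring starts with prefix."""
    @functools.wraps(func)  # copies metadata and records func as wrapper.__wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    wrapper.__doc__ = "%s: %s" % (prefix, func.__doc__)
    return wrapper


class PrefixDocstrings(type):
    """Metaclass that prefixes every method docstring with the class name."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        for attr_name in dir(cls):
            if attr_name.startswith("__"):
                continue  # leave special methods alone
            # getattr_static skips the descriptor protocol, so staticmethods
            # and classmethods are not mistaken for plain functions.
            attr = inspect.getattr_static(cls, attr_name)
            if not inspect.isfunction(attr):
                continue
            # Unwrap any wrapper a parent class added, so prefixes do not stack.
            original = inspect.unwrap(attr)
            if original.__doc__ is None:
                continue
            setattr(cls, attr_name, _prefixed(original, name))


class A(metaclass=PrefixDocstrings):
    def foo(self):
        """Check that foo works."""
        return type(self).__name__


class B(A):
    pass


class C(B):
    pass


print(C.foo.__doc__)  # C: Check that foo works.
```

Since nosetests -v reports docstrings, each test’s description would then name the class it came from, while the method body itself is unchanged.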

On Collecting Data

When collecting data, how do you save it?

There are about a million different options. In Python, you can choose from many different libraries:

  • pickle
  • numpy
  • yaml
  • json
  • csv
  • sql

… just to name a few. Over time I’m fairly certain that I’ve managed to save data (whether behavioral or simulation) in all of these formats. This is really inconsistent: it makes it difficult to know what encoding any particular dataset is in, let alone the format of the data itself, and it means I end up writing and rewriting code for saving, loading, parsing, etc., more times than I ought to.
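One way to reduce this inconsistency is to route every dataset through a single pair of save/load helpers that record their own format alongside the data. The sketch below picks JSON purely for illustration (it suits small lists of records, not large numeric arrays); `save_records`, `load_records`, and the `format_version` header are hypothetical, not something this post prescribes:

```python
import json
from pathlib import Path

FORMAT_VERSION = 1  # hypothetical schema version stored in every file


def save_records(path, records, description=""):
    """Write a list of dict records as JSON with a small self-describing header."""
    payload = {
        "format_version": FORMAT_VERSION,
        "description": description,
        "records": records,
    }
    Path(path).write_text(json.dumps(payload, indent=2))


def load_records(path):
    """Read records written by save_records, checking the header first."""
    payload = json.loads(Path(path).read_text())
    version = payload.get("format_version")
    if version != FORMAT_VERSION:
        raise ValueError("unsupported format version: %r" % (version,))
    return payload["records"]
```

Bumping `FORMAT_VERSION` whenever the record layout changes makes old files fail loudly instead of being silently misread.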

Switching to Octopress

As you may be able to tell, the look of this site has drastically changed. I’ve been meaning to overhaul the theme for a while, and I decided I also wanted to try something different from WordPress. In particular, I wanted more flexibility with embedding code, which Octopress seems to be particularly good at:

Hello, World!
def hello_world():
    print("Hello, World!")

hello_world()

Also, I just really like how Octopress blogs look by default. I’ve been fiddling with the theme today, and I’ll probably continue to fiddle with it going forward. Because Octopress serves static content, I feel like I have more control over the look and feel of my blog, and importantly, I can try out changes locally before deploying them, which was not something I could do with WordPress.

Hurray for change and learning new tools!

Why Is Making a Git Commit So Complicated?

I’ve realized that I don’t blog very often because I tend to write very long and thorough posts. In an effort to try to start blogging more, I’m going to let myself off the hook some of the time and just write about something short. Perhaps this will get me more in the habit of writing, which will then lead to more in-depth posts!

Anyway, today in our lab meeting I gave a presentation on using git. While working on the presentation over the past few days, I struggled to find a good answer to a question I’ve been asked multiple times:

Why does it take so many steps to make a commit?