Python vs C for performance

Published: September 22, 2009
Tags: performance programming python C

Via Reddit today I came across a fairly decent article on Python optimization tips and issues. It comes down heavily in favour of the idea that, by being careful and knowing what you're doing, you can typically make Python implementations of numerical algorithms fast enough for practical purposes, saving yourself the massive headaches associated with development in C.

This is a fairly relevant topic for me. A huge part of my PhD research is writing and playing with computational models of language acquisition. These models usually use Bayesian inference as a model of an "ideal learner" (doing this is fairly trendy in modern computational cognitive science, and not without good reason). When it comes to doing numerical Bayesian inference, a class of techniques known as Markov chain Monte Carlo (MCMC) sets the standard, and MCMC computations are the bread and butter of my programming work. Without going into too much detail, MCMC computations are highly iterative: you basically just do the same few steps over and over, as fast as you can, until your program converges on your answer.
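To make "the same few steps over and over" concrete, here is a minimal sketch of a random-walk Metropolis sampler, one of the simplest MCMC algorithms, written in pure Python. It is not the language-acquisition model described above: the target distribution (a standard normal, via the hypothetical `log_target` function), the step size and the seed are assumptions chosen purely for illustration.

```python
import math
import random


def log_target(x):
    # Hypothetical target: unnormalised log-density of a standard normal.
    return -0.5 * x * x


def metropolis(n_steps, step_size=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis sampler: one propose/accept step, repeated n_steps times."""
    rng = random.Random(seed)
    x = x0
    logp = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # 1. Propose a small random move away from the current state.
        proposal = x + rng.gauss(0.0, step_size)
        logp_new = log_target(proposal)
        # 2. Accept with probability min(1, p(proposal) / p(x)).
        #    1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
            accepted += 1
        # 3. Record the current state (repeated if the proposal was rejected).
        samples.append(x)
    return samples, accepted / n_steps
```

Each iteration depends on the state left by the previous one, so the loop cannot simply be skipped or batched away; with an interpreter, the per-iteration overhead is paid on every one of possibly millions of steps.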

When I first started writing MCMC models for my research, I did them in Python. I knew that my C skills weren't great, that I'd be able to program a lot faster in Python, and that there were enough resources on the net for high-performance Python computing that I'd be able to get my programs fast enough. But after investing a lot of time trying to get my first Python MCMC program running fast, using things like numpy's arrays and psyco, I still wasn't happy with how slowly it ran. I knuckled down and rewrote my model in C, and it absolutely blew the pants off my Python version. Of course I expected it to be faster, but it was much faster than I ever expected it to be. Ever since, I've written all of my MCMC stuff in C.

Since then I've become aware of PyMC, a Python library geared specifically toward MCMC, so obviously someone out there is able to make Python work for MCMC at a decent speed. This makes me wonder if I really was doing something badly wrong the whole time I was using Python for numerical work. Maybe it's time to dedicate some serious effort to learning how to make Python faster for this kind of thing, to save myself time and effort in the future.
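As a rough illustration of what "using numpy's arrays" typically means, the sketch below computes a Gaussian log-likelihood (a hypothetical stand-in for the per-step work of a real model; the function names are made up for this example) in two ways: with an explicit Python loop and with vectorised numpy operations. numpy can speed up work like this *inside* each MCMC step, but the chain itself remains a sequential loop, because each step depends on the one before it.

```python
import math

import numpy as np

LOG_2PI = math.log(2.0 * math.pi)


def loglik_loop(data, mu, sigma):
    """Gaussian log-likelihood using an explicit per-element Python loop."""
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        total += -0.5 * z * z - math.log(sigma) - 0.5 * LOG_2PI
    return total


def loglik_numpy(data, mu, sigma):
    """The same quantity, computed with vectorised numpy array operations."""
    z = (np.asarray(data, dtype=float) - mu) / sigma
    return float(-0.5 * np.sum(z * z) - z.size * (math.log(sigma) + 0.5 * LOG_2PI))
```

Both functions return the same value up to floating-point rounding; the numpy version moves the per-element loop out of the interpreter and into compiled code.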

Even if it turns out that suffering through C implementations for the past 18 months has been something of a waste of time, to be honest I wouldn't really regret it. Relying on C for my research has taught me more about the language than I'd ever have learned otherwise. I am now comfortable using malloc, calloc and free, I've done some multithreading work with the pthread API, and I've learned how to use gdb and gprof. I feel like I actually deserve to say that I "know C", whereas before I knew the syntax but couldn't really work effectively in it. Of course, I am still a long way from being any sort of guru, but I feel like I've taken important steps, and I'll be much more confident the next time I have to read and understand some C code.

A lot of people probably feel that in this day and age of Python, Ruby and Lua (and, in less "webby" parts of the world, Java and C#), competence in C is an obsolete skill. I think there's a degree of truth in that, and certainly C shouldn't be used for new projects without a compelling argument in its favour (and a simple "it's faster" is not a compelling argument!). At the same time, I think mastery of C is still an important rite of passage for serious programmers, and it feels good to have taken a few more steps in that direction.

But if I can work in Python from now on, I'm not going to miss the segfaults.
