25 Jul 13:10 2011

## report on a succesful Cython project using threads

Alex van Houten <sparrow2867 <at> yahoo.com>

2011-07-25 11:10:12 GMT

2011-07-25 11:10:12 GMT

Hello, I would like to report on a successful Cython project. Successful in the sense that it was much faster than all code written by my predecessors mainly because the speed scales almost linearly with the number of cores. Also, the code is shorter and much easier to read and maintain. The task is to compute the elemements of a 3D volume (Numpy array) multiple times. The problem is "embarrassingly parallel", so the number of threads is naturally set to the number of cores and the 3D volume is split up in sets of slabs, so on a dual core that set would comprise half of all the slabs. First, a primary quantity is computed for every element in the 3D volume, this is straightforward algebra and easy to code in Cython "with nogil". Next a convolution is done, I have used pyfftw: <at> classmethod def ScatterContribution(cls, cnp.ndarray[cnp.float32_t,ndim=3] scatter, cnp.ndarray[cnp.float32_t,ndim=3] total, int zlow, int zhigh): cdef Py_ssize_t z for z from zlow<= z <zhigh: cls.inputa[0:cls.xdim,0:ydim]=scatter[z,:,:] cls.fft() cls.inputb[:,:]=cls.outputa*cls.fft_midscatterkernel cls.ifft() # fftw does not do proper normalisation. scatter[z,:,:]=cls.outputb[cls.fslice]/cls.normalisation total[z,:,:]+=scatter[z,:,:] This cannot be done "with nogil" but that does not matter, because the GIL is released automatically for Numpy operations. The last task is to interpolate total, the 3D volume, to a reference frame. Normally one would use scipy.ndimage.interpolation.map_coordinates to do so, but that cannot be run inside "with nogil". I have heard about attempts to make some scipy numerical routines multithreaded, but that is actually not what I needed, because a sequence of modules that each have their own separate threads will run much slower than threads defined outside the volume. Since simple bilinear interpolation suffices, I decided to code it myself, so that it could be run "with nogil". I guess some Cython repository with blocks of numerical code that can be copy&pasted after a "with nogil" indent would have made my life even easier. Making it this fast&short&readable&maintainable would have been pretty hard without Cython. Cheers, Alex.