25 Jul 13:10 2011

## report on a succesful Cython project using threads

```Hello, I would like to report on a successful Cython project. Successful in the
sense that it was much faster than all code written by my predecessors mainly
because the speed scales almost linearly with the number of cores. Also, the
code is shorter and much easier to read and maintain.
The task is to compute the elemements of a 3D volume (Numpy array) multiple
times. The problem is "embarrassingly parallel", so the number of threads is
naturally set to the number of cores and the 3D volume is split up in sets of
slabs, so on a dual core that set would comprise half of all the slabs.

First, a primary quantity is computed for every element in the 3D volume, this
is straightforward algebra and easy to code in Cython "with nogil".
Next a convolution is done, I have used pyfftw:

<at> classmethod
def ScatterContribution(cls, cnp.ndarray[cnp.float32_t,ndim=3] scatter,
cnp.ndarray[cnp.float32_t,ndim=3] total, int zlow, int zhigh):
cdef Py_ssize_t z

for z from zlow<= z <zhigh:
cls.inputa[0:cls.xdim,0:ydim]=scatter[z,:,:]
cls.fft()
cls.inputb[:,:]=cls.outputa*cls.fft_midscatterkernel
cls.ifft()
# fftw does not do proper normalisation.
scatter[z,:,:]=cls.outputb[cls.fslice]/cls.normalisation
total[z,:,:]+=scatter[z,:,:]

This cannot be done "with nogil" but that does not matter, because the GIL is
released automatically for Numpy operations.

The last task is to interpolate total, the 3D volume, to a reference frame.
Normally one would use scipy.ndimage.interpolation.map_coordinates to do so, but
that cannot be run inside "with nogil". I have heard about attempts to make some
scipy numerical routines multithreaded, but that is actually not what I needed,
because a sequence of modules that each have their own separate threads will run
much slower than threads defined outside the volume.
Since simple bilinear interpolation suffices, I decided to code it myself, so
that it could be run "with nogil". I guess some Cython repository with blocks of
numerical code that can be copy&pasted after a "with nogil" indent would have