Robert Bradshaw | 27 Dec 06:02 2009

Re: converting code from pyrex to cython

On Dec 23, 2009, at 4:12 PM, Marko Loparic wrote:

> Hi,
>
> Could someone please give me some advice on cython usage?
>
> (I feel ashamed for asking questions that are probably not too far
> from me somewhere in a "fine manual", but I have a strict deadline so
> if there is someone willing to give me some hints to accelerate the
> process it would be of great help!)
>
> I did some pyrex code about 3 years ago and now I want to rewrite this
> code in cython and add some new stuff.

Hopefully you'll just be able to compile it as is, no need to rewrite.

> I don't know much about cython
> yet. My priority is to get a fast code, everything on the C level,
> with the exception of some raise commands. The page
> http://wiki.cython.org/tutorials/numpy points me to the -a option
> providing the html (a wonderful tool to understand what is going on!).
> Here are my questions:
>
> 1. For every numpy array, my pyrex code checked the type and the size.
> I see I can ask cython to do this task for me (directly in the
> function argument list). Do you know how I can check the type of an
> array of strings? Its dtype is for instance '|S14' and 14 is also
> passed as an int argument to the function. I could do this at the
> Python level
>
> if name_array.dtype != '|S%d' % size:
>    raise ...
>
> but I would like to know how to do it in the C level.

Not pretty, but you could do an if statement on the size, and hard  
code the constants. In general, if the type size is dynamic you can't  
(yet, as far as I know) statically compile it in.

> 2. In general I would like to be able to find the type of the C
> variables that appear on the C code generated by cython. In pyrex this
> was easy, because there was no support for numpy and so we had to have
> in our own code the definitions of the structures we used (definitions
> that we copied from another code on the net). So it sufficed to search
> for the variable in these definitions to see its type. I believe that
> using cython the solution to my problem is to look for the include
> files cython uses. I've found numpy.pyx somewhere in my ubuntu system,
> but it mentions dtype in a cython syntax which is for me still distant
> from C, so I still couldn't guess which type dtype has in C. Looking
> at the C code generated by cython also didn't help. Do you have a
> suggestion for me? My concern is not specifically for dtype but in
> general to the variables that cython offers to me and that I would
> like to be able to manipulate in C (by "manipulate in C" I mean of
> course in cython using C-only constructs, i.e. code lines that
> appear in white in the html).

You can still declare them yourself if you want, or see
Cython/Include/numpy.pxd to see what happens when you cimport numpy.

> 3. My main hesitation is actually between writing code the way I did
> in pyrex (C manipulation) and doing in the cython way (higher level
> syntax). The best would be to do in the cython way, so to have a
> cleaner and more robust code. Do you have any advice to give in that
> sense? Are there things that I should rather do in the C level than to
> use cython for some reason (performance etc)?

This is a rather vague question, so I'm not really sure I have an  
answer for you. At least WRT indexing into numpy arrays, it's a lot  
easier and cleaner now.

> 4. I would like to manipulate the numpy arrays using pointer
> arithmetic. In my code there are array coordinates that I reuse
> thousands of times, so in pyrex I translated the coordinates into
> pointers to get more speed.

I'm curious, did this really help? (Did you benchmark?)

> If written in python my preprocessing code
> would look like this:
>
> for i, coords in enumerate(coordinate_list):  # len(coords) == array.ndim
>    pointer = coordinate_to_pointer(array, coords)
>    array_of_pointers[i] = pointer
>
> I had my own buggy coordinate_to_pointer routine written in pyrex (it
> assumed the arrays were C contiguous without knowing that this could
> not be the case...). Should I remake a new routine in cython or should
> I use something cython provides me? In particular: is there a
> coordinate_to_pointer function in cython or in numpy?

If you use the buffer syntax to index into arrays, it should do that  
all for you. Nothing prevents you from rolling your own (or using the  
one you already wrote for that matter). I'd time it to see whether or  
not it makes a difference.
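
For anyone rolling their own, here is a minimal sketch in plain Python of the arithmetic such a routine needs when the array may not be C contiguous: the byte offset of an element is the sum of each coordinate times that axis's stride. The function name coordinate_to_offset and the example strides are hypothetical, picked for illustration; with a real array you would read the strides from the array itself and add the offset to its data pointer.

```python
def coordinate_to_offset(strides, coords):
    """Byte offset of the element at `coords`, given per-axis byte strides."""
    if len(strides) != len(coords):
        raise ValueError("need exactly one coordinate per dimension")
    return sum(c * s for c, s in zip(coords, strides))


# Hypothetical 3x4 array of 8-byte items, described only by its strides.
c_order_strides = (32, 8)   # C order: each row is contiguous
f_order_strides = (8, 24)   # Fortran order: each column is contiguous

offset_c = coordinate_to_offset(c_order_strides, (2, 1))  # 2*32 + 1*8
offset_f = coordinate_to_offset(f_order_strides, (2, 1))  # 2*8 + 1*24
```

The same coordinates land at different offsets in the two layouts, which is exactly what a routine that assumes C contiguity gets wrong.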

> 5. Concerning pointers, how can I save them in a numpy array? In other
> words, how do I define in cython the numpy array array_of_pointers? In
> pyrex I made it in C-level, defining cdef void** array_of_pointers and
> using mallocs, but I believe that using a numpy array is preferable.

I don't know if there's an integer dtype that's guaranteed to be large  
enough to hold a void*, but there may be--ask on the numpy list.
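
As a rough illustration of the idea (storing addresses as integers wide enough for a pointer), here is a sketch using only the standard library's ctypes module rather than numpy. The buffer and the variable names are made up for the example; whether a particular numpy integer dtype gives the same guarantee is still something to confirm on the numpy list.

```python
import ctypes

# A small C buffer of doubles standing in for array data (example values).
data = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
base = ctypes.addressof(data)
item_size = ctypes.sizeof(ctypes.c_double)

# Precompute the address of every element as a plain integer.
addresses = [base + i * item_size for i in range(len(data))]

# Check that each address fits in an unsigned integer of pointer width.
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8
fits = all(0 <= a < 2 ** pointer_bits for a in addresses)

# Read one element back through its stored address.
value = ctypes.c_double.from_address(addresses[2]).value
```

The stored integers are only valid while the underlying buffer is alive and has not been reallocated, so the array of addresses has to be rebuilt whenever the data moves.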

> 6. Still on pointer arithmetics, what is the cython valid/recommended
> way to do that? I remember that in pyrex there was this syntactic
> constraint of adding [0] for all pointer usage. It worked well. Is
> this the case for cython as well? I have googled "cython pointer" but
> I couldn't find good examples of pointer arithmetic code in cython. Do
> you have a suggestion on that issue (either on where to find these
> examples or on replacing pointer arithmetics by something more
> cythonic)?

Just as you would in C, except that the unary * operator doesn't  
dereference (it already has too many other meanings in Python) so you  
have to do [0] instead.
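
The indexing convention can be tried out from plain Python, since ctypes pointers follow the same rule: p[0] reads what C spells *p, and p[i] reads *(p + i). This is only an analogy in ctypes, not Cython code, and the array contents are arbitrary example values.

```python
import ctypes

# Five C ints and a pointer to the first one.
values = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
p = ctypes.cast(values, ctypes.POINTER(ctypes.c_int))

first = p[0]   # C: *p
third = p[2]   # C: *(p + 2)
p[1] = 99      # C: *(p + 1) = 99; writes through to `values`
```

As in C, nothing checks the index against the size of the buffer, so staying in bounds is up to the caller.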

Despite your strict deadline, it might be worth spending an hour or
two reading http://docs.cython.org/src/tutorial/numpy.html and
http://conference.scipy.org/proceedings/SciPy2009/paper_2/ . If you
don't even have time for that, you could also just compile your Pyrex
code as is, do everything the old manual way, and worry about cleaning
it up later when you have more time.

- Robert

