Dag Sverre Seljebotn | 16 May 09:44 2012

C-level duck typing

Hi python-dev,

These ideas/questions come out of the Cython and NumPy developer lists.

What we want is a way to communicate things on the C level about the 
extension type instances we pass around. The solution today is often to 
rely on PyObject_TypeCheck. For instance, hundreds of handcrafted C 
extensions rely on the internal structure of NumPy arrays, and Cython 
will check whether objects are instances of a Cython class or not.

However, this creates one-to-many situations; only one implementor of an 
object API/ABI, but many consumers. What we would like is multiple 
implementors and multiple consumers of mutually agreed-upon standards. 
We essentially want more duck typing on the C level.

A similar situation was PEP 3118. But there are many more such things one 
might want to communicate at the C level, many of which are very 
domain-specific and not suitable for a PEP at all. Also, PEPs don't 
backport well to older versions of Python.

What we *think* we would like (but we want other suggestions!) is an 
arbitrarily extensible type object, without tying this into the type 
hierarchy. Say you have

typedef struct {
     unsigned long extension_id;  /* which agreed-upon spec this entry follows */
     void *data;                  /* spec-defined payload */
} PyTypeObjectExtensionEntry;

and then a type object can (somehow!) point to an array of these. The 
array is linearly scanned by consumers for IDs they recognize (most 
types would only have one or two entries). Cython could then get a 
reserved ID space to communicate whatever it wants, NumPy another one, 
and there could be "unofficial PEPs" where two or more projects get 
together to draft a spec for a particular type extension ID without 
having to bother python-dev about it.
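To make the linear scan concrete, here is a minimal sketch of a consumer-side lookup. It assumes a convention the proposal does not fix: the array is terminated by an entry whose ID is 0. The ID values and the find_extension helper are hypothetical.

```c
#include <stddef.h>

typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Hypothetical IDs; real values would come from the reserved ID spaces. */
#define HYPOTHETICAL_CYTHON_ID 0x10000001UL
#define HYPOTHETICAL_NUMPY_ID  0x20000001UL

/* Hypothetical convention: the array ends with an entry whose ID is 0.
 * Returns the payload for wanted_id, or NULL if the type doesn't have it. */
static void *find_extension(const PyTypeObjectExtensionEntry *entries,
                            unsigned long wanted_id)
{
    if (entries == NULL)
        return NULL;
    for (; entries->extension_id != 0; ++entries) {
        if (entries->extension_id == wanted_id)
            return entries->data;
    }
    return NULL;
}
```

A consumer would call find_extension with its own ID and fall back to a slower path when it gets NULL. Since most types would carry only one or two entries, the scan is a handful of integer compares, which is the point of preferring it over a dict lookup.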

And we want this to work somehow with existing Python versions; we still 
support users on Python 2.4.

Options we've thought of so far:

  a) Use dicts and capsules to get information across. But 
performance-wise the dict lookup is not an option for what we want to 
use this for in Cython.

  b) Implement a metaclass which extends PyTypeObject in this way. 
However, that means a common runtime dependency for libraries that want 
to use this scheme, which is a big disadvantage to us. Today, Cython 
doesn't ship a runtime library but only creates standalone compilable C 
files, and there's no dependency from NumPy on Cython or the other way 
around.

  c) Hijack a free bit in tp_flags (22?) which we use to indicate that 
the PyTypeObject struct is immediately followed by a pointer to such an 
array.
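A rough sketch of how a consumer might read the array under option c), assuming bit 22 is the chosen flag. MockTypeObject is a hypothetical stand-in for PyTypeObject so the example compiles without Python headers; with the real struct, the tp_flags check would work the same way. MockExtensibleType and get_extensions are likewise hypothetical names.

```c
#include <stddef.h>

typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Stand-in for PyTypeObject; the real struct has many more fields. */
typedef struct {
    const char *tp_name;
    unsigned long tp_flags;
} MockTypeObject;

/* Bit 22 is only the tentative choice from the proposal. */
#define HYPOTHETICAL_HAVE_EXTENSIONS_FLAG (1UL << 22)

/* Layout under option c): the type object is immediately followed
 * by a pointer to the extension array. */
typedef struct {
    MockTypeObject base;
    PyTypeObjectExtensionEntry *extensions;
} MockExtensibleType;

/* Returns the extension array, or NULL for types without the flag. */
static PyTypeObjectExtensionEntry *get_extensions(MockTypeObject *type)
{
    if (!(type->tp_flags & HYPOTHETICAL_HAVE_EXTENSIONS_FLAG))
        return NULL;
    /* Flag set: the producer promised the larger layout, so the cast is safe. */
    return ((MockExtensibleType *)type)->extensions;
}
```

Types without the flag are never cast, so nothing past the end of an ordinary type object is ever read; the whole scheme rests on no one else using the same bit.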

Approach c) is drafted in more detail at 
http://wiki.cython.org/enhancements/cep1001 . To us that looks very 
attractive both for the speed and for the lack of runtime dependencies, 
and it seems like it should work in existing versions of Python. But do 
please feel free to tell us we are misguided. Hijacking a flag bit 
certainly feels dirty.

Examples of how this would be used:

  - In Cython, we'd like to use this to annotate callable objects that 
happen to wrap a C function with their corresponding C function 
pointers. That way, callables that wrap a C function could be "unboxed", 
so that Cython could "cast" the Python object "scipy.special.gamma" to a 
function pointer at runtime and speed up the call by an order of 
magnitude. SciPy and Cython would just need to agree on a spec.

  - Lots of C extensions rely on using PyObject_TypeCheck (or even do an 
exact check) before calling the NumPy C API with PyArrayObject* 
arguments. This means that new features all have to go into NumPy; it is 
rather difficult to create new experimental array libraries. Extensible 
PyTypeObject would open up the way for other experimental array 
libraries; NumPy could set the standards, and others could implement them 
(without getting NumPy as a runtime dependency, which is the consequence 
of subclassing). Of course, porting over the hundreds (thousands?) of 
extensions relying on the NumPy C API is a lot of work, but we can at 
least get started...
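For the first example, a SciPy/Cython spec could say that the entry's data points to a signature string plus a C function pointer. The sketch below assumes exactly that; the ID, the "d->d" signature format, HypotheticalCFuncInfo and unbox_unary_double are all hypothetical and not part of any existing spec.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Hypothetical ID a SciPy/Cython spec might reserve. */
#define HYPOTHETICAL_CFUNC_ID 0x43465531UL

/* Hypothetical payload: a signature string plus a C function pointer. */
typedef struct {
    const char *signature;   /* e.g. "d->d": double in, double out */
    void (*func)(void);      /* generic function pointer, cast back by the consumer */
} HypotheticalCFuncInfo;

typedef double (*unary_double_fn)(double);

/* Scans a 0-terminated entry array; returns the unboxed double(double)
 * function, or NULL if the callable doesn't advertise one. */
static unary_double_fn unbox_unary_double(const PyTypeObjectExtensionEntry *entries)
{
    for (; entries != NULL && entries->extension_id != 0; ++entries) {
        if (entries->extension_id == HYPOTHETICAL_CFUNC_ID) {
            const HypotheticalCFuncInfo *info = entries->data;
            if (strcmp(info->signature, "d->d") == 0)
                return (unary_double_fn)info->func;
        }
    }
    return NULL;
}
```

If unboxing fails, Cython would fall back to a regular Python call. Storing the pointer as a generic void (*)(void) and casting it back to its real type only after the signature matches keeps the call well-defined in C.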

Ideas?

Dag Sverre Seljebotn
