From: Wes McKinney <wesmckinn-Re5JQEeQqe8AvxtiuMwx3w <at> public.gmane.org>
Subject: Re: [Cython] Wacky idea: proper macros
Newsgroups: gmane.comp.python.cython.devel
Date: Monday 30th April 2012 20:56:13 UTC
On Mon, Apr 30, 2012 at 4:49 PM, Dag Sverre Seljebotn
 wrote:
> On 04/30/2012 06:30 PM, Wes McKinney wrote:
>>
>> On Mon, Apr 30, 2012 at 11:19 AM, mark florisson
>>   wrote:
>>>
>>> On 30 April 2012 14:49, Wes McKinney <[email protected]> wrote:
>>>>
>>>> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson
>>>>   wrote:
>>>>>
>>>>> On 29 April 2012 08:42, Nathaniel Smith  wrote:
>>>>>>
>>>>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson
>>>>>>   wrote:
>>>>>>>
>>>>>>> On 28 April 2012 22:04, Nathaniel Smith  wrote:
>>>>>>>>
>>>>>>>> Was chatting with Wes today about the usual problem many of us have
>>>>>>>> encountered with needing to use some sort of templating system to
>>>>>>>> generate code handling multiple types, operations, etc., and a wacky
>>>>>>>> idea occurred to me. So I thought I'd throw it out here.
>>>>>>>>
>>>>>>>> What if we added a simple macro facility to Cython, that worked at
>>>>>>>> the
>>>>>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style
>>>>>>>> macros.)
>>>>>>>> Basically some way to write arbitrary Python code into a .pyx file
>>>>>>>> that gets executed at compile time and can transform the AST, plus
>>>>>>>> some nice convenience APIs for simple transformations.
>>>>>>>>
>>>>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we
>>>>>>>> could have something like:
>>>>>>>>
>>>>>>>> @@ # alone on a line, starts a block of Python code
>>>>>>>> from Cython.MacroUtil import replace_ctype
>>>>>>>> def expand_types(placeholder, typelist):
>>>>>>>>  def my_decorator(function_name, ast):
>>>>>>>>    functions = {}
>>>>>>>>    for typename in typelist:
>>>>>>>>      new_name = "%s_%s" % (function_name, typename)
>>>>>>>>      functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>>>>>>    return functions
>>>>>>>>  return my_decorator
>>>>>>>> @@ # this token sequence cannot occur in Python, so it's a safe
>>>>>>>> end-marker
>>>>>>>>
>>>>>>>> # Compile-time function decorator
>>>>>>>> # Results in two cdef functions named sum_double and sum_int
>>>>>>>> @@expand_types("T", ["double", "int"])
>>>>>>>> cdef T sum(np.ndarray[T] arr):
>>>>>>>>  cdef T start = 0;
>>>>>>>>  for i in range(arr.size):
>>>>>>>>    start += arr[i]
>>>>>>>>  return start
>>>>>>>>
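(For clarity, the expansion this decorator sketch is meant to produce, per the comment above, would be roughly the two hand-written specializations below; this is illustrative only and mirrors the original sketch line for line.)

cimport numpy as np

cdef double sum_double(np.ndarray[double] arr):
    cdef double start = 0
    for i in range(arr.size):
        start += arr[i]
    return start

cdef int sum_int(np.ndarray[int] arr):
    cdef int start = 0
    for i in range(arr.size):
        start += arr[i]
    return start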
>>>>>>>> I don't know if this is a good idea, but it seems like it'd be very
>>>>>>>> easy to do on the Cython side, fairly clean, and be dramatically less
>>>>>>>> horrible than all the ad-hoc templating stuff people do now.
>>>>>>>> Presumably there'd be strict limits on how much backwards
>>>>>>>> compatibility we'd be willing to guarantee for code that went poking
>>>>>>>> around in the AST by hand, but a small handful of functions like my
>>>>>>>> notional "replace_ctype" would go a long way, and wouldn't impose much
>>>>>>>> of a compatibility burden.
>>>>>>>>
>>>>>>>> -- Nathaniel
>>>>>>>> _______________________________________________
>>>>>>>> cython-devel mailing list
>>>>>>>> [email protected]
>>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>>>>
>>>>>>>
>>>>>>> Have you looked at
>>>>>>> http://wiki.cython.org/enhancements/metaprogramming ?
>>>>>>>
>>>>>>> In general I would like better meta-programming support, maybe even
>>>>>>> allow defining new operators (although I'm not sure any of it is very
>>>>>>> pythonic), but for templates I think fused types should be used, or
>>>>>>> improved when they fall short. Maybe a plugin system could also help
>>>>>>> people.
>>>>>>
>>>>>>
>>>>>> I hadn't seen that, no -- thanks for the link.
>>>>>>
>>>>>> I have to say that the examples in that link, though, give me the
>>>>>> impression of a cool solution looking for a problem. I've never wished
>>>>>> I could symbolically differentiate Python expressions at compile time,
>>>>>> or create a mutant Python+SQL hybrid language. Actually I guess I've
>>>>>> only missed define-syntax once in maybe 10 years of hacking in
>>>>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it
>>>>>> will peek at the caller's syntax tree to automagically label the axes
>>>>>> as "x" and "log(y)", and that can't be done in Python. But that's not
>>>>>> exactly a convincing argument for a macro system.
>>>>>>
>>>>>> But generating optimized code is Cython's whole selling point, and
>>>>>> people really are doing klugey tricks with string-based preprocessors
>>>>>> just to generate multiple copies of loops in Cython and C.
>>>>>>
>>>>>> Also, fused types are great, but: (1) IIUC you can't actually do
>>>>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and
>>>>>
>>>>>
>>>>> What? Yes you can do that.
>>>>
>>>>
>>>> I haven't been able to get ndarray[fused_t] to work as we've discussed
>>>> off-list. In your own words "Unfortunately, the automatic buffer
>>>> dispatch didn't make it into 0.16, so you need to manually
>>>> specialize". I'm a bit hamstrung by other users needing to be able to
>>>> compile pandas using the latest released Cython.
>>>
>>>
>>> Well, as I said, it does work, but you need to tell Cython which type
>>> you meant. If you don't want to do that, you have to use this branch:
>>> https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased
>>> . This never made it in since we had no consensus on whether to allow
>>> the compiler to bootstrap itself and because of possible immaturity of
>>> the branch.
>>>
>>> So what doesn't work is automatic dispatch for Python functions (def
>>> functions and the object version of a cpdef function). They don't
>>> automatically select the right specialization for buffer arguments.
>>> Anything else should work, otherwise it's a bug.
>>>
>>> Note also that figuring out which specialization to call dynamically
>>> (i.e. not from Cython space at compile time, but from Python space at
>>> runtime) has non-trivial overhead on top of just argument unpacking.
>>> But you can't say "doesn't work" without giving a concrete example of
>>> what doesn't work besides automatic dispatch, and how it fails.
>>>
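(A minimal sketch, not from the thread, of the manual specialization Mark describes: one fused-type definition, with the caller indexing the function by a concrete type instead of relying on the automatic buffer dispatch that missed 0.16. The names here are illustrative, and the indexing syntax is my reading of the 0.16 fused-types support.)

cimport cython
cimport numpy as np

ctypedef fused numeric_t:
    np.int64_t
    np.float64_t

@cython.boundscheck(False)
@cython.wraparound(False)
cdef numeric_t _typed_sum(np.ndarray[numeric_t] arr):
    # one definition; Cython emits one specialization per member of numeric_t
    cdef Py_ssize_t i
    cdef numeric_t total = 0
    for i in range(arr.shape[0]):
        total += arr[i]
    return total

def sum_float64(np.ndarray[np.float64_t] arr):
    # "manual specialization": explicitly index the version we want
    return _typed_sum[np.float64_t](arr)

def sum_int64(np.ndarray[np.int64_t] arr):
    return _typed_sum[np.int64_t](arr)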
>>
>> Sorry, I meant automatic dispatch re "doesn't work" and want to
>> reiterate how much I appreciate the work you're doing. To give some
>> context, my code is riddled with stuff like this:
>>
>> lib.inner_join_indexer_float64
>> lib.inner_join_indexer_int32
>> lib.inner_join_indexer_int64
>> lib.inner_join_indexer_object
>>
>> where the only difference between these functions is the type of the
>> buffer in the two arrays passed in. I have a template string for these
>> functions that looks like this:
>>
>> inner_join_template = """@cython.wraparound(False)
>> @cython.boundscheck(False)
>> def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left,
>>                               ndarray[%(c_type)s] right):
>>     '''
>> ...
>>
>> I would _love_ to replace this with fused types.
>>
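(A rough sketch, not pandas code, of what that replacement could look like: the fused type stands in for the %(c_type)s template parameter, and a single def replaces the per-dtype copies. Per Mark's point above, under 0.16 a Python-level caller would still have to select the specialization explicitly rather than rely on automatic buffer dispatch. The name and toy body below are hypothetical.)

cimport cython
cimport numpy as np
from numpy cimport ndarray

ctypedef fused join_t:
    np.int32_t
    np.int64_t
    np.float64_t

@cython.wraparound(False)
@cython.boundscheck(False)
def inner_join_count(ndarray[join_t] left, ndarray[join_t] right):
    # toy body: count matching keys in two sorted arrays; a real
    # inner_join_indexer would also build the left/right indexer arrays
    cdef Py_ssize_t i = 0, j = 0, n = 0
    while i < left.shape[0] and j < right.shape[0]:
        if left[i] == right[j]:
            n += 1
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return n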
>> In any case, lately I've been sort of yearning for the kinds of things
>> you can do with an APL-variant like J. Like here's a groupby in J:
>>
>>    labels
>> 1 1 2 2 2 3 1
>>    data
>> 3 4 5.5 6 7.5 _2 8.3
>>    labels </. data
>> ┌───────┬─────────┬──┐
>> │3 4 8.3│5.5 6 7.5│_2│
>> └───────┴─────────┴──┘
>>
>> Here < is box and /. is categorize.
>>
>> Replacing the box < operator with +/ (sum), I get the group sums:
>>
>>    labels +/ /. data
>> 15.3 19 _2
>>
>> Have 2-dimensional data?
>>
>>    data
>>  0  1  2  3  4  5  6
>>  7  8  9 10 11 12 13
>> 14 15 16 17 18 19 20
>> 21 22 23 24 25 26 27
>> 28 29 30 31 32 33 34
>> 35 36 37 38 39 40 41
>> 42 43 44 45 46 47 48
>>    labels </."1 data
>> ┌────────┬────────┬──┐
>> │0 1 6   │2 3 4   │5 │
>> ├────────┼────────┼──┤
>> │7 8 13  │9 10 11 │12│
>> ├────────┼────────┼──┤
>> │14 15 20│16 17 18│19│
>> ├────────┼────────┼──┤
>> │21 22 27│23 24 25│26│
>> ├────────┼────────┼──┤
>> │28 29 34│30 31 32│33│
>> ├────────┼────────┼──┤
>> │35 36 41│37 38 39│40│
>> ├────────┼────────┼──┤
>> │42 43 48│44 45 46│47│
>> └────────┴────────┴──┘
>>
>>    labels +//."1 data
>>   7   9  5
>>  28  30 12
>>  49  51 19
>>  70  72 26
>>  91  93 33
>> 112 114 40
>> 133 135 47
>>
>> However, J and other APLs are interpreted. If you generate C or
>> JIT-compile I think you can do really well performance wise and have
>> very expressive code for writing data algorithms without all this
>> boilerplate.
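(For readers who don't speak J: the first example above is just a group-by sum. A rough NumPy equivalent, with hypothetical variable names, is:)

import numpy as np

labels = np.array([1, 1, 2, 2, 2, 3, 1])
data = np.array([3, 4, 5.5, 6, 7.5, -2, 8.3])

# map labels to dense group ids, then sum data within each group;
# this reproduces the J result 15.3 19 _2
uniques, inverse = np.unique(labels, return_inverse=True)
group_sums = np.bincount(inverse, weights=data)
print(group_sums)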
>
>
> I know how you feel. On one hand I really like metaprogramming; on the other
> hand I think it is very difficult to get right when done compile-time (just
> look at C++ -- I've heard that D is a bit better though I didn't really try
> it).
>
> JIT is really the way to go. It is one thing that a JIT could optimize the
> case where you pass a callback to a function and inline it run-time. But
> even if it doesn't get that fancy, it'd be great to just be able to write
> something like "cython.eval(s)" and have that be compiled (I guess you could
> do that now, but the sheer overhead of the C compiler and all the .so files
> involved means nobody would sanely use that as the main way of stringing
> together something like pandas).
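(As an aside: Cython already ships cython.inline, which is roughly this "compile a string" workflow, with exactly the C-compiler and .so overhead Dag mentions, softened only by caching. A minimal sketch, assuming a Cython installation recent enough to provide it:)

import numpy as np
import cython  # the shadow module shipped with Cython provides inline()

arr = np.arange(1e6)

# the string is compiled to an extension module on first use and cached;
# unbound names such as `arr` are supplied as keyword arguments
total = cython.inline("""
cdef double s = 0
cdef Py_ssize_t i
for i in range(arr.shape[0]):
    s += arr[i]
return s
""", arr=arr)
print(total)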
>
> I think that's really the way to get "Pythonic" metaprogramming, where you
> mix and match runtime and compile-time, and can hook into arbitrary Python
> code in the meta-programming step.
>
> Guido may even accept some syntax hooks to more easily express macros
> without resorting to strings, if I got the right impression at PyData on his
> take on DSLs.
>
> Without a JIT, all we seem to come up with will be kludges. So I'm sceptical
> about more metaprogramming features in Cython now since it will be outdated
> in five years -- by then, somebody will either have gotten our stuff
> together and JIT-ed both Python and Cython, or we'll all be using something
> else (like R or Julia).
>
> Dag
>
> _______________________________________________
> cython-devel mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/cython-devel

I feel pretty strongly that we need a JIT, but it really needs to run
inside CPython. The PyPy approach doesn't seem right to me, and Julia
is nice but you're essentially starting from scratch (I am going to do
some benchmarking / explorations to see how good Julia's JIT is).

I don't have the JIT-fu to do this myself, and I probably won't have
the bandwidth to work on it for a couple of years. I might be able to
fund its development sooner rather than later, though.

- Wes
_______________________________________________
cython-devel mailing list
[email protected]
http://mail.python.org/mailman/listinfo/cython-devel
 