2012/4/19 Dag Sverre Seljebotn:
> On 04/19/2012 10:35 AM, Vitja Makarov wrote:
>>
>> 2012/4/19 Dag Sverre Seljebotn:
>>>
>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>>>
>>>>
>>>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>>>
>>>>>
>>>>> from numpy import sqrt, sin
>>>>>
>>>>> cdef double f(double x):
>>>>> return sqrt(x * x) # or sin(x * x)
>>>>>
>>>>> Of course, here one could get the pointer in the module at import
>>>>> time.
>>>>
>>>>
>>>>
>>>> That optimisation would actually be very worthwhile all by itself. I
>>>> mean, we know what signatures we need for globally imported functions
>>>> throughout the module, so we can reduce the call to a single jump
>>>> through a function pointer (although likely with a preceding NULL
>>>> check, which the branch prediction would be happy to give us for
>>>> free). At least as long as sqrt is not being reassigned, but that
>>>> should hit the 99% case.
>>>>
>>>>
>>>>> However, here:
>>>>>
>>>>> from numpy import sqrt
>>>
>>>
>>>
>>> Correction: "import numpy as np"
>>>
>>>>>
>>>>> cdef double f(double x):
>>>>> return np.sqrt(x * x) # or np.sin(x * x)
>>>>>
>>>>> the __getattr__ on np sure is larger than any effect we discuss.
>>>>
>>>>
>>>>
>>>> Yes, that would have to stay a .pxd case, I guess.
>>>
>>>
>>>
>>> How about this mini-CEP:
>>>
>>> Modules are allowed to specify __nomonkey__ (or __const__, or
>>> __notreassigned__), a list of strings naming module-level variables
>>> where "we don't hold you responsible if you assume no monkeypatching
>>> of these".
>>>
>>> When doing "import numpy as np", then (assuming "np" is never
>>> reassigned in the module), at import time we check all names looked up
>>> from it in __nomonkey__, and if so treat them as "from numpy import
>>> sqrt as 'np.sqrt'", i.e. the "np." is just a namespace mechanism.
>>>
>>> Needs a bit more work, it ignores the possibility that others could
>>> monkeypatch "np" in the Cython module.
>>>
>>> Problem with .pxd is that currently you need to pick one overload
>>> (np.sqrt works for n-dimensional arrays too, or takes a list and
>>> returns an array). And even after adding 3-4 language features to
>>> Cython to make this work, you're stuck with having to reimplement
>>> parts of NumPy in the pxd files just so that you can early bind from
>>> Cython.
>>>
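In plain Python, the proposed opt-in might look like this sketch (the __nomonkey__ attribute and the early_bind helper are hypothetical, not an existing Cython or NumPy feature; a stand-in module object is used since no library declares __nomonkey__ today):

```python
import math
import types

# A library module that opts in: these names are promised never to be
# monkeypatched, so importers may early-bind them.
libmod = types.SimpleNamespace(sqrt=math.sqrt, sin=math.sin,
                               __nomonkey__=("sqrt", "sin"))

def early_bind(mod, name):
    """At import time, resolve mod.name once if the module promises the
    name is constant; otherwise return None and keep using __getattr__."""
    if name in getattr(mod, "__nomonkey__", ()):
        return getattr(mod, name)
    return None

# "import numpy as np" followed by np.sqrt(...) then becomes a plain
# namespaced early binding, skipping the per-call __getattr__ on np:
np_sqrt = early_bind(libmod, "sqrt")
```

Names not listed in __nomonkey__ simply keep the current dynamic lookup behaviour.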
>>
>> Sorry, I'm a bit late.
>>
>> When should __nomonkey__ be checked: at compile time or at import time?
>
>
> At import time. At compile time we generate one (potential) function
> pointer per call signature we might try. At import time we fill them in
> if they are in __nomonkey__ using CEP 1000. At call time we likely()
> around the pointer being non-empty, since the cost of a dict lookup is
> so large anyway.
>
>
>> It seems to me that the compiler must guess the function signature at
>> compile time, and then check it at runtime.
>
>
> Yes. Just like Fortran 77, where you don't declare functions, just call
> them. At least with Cython it'll just go slower if you get them wrong, we
> won't get a crash :)
>
> If you want to help the compiler along explicitly, you would instead do
> something like
>
> cdef double (*sqrt_double)(double)
> from numpy import sqrt
>
>
>> What if an integer signature is guessed for sqrt() based on the
>> argument type, e.g. sqrt(16)? Should this call fall back to
>> PyObject_Call(), or cast the integer to a double at some point?
>
>
> a) np.sqrt could export functions for all basic types (this is how NumPy
> currently works under the hood anyway)
>
> b) It doesn't help here, but I also imagine Cython doing a 3-step or
> 4-step call down the line:
>
> - Direct call using the types given.
> - Promote all scalars to 64 bit, try again.
> - [ (Optional if an FFI library or LLVM is available): Parse signature
>   string of function and build call dynamically using an FFI library ]
> - Python call
>
> Without an FFI library, I think giving the user a speedup if he/she
> writes sqrt(3.) rather than sqrt(3) is fine...
>
> c) An optimize-for-host-libraries compilation flag could of course just
> probe for the signatures, similar to profile-guided optimization.
>
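The stepped fallback in b) can be sketched in Python as follows (the signature-table layout and helper names are invented for illustration; the real implementation would be generated C, and the optional FFI step is omitted):

```python
import math

def tiered_call(func_table, fallback, *args):
    """func_table maps a tuple of argument types to a fast entry point;
    fallback is the generic Python call."""
    sig = tuple(type(a) for a in args)
    entry = func_table.get(sig)
    if entry is not None:
        return entry(*args)            # step 1: direct call, exact types
    promoted = tuple(float(a) if isinstance(a, int) else a for a in args)
    entry = func_table.get(tuple(type(a) for a in promoted))
    if entry is not None:
        return entry(*promoted)        # step 2: promote scalars, try again
    return fallback(*args)             # step 3: generic Python call

# Only a double overload is registered, so sqrt(16) hits step 2:
table = {(float,): math.sqrt}
```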
Ok, np.sqrt() supports different signatures; how would Cython know
which C function to use?
>
>> I've tried to implement a trivial approach for CyFunction. Trivial
>> means that the function accepts PyObjects as arguments and returns a
>> PyObject, so the trivial signature is only one integer: 1 + len(args).
>> If a signature match occurs the C function is called directly, and
>> PyObject_Call() is used otherwise. I didn't succeed because of the
>> argument cloning problems we discussed before.
>>
>> About dict lookups: it's possible to speed up a dict lookup by a
>> constant key if we have access to the dict's internal implementation.
>> I've implemented it for module-level lookups here:
>>
>>
>> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9
>>
>> And it gave a 4x speedup for a dummy test:
>>
>> def foo():
>>     cdef int i, r = 0
>>     o = foo
>>     for i in range(10000000):
>>         if o is foo:
>>             r += 1
>>
>> %timeit foo()
>> 1 loops, best of 3: 229 ms per loop
>>
>> %timeit foo_optimized()
>> 10 loops, best of 3: 54.1 ms per loop
>>
>
> Cool! Am I right that that translates to 5.4 ns? That's pretty good.
> (What CPU did you use?)
>
Yes, the test was run on an Intel i3 at 3.2 GHz.
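As a quick sanity check on the quoted numbers (pure arithmetic over the 10,000,000-iteration loop above):

```python
# 229 ms unoptimized vs 54.1 ms optimized, over 10,000,000 iterations:
iterations = 10_000_000
baseline_ns = 229e-3 / iterations * 1e9    # per-iteration cost before
optimized_ns = 54.1e-3 / iterations * 1e9  # per-iteration cost after
print(f"{baseline_ns:.1f} ns -> {optimized_ns:.2f} ns")  # 22.9 ns -> 5.41 ns
```

which confirms the 5.4 ns figure (and roughly a 4x speedup).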
> Still, a function pointer call done at import time appears to be
> roughly 1 ns, so a full sqrt bound the way I proposed would be 2-3 ns,
> so 5.4 ns in addition is still relatively large.
>
> But, it does mean that __nomonkey__, if not completely invalid, is
> perhaps not exactly high-priority. For a JIT that would consult it at
> compile time the gain would be higher though.
>
>

vitja.
