19 Apr 13:20 2012

## Re: [Cython] New early-binding concept [was: CEP1000]

2012/4/19 Dag Sverre Seljebotn <d.s.seljebotn@...>:
> On 04/19/2012 10:35 AM, Vitja Makarov wrote:
>> 2012/4/19 Dag Sverre Seljebotn <d.s.seljebotn@...>:
>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>>>
>>>>>     from numpy import sqrt, sin
>>>>>
>>>>>     cdef double f(double x):
>>>>>         return sqrt(x * x)  # or sin(x * x)
>>>>>
>>>>> Of course, here one could get the pointer in the module at import time.
>>>>
>>>> That optimisation would actually be very worthwhile all by itself. I
>>>> mean, we know what signatures we need for globally imported functions
>>>> throughout the module, so we can reduce the call to a single jump
>>>> through a function pointer (although likely with a preceding NULL
>>>> check, which the branch prediction would be happy to give us for
>>>> free). At least as long as sqrt is not being reassigned, but that
>>>> should hit the 99% case.
>>>>
>>>>> However, here:
>>>>>
>>>>>     from numpy import sqrt
>>>
>>> Correction: "import numpy as np"
>>>
>>>>>     cdef double f(double x):
>>>>>         return np.sqrt(x * x)  # or np.sin(x * x)
>>>>>
>>>>> the __getattr__ on np sure is larger than any effect we discuss.
>>>>
>>>> Yes, that would have to stay a .pxd case, I guess.
>>>
>>> How about this mini-CEP:
>>>
>>> Modules are allowed to specify __nomonkey__ (or __const__, or
>>> __notreassigned__), a list of strings naming module-level variables
>>> where "we don't hold you responsible if you assume no monkey-patching
>>> of these".
>>>
>>> When doing "import numpy as np", then (assuming "np" is never
>>> reassigned in the module), at import time we check all names looked up
>>> from it against __nomonkey__, and if they are listed, treat them as
>>> "from numpy import sqrt as 'np.sqrt'", i.e. the "np." is just a
>>> namespace mechanism.
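[Editor's note: the __nomonkey__ proposal above can be modeled in plain Python. This is a hypothetical sketch, not any real Cython API; the module, the `__nomonkey__` attribute's consumer, and `bind_nomonkey` are all invented for illustration.]

```python
# Sketch of the __nomonkey__ idea: a module declares which of its
# globals are safe to bind early, and the importing side snapshots
# those attributes once at import time, so "np.sqrt" becomes a direct
# reference instead of a __getattr__ on every call.
import math
import types

# Stand-in for a library module that opts in to early binding.
fakelib = types.ModuleType("fakelib")
fakelib.sqrt = math.sqrt
fakelib.__nomonkey__ = ["sqrt"]  # "we promise these are never reassigned"

def bind_nomonkey(module, names):
    """Return direct references for names listed in the module's
    __nomonkey__; fall back to late (per-call) lookup otherwise."""
    safe = set(getattr(module, "__nomonkey__", ()))
    bound = {}
    for name in names:
        if name in safe:
            # Early-bound: the "np." prefix was just a namespace
            # mechanism, so grab the object itself at import time.
            bound[name] = getattr(module, name)
        else:
            # Late-bound: re-fetch the attribute on every call, in case
            # someone monkey-patched the module in the meantime.
            bound[name] = lambda *a, n=name, m=module: getattr(m, n)(*a)
    return bound

env = bind_nomonkey(fakelib, ["sqrt"])
print(env["sqrt"](16.0))  # 4.0, with no attribute lookup at call time
```

Note the caveat from the thread still applies: this only guards against the library reassigning its own globals, not against third parties rebinding the module object in the importing namespace.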
>>> Needs a bit more work; it ignores the possibility that others could
>>> monkey-patch "np" in the Cython module.
>>>
>>> The problem with .pxd is that currently you need to pick one overload
>>> (np.sqrt works for n-dimensional arrays too, or takes a list and
>>> returns an array). And even after adding 3-4 language features to
>>> Cython to make this work, you're stuck with having to reimplement
>>> parts of NumPy in the pxd files just so that you can early-bind from
>>> Cython.
>>
>> Sorry, I'm a bit late.
>>
>> When should __nomonkey__ be checked: at compile time or at import time?
>
> At import time. At compile time we generate one (potential) function
> pointer per call-signature we might try. At import time we fill them in,
> if they are in __nomonkey__, using CEP 1000. At call time we likely()
> around the pointer being non-NULL, since the cost of a dict lookup is so
> large anyway.
>
>> It seems to me that the compiler must guess the function signature at
>> compile time and then check it at runtime.
>
> Yes. Just like Fortran 77, where you don't declare functions, just call
> them. At least with Cython it'll just go slower if you get them wrong;
> we won't get a crash.
>
> If you want to help the compiler along explicitly, you would instead do
> something like
>
>     cdef double (*sqrt_double)(double)
>     from numpy import sqrt
>
>> What if an integer signature is guessed for sqrt() based on the
>> argument type, as in sqrt(16)? Should this call fall back to
>> PyObject_Call(), or cast the integer to a double at some point?
>
> a) np.sqrt could export functions for all basic types (this is how
> NumPy currently works under the hood anyway).
>
> b) It doesn't help here, but I also imagine Cython doing a 3-step or
> 4-step call down the line:
>
> - Direct call using the types given.
> - Promote all scalars to 64 bit, try again.
> [- (Optional, if an FFI library or LLVM is available: parse the
>   function's signature string and build the call dynamically using the
>   FFI library.)]
> - Python call.
>
> Without an FFI library, I think giving the user a speedup if he/she
> writes sqrt(3.) rather than sqrt(3) is fine...
>
> c) An optimize-for-host-libraries compilation flag could of course just
> probe for the signatures, similar to profile-guided optimization.

OK, np.sqrt() supports different signatures; how would Cython know which
C function to use?

>> I've tried to implement a trivial approach for CyFunction. Trivial
>> means that the function accepts PyObjects as arguments and returns a
>> PyObject, so the trivial signature is only one integer: 1 + len(args).
>> If a signature match occurs, the C function is called directly, and
>> PyObject_Call() is used otherwise. I didn't succeed because of the
>> argument-cloning problems we discussed before.
>>
>> About dict lookups: it's possible to speed up a dict lookup by a
>> constant key if we have access to the dict's internal implementation.
>> I've implemented it for module-level lookups here:
>>
>> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9
>>
>> And it gave a 4x speedup for a dummy test:
>>
>>     def foo():
>>         cdef int i, r = 0
>>         o = foo
>>         for i in range(10000000):
>>             if o is foo:
>>                 r += 1
>>
>>     %timeit foo()
>>     1 loops, best of 3: 229 ms per loop
>>
>>     %timeit foo_optimized()
>>     10 loops, best of 3: 54.1 ms per loop
>
> Cool! Am I right that that translates to 5.4 ns per lookup? That's
> pretty good. (What CPU did you use?)

Yes, the test was run on an Intel i3 @ 3.2 GHz.

> Still, a function pointer call done at import time appears to cost
> roughly 1 ns, so a full sqrt bound the way I proposed would be 2-3 ns;
> an additional 5.4 ns is still relatively large.
>
> But it does mean that __nomonkey__, if not completely invalid, is
> perhaps not exactly high-priority. For a JIT that would consult it at
> compile time, the gain would be higher, though.
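[Editor's note: the multi-step call strategy Dag sketches (direct call, then scalar promotion, then generic Python call) can be modeled in plain Python. The signature tuples and the `SPECIALIZATIONS` table below are invented for illustration; the real signature encoding is the subject of CEP 1000.]

```python
# Rough model of the 3-step dispatch: (1) try the signature exactly as
# given, (2) promote all scalars to 64-bit and retry, (3) fall back to a
# generic Python-level call.
import math

# Table of specialized entry points, keyed by argument-type signature.
# In Cython these would be raw C function pointers filled at import time.
SPECIALIZATIONS = {
    ("double",): math.sqrt,
}

def ctype_of(value):
    """Map a Python value to an assumed C-level type name."""
    if isinstance(value, float):
        return "double"
    if isinstance(value, int):
        return "long"
    return "object"

def promote(t):
    """Step 2: promote scalar types to their 64-bit counterparts."""
    return {"long": "double", "float32": "double"}.get(t, t)

def call(func, arg):
    sig = (ctype_of(arg),)
    impl = SPECIALIZATIONS.get(sig)
    if impl is not None:                        # step 1: direct call
        return ("direct", impl(arg))
    promoted = tuple(promote(t) for t in sig)
    impl = SPECIALIZATIONS.get(promoted)
    if impl is not None:                        # step 2: promoted retry
        return ("promoted", impl(float(arg)))
    return ("python", func(arg))                # step 3: generic fallback

print(call(math.sqrt, 9.0))  # ('direct', 3.0)
print(call(math.sqrt, 9))    # ('promoted', 3.0)
```

This also illustrates the thread's point about sqrt(3.) versus sqrt(3): the float literal hits the fast path directly, while the int costs one extra table probe before promotion.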
--
vitja.