11 Oct 15:47
Re: Geometry and spatial indexes, my opinion
Brandon Kohn <blkohn <at> hotmail.com>
2008-10-11 13:47:25 GMT
2008-10-11 13:47:25 GMT
-------------------------------------------------- From: "Patrick Mihelich" <patrick.mihelich <at> gmail.com> Sent: Friday, October 10, 2008 10:55 PM To: <boost <at> lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion > On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn <at> hotmail.com> wrote: > >> Second, there are efficiency and code size concerns. Consider the >> Euclidean >>> distance function. For 2D/3D points, it is nice to use compile-time >>> indexing >>> and metaprogramming to guarantee tight, loop-less code. For high >>> dimensional >>> points, however, beyond some dimension >>> (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer >>> desirable, and we would rather just do the runtime iteration. If the >>> points >>> are RuntimeIndexable, we can choose an appropriate distance algorithm at >>> compile time depending on the points' dimensionality. >>> >>> >> I presume the reason one would not want loop-less code at high dimensions >> is due to bloat? I do not understand why runtime looping would be faster >> in >> this case. It would seem that even in the largest cases the bloat would >> still be small compared to memory sizes today. Runtime indexing can >> certainly be accommodated using a tag dispatch on the access traits. I >> think >> more research needs to be done to quantify the differences between the >> two >> to see if it is worth the maintenance overhead. > > > Sure, I need to back up my claim. The fundamental problem with loop-less > code at high dimensions is bloat, yes. Remember that bloat also affects > efficiency. Suppose we're calculating distances from one point to a set of > other points. If the size of the distance function makes the inner loop > code > larger than the L1 instruction cache, we take a performance hit. > > I'm attaching a simple benchmark that calculates the distance between two > points some number of times, using both compile-time and runtime access. > Note that this is awfully friendly to the compile-time access code, as > we're > doing nothing else in the inner loop. Here are some results on my machine > (Intel Core Duo 2.4 GHz, gcc 4.2.3): > > $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o > runtime_test > $ ./runtime_test > Distance = 2.000000 > Compile-time access: 5.460000s > Runtime access: 9.070000s > $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o > runtime_test > $ ./runtime_test > Distance = 3.000000 > Compile-time access: 7.980000s > Runtime access: 10.970000s > > So for 2D/3D we see the benefit to using compile-time access, although it > wouldn't surprise me if the gap could be closed by better compile-flag > twiddling. Let's see what happens at high dimensions. Let's also add in a > partially unrolled runtime access version that operates 4 elements at a > time > (ideally the compiler would take care of this). > > $ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o > runtime_test > mihelich <at> adt:~/projects/stuff$ ./runtime_test > Distance = 36.000000 > Compile-time access: 1.620000s > Runtime access: 1.280000s > Runtime access (unrolled): 0.740000s > $ g++ -I /u/mihelich/packages/boost-spacial-index -I > /u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128 > -O3 runtime_test.cpp -o runtime_test > $ ./runtime_test > Distance = 128.000000 > Compile-time access: 5.960000s > Runtime access: 4.900000s > Runtime access (unrolled): 3.160000s > > Here runtime access is faster. 128 dimensions may sound excessive, but I > did > not choose these numbers arbitrarily. In vision applications it is very > common to represent image patches by SIFT descriptors, which are points in > 128-dimensional space. PCA-SIFT reduces the dimensionality to only 36, but > runtime access is still faster. In either case partial unrolling is a big > win. In other applications we may go to even higher dimensions... > > $ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024 > runtime_test.cpp -o runtime_test > $ ./runtime_test > Distance = 1000.000000 > Compile-time access: 8.150000s > Runtime access: 4.090000s > Runtime access (unrolled): 2.460000s > > Now runtime access is much faster. And once we get up this far I have to > crank up the template depth just to calculate a distance. 1000 dimensions > is > high but not ridiculous. Spectra of galaxies from the Sloan Digital Sky > Survey have 4000 dimensions, for example. > > I understand if the authors do not want to concern themselves much with > optimizing for non 2D/3D cases, as users who really care can always > substitute their own highly optimized distance function using SSE, etc. > But > it would be nice to have something decent out of the box, and there is > plenty of reason for a RuntimeIndexable concept. If the geometry library > is > to be used as a base for spatial index and perhaps clustering libraries, > some consideration has to be taken for high dimensions. > Thank you for running some numbers Patrick, this certainly helps to clarify the issues. I'm going to investigate using Fusion as the underlying mechanism for access as I hear it supports both models. Out of curiosity, I wonder are these random accesses generally done iteratively? If so, how does direct iteration compare with index operator accesses? I've noticed in my own tests on measuring std::vector accesses that iteration is significantly faster than operator [ ] when you are iterating over the container (presumably due to bounds checking). Cheers, Brandon > Cheers, > Patrick > > _______________________________________________ > Unsubscribe & other changes: > http://lists.boost.org/mailman/listinfo.cgi/boost _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
RSS Feed