didier.06 | 25 Apr 2012 20:49
Picon
Gravatar

Initial map size and init_sizes runtime function

Hi,

I'm playing with maps these days, and I was curious about
the underlying implementation, so I had a look at the hashmap.*
files in the runtime package.

My understanding is until a given threshold (2^12), the hash table
is exponentially growing, and past this threshold the map is
converted into a multi-layer structure (with sub-tables).

Actually, I was trying to understand why hash_grow is still called
despite the hint given at map initialization.

With this code:

for n:=0; n<1000; n++ {
    s := make( map[int] int, 26000 )
    for i:=0; i<25000; i++ {
        s[i] = i
    }
}

we get:

(pprof) top50 --cum
Total: 1433 samples
       9   0.6%   0.6%     1368  95.5% main.main
       0   0.0%   0.6%     1368  95.5% runtime.main
       0   0.0%   0.6%     1368  95.5% schedunlock
      28   2.0%   2.6%     1290  90.0% runtime.mapassign1
      83   5.8%   8.4%     1194  83.3% runtime.mapassign
      47   3.3%  11.7%     1082  75.5% hash_insert
     427  29.8%  41.5%      992  69.2% hash_insert_internal
      44   3.1%  44.5%      504  35.2% hash_grow
      34   2.4%  46.9%      420  29.3% hash_subtable_new
       2   0.1%  47.0%      371  25.9% runtime.mal
      48   3.3%  50.4%      370  25.8% runtime.mallocgc
       0   0.0%  50.4%      203  14.2% runtime.gc
 
hash_grow accounts for 35% of the time while
I would have expected it not to be called at all
here.

I find the following function of hashmap.c puzzling:

static void
init_sizes (int64 hint, int32 *init_power, int32 *max_power)
{
        int32 log = 0;
        int32 i;

        for (i = 32; i != 0; i >>= 1) {
                if ((hint >> (log + i)) != 0) {
                        log += i;
                }
        } /* round up for utilization */
        log += 1 + (((hint << 3) >> log) >= 11);  
        if (log <= 14) {
                *init_power = log;
        } else {
                *init_power = 12;
        }
        *max_power = 12;
}

It calculates the power of two used to allocate the initial hash table.
With the final test (log <= 14), it is possible to have
 init_power > max_power
(when hint is between 2816 and 11264, but not after).

What is the rationale behind it?
Thanks.

Best regards,
Didier.


Gmane