[Python-Dev] Dictionary tuning
Tim Peters
tim.one@comcast.net
Mon, 28 Apr 2003 21:49:52 -0400
[Tim Delaney]
>> What is the effect on peak memory usage over "average" programs?
[Raymond Hettinger]
> Since the amortized growth is the same, the effect is Nil on average.
> Special cases can be contrived with a specific number of records
> where the memory use is doubled, but in general it is nearly unchanged
> for average programs.
That doesn't make sense. Dicts can be larger after the patch, but never
smaller, so there's nothing opposing the "can be larger" part: on average,
allocated address space must be strictly larger than before. Whether that
*matters* on average to the average user is something we can answer
rigorously just as soon as we find an average user with an average program
<wink>. I'm not inclined to worry much about it.
>> This might be a worthwhile speedup on small dicts (up to a TBD
>> number of entries) but not worthwhile for large dicts.
> Actually, it helps large dictionaries even more that small dictionaries.
> Collisions in large dicts are resolved through other memory probes
> which are almost certain not to be in the current cache line.
That part makes sense. Resizing a large dict is an expensive operation too.
>> However, to add this capability in would of course add more code
>> to a very common code path (additional test for current size to
>> decide what factor to increase by).
> Your intuition is exactly correct. All experiments to special case
> various sizes results in decreased performance because it added
> a tiny amount to some of the most heavily exercised code in
> python.
This part isn't clear: the changed code is in the body of an if() block
that normally *isn't* entered (over an ever-growing dict's life, it's
entered O(log(len(dict))) times, and independent of the number of dict
lookups). The change cuts the number of times it's entered by approximately
a factor of 2, but it isn't entered often even now.
> Further, it results in an unpredicable branch which is
> also not a good thing.
Since the body of the loop isn't entered often, unpredictable one-shot
branches within the body shouldn't have a measurable effect. The
unpredictable branches when physically resizing the dict will swamp them
regardless. The surrounding if-test continues to be predictable in the
"branch taken" direction.
What could be much worse is that stuffing code into the if-block bloats the
code so much as to frustrate lookahead I-stream caching of the normal
"branch taken and return 0" path:
if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
if (dictresize(mp, mp->ma_used*2) != 0)
return -1;
}
return 0;
Rewriting as
if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2)
return 0;
return dictresize(mp, mp->ma_used*2) ? -1 : 0;
would help some compilers generate better code for the expected path, and
especially if the blob after "return 0;" got hairier.
IOW, if fiddling with different growth factors at different sizes slowed
things down, we have to look for something that affected the *normal* paths;
it's hard to imagine that the the guts of the if-block execute often enough
to matter (discounting its call to dictresize(), which is an expensive
routine).
> I found out that timing dict performance was hard.
> Capturing memory usage was harder. Checking entry
> space,space plus unused space, calls to PyMalloc, and
> calls to the os malloc, only the last is important, but
> it depends on all kinds of things that are not easily
> controlled.
In my early Cray days, the Cray boxes were batch one-job-at-a-time, and all
memory was real. If you had a CPU-bound program, it took the same number of
nanoseconds each time you ran it. Benchmarking was hard then too <0.5
wink>.