Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Improving ObjectIdSubclassMap performance

On Wed, Mar 9, 2011 at 09:59, Aliaksandr Radzivanovich
<aradzivanovich@xxxxxxxxx> wrote:
> I took this as an opportunity to practice in implementing Cuckoo
> hashing[1].
...
> My Cuckoo hash tables work slower than the original map. The one that
> uses four hash functions inserts objects significantly slower. Both
> maps retrieve objects significantly slower, which is very surprising
> for me. My guess it is because of randomized access to memory. The
> good news, however, is that these new maps waste less space --  1,4
> array elements per each object versus 2,8 of the original map.

This is an interesting result, thanks for sharing it

Unfortunately the slowdown is too great over the current code to
justify switching in the name of saving memory. A table for 1,500,000
unique objects is non-trivial in size. If we're worried about the
memory footprint of that table, we should also be looking at a table
10x larger to account for what the linux-2.6 repository will look like
in <10 years from now. Unless RAM keeps growing to keep pace and the
Java GC can handle the larger heap sizes, we may have to look at
segmenting the table and paging portions of it out to disk. Which will
be a radically different algorithm than Cuckoo hashing, or the current
implementation. :-|

Unfortunately SHA-1 is giving us uniformly distributed keys. We cannot
really make any optimizations about stuffing some subset of previously
discovered keys into a more compact data structure and making that
read-only, then piling in the newly discovered keys into a more
dynamic structure.

-- 
Shawn.


Back to the top