Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Improving ObjectIdSubclassMap performance

Yes, this is the way I implemented it too. In fact, I just copied the
original class and than made few modifications in it.

The algorithm is really, really simple. I won't bother posting it to
Gerrit, here are just a few snippets. Four hash functions variant:


	public V get(AnyObjectId toFind) {
		V v;
		if ((v = objects[index0(toFind)]) != null
				&& AnyObjectId.equals(v, toFind))
			return v;
		if ((v = objects[index1(toFind)]) != null
				&& AnyObjectId.equals(v, toFind))
			return v;
		if ((v = objects[index2(toFind)]) != null
				&& AnyObjectId.equals(v, toFind))
			return v;
		if ((v = objects[index3(toFind)]) != null
				&& AnyObjectId.equals(v, toFind))
			return v;
		return null;
	}

	private void insert(V v) {
		for (int n = 0; n < STEPS; n++) {
			if ((v = swap(index0(v), v)) == null)
				return;
			if ((v = swap(index1(v), v)) == null)
				return;
			if ((v = swap(index2(v), v)) == null)
				return;
			if ((v = swap(index3(v), v)) == null)
				return;
		}
		grow();
		insert(v);
	}

	private int index0(AnyObjectId id) {
		int q = objects.length >> 2;
		return (id.w1 >>> 1) % q;
	}

	private int index1(AnyObjectId id) {
		int q = objects.length >> 2;
		return (id.w2 >>> 1) % q + q;
	}

	private int index2(AnyObjectId id) {
		int q = objects.length >> 2;
		return (id.w3 >>> 1) % q + q * 2;
	}

	private int index3(AnyObjectId id) {
		int q = objects.length >> 2;
		return (id.w4 >>> 1) % q + q * 3;
	}

	private V swap(int i, V value) {
		V v = objects[i];
		objects[i] = value;
		return v;
	}


That's it.

I don't know, may be these index method can be made even faster by
using clever bit twiddling hacks, but I don't think it will improve
speed significantly.

On 9 March 2011 22:56, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Wed, Mar 9, 2011 at 12:50, Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
>> My guess is that the hash calculation makes for the overhead. If that is
>> the case, you might want to use several parts of the SHA-1 as
>> different hashes (interpreting the 20-byte as 5 32-bit integers gives you
>> five hash functions for the price of one), because the SHA-1 has been
>> calculated anyway.
>
> That's what we do already. SHA-1s are already stored as 5 primitive
> ints in Java, by treating the raw 20 bytes as 5 network byte-order
> integers and converting them to primitive integer values.
>
> The current table maps that onto a slot in the hashtable with:
>
>  (id.w1 >>> 1) % obj_hash.length
>
> Which could be faster, the table length is currently always a power of
> 2. So we could rewrite this as:
>
>  id.w1 & table_mask
>
> Where table_mask is just:
>
>  table_mask = obj_hash.length - 1.
>
> :-)
>
> --
> Shawn.
>


Back to the top