[jgit-dev] Improving performance of looking up a path within a tree (TreeWalk.forPath)

Hi!

I'm using jgit as an embedded, versioned key-value store to manage
content. I must say it works great in such a setup: it lets me keep
the whole history of all edits, go back to any previous version, work
on multiple versions in parallel, etc. Sweet! Thank you for all the
hard work on this library.

Now that I've got the whole thing working, I've started to look into
the performance of my primary scenario: multiple reads from one tree.
At any given point in time there is one 'active' version (a commit),
so I know its tree. Given that tree, I want to look up paths, that is,
for a given path, find the SHA-1 of the BLOB that corresponds to it.
In short: I have a reference to a tree and I want to look up multiple
paths (I'm interested in BLOBs only) in this tree and its sub-trees.
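
To make the access pattern concrete, here is a small self-contained Java model (not jgit code): trees are maps from names to entries, and resolving a path reads the root tree and then each intermediate tree in turn, starting over from the root for every lookup. The class, the Entry record and the string ids are hypothetical stand-ins for jgit's ObjectId and tree parsing; the read counter marks where a real repository would do I/O and decompression.

```java
import java.util.HashMap;
import java.util.Map;

public class PathLookupModel {
    // One tree entry: either a sub-tree or a blob, identified by a hypothetical id.
    record Entry(String id, boolean isTree) {}

    // Hypothetical in-memory object store: tree id -> (name -> entry).
    private final Map<String, Map<String, Entry>> trees = new HashMap<>();
    // Counts how many times a tree object is "read" (loaded and parsed).
    int treeReads = 0;

    void putTree(String treeId, Map<String, Entry> entries) {
        trees.put(treeId, entries);
    }

    private Map<String, Entry> readTree(String treeId) {
        treeReads++; // in a real repository, I/O and inflation would happen here
        Map<String, Entry> t = trees.get(treeId);
        if (t == null) throw new IllegalArgumentException("missing tree " + treeId);
        return t;
    }

    // Resolves a slash-separated path to a blob id, or returns null if absent.
    String lookup(String rootTreeId, String path) {
        String current = rootTreeId;
        String[] parts = path.split("/");
        for (int i = 0; i < parts.length; i++) {
            Entry e = readTree(current).get(parts[i]);
            if (e == null) return null;
            boolean last = i == parts.length - 1;
            if (last) return e.isTree() ? null : e.id(); // only blobs are wanted
            if (!e.isTree()) return null;                // a blob has no children
            current = e.id();
        }
        return null;
    }

    public static void main(String[] args) {
        PathLookupModel m = new PathLookupModel();
        m.putTree("root", Map.of("docs", new Entry("t-docs", true)));
        m.putTree("t-docs", Map.of("a.txt", new Entry("b-a", false),
                                   "b.txt", new Entry("b-b", false)));
        System.out.println(m.lookup("root", "docs/a.txt"));
        System.out.println(m.lookup("root", "docs/b.txt"));
        System.out.println("tree reads: " + m.treeReads);
    }
}
```

Two lookups under the same directory read both the root tree and the docs tree twice each; that repeated work is what a per-tree cache would remove.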

Looking at the performance of path look-ups (I'm using
TreeWalk.forPath), it turns out that the vast majority of time is spent
reading and decompressing tree objects. In fact, the same tree is
read multiple times. I've noticed that there is a cache for objects'
SHA-1s (CachedObjectDirectory) but not for the content itself. What I
would like to do is to cache the content of trees.
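
One way to cache tree content is a bounded, least-recently-used map from tree id to the decompressed bytes. This sketch uses LinkedHashMap in access order with removeEldestEntry; the class name and the hex-string key are assumptions (in jgit the key would more naturally be an ObjectId, and the bytes could come from ObjectLoader.getCachedBytes()).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache for decompressed tree contents, keyed by object id.
public class TreeContentCache {
    private final int maxEntries;
    private final LinkedHashMap<String, byte[]> map;

    public TreeContentCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Evict the least recently used entry once the bound is exceeded.
                return size() > TreeContentCache.this.maxEntries;
            }
        };
    }

    // Returns the cached content, or null on a miss; a hit refreshes recency.
    public synchronized byte[] get(String id) {
        return map.get(id);
    }

    public synchronized void put(String id, byte[] content) {
        map.put(id, content);
    }

    public synchronized int size() {
        return map.size();
    }
}
```

Bounding by entry count keeps the sketch short; a cap on total bytes would track memory more closely, since trees vary widely in size.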

As a very first try, I created a subclass of ObjectReader that
caches all the ObjectLoaders for a given SHA-1 of type 2 (tree). This
greatly improves read performance, but I wonder whether this approach
is too naive.
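
The wrapping idea can also be sketched without jgit: a decorator that caches only objects of type 2 and passes everything else straight through. The ObjectSource interface and its open method are hypothetical stand-ins for ObjectReader; jgit's own name for the tree type code is Constants.OBJ_TREE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal interface standing in for an object reader.
interface ObjectSource {
    byte[] open(String id, int type);
}

// Decorator that memoizes tree objects and delegates all other types uncached.
public class TreeCachingSource implements ObjectSource {
    // Git's numeric type code for trees, as mentioned in the post ("type 2").
    static final int OBJ_TREE = 2;

    private final ObjectSource delegate;
    // Unbounded and unsynchronized: fine for a sketch, not for production.
    private final Map<String, byte[]> trees = new HashMap<>();

    public TreeCachingSource(ObjectSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] open(String id, int type) {
        if (type != OBJ_TREE) {
            return delegate.open(id, type); // blobs, commits, tags: no caching
        }
        // Each tree id is loaded from the delegate at most once.
        return trees.computeIfAbsent(id, k -> delegate.open(k, type));
    }
}
```

As written the map grows without limit and is not safe for concurrent use; those are likely the first concerns with such a cache, and a size bound with an eviction policy would be the natural next step.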

What I would like to ask is this: what would be the best approach to
improving the performance of path look-ups within a given tree? To me
it should involve caching uncompressed trees read from disk (I/O and
decompression seem to be the hot spot here), but I'm not sure where
the best place to plug in such a cache is. Does wrapping such a cache
around ObjectReader make sense?

I would highly appreciate your opinion.

Cheers,
Pawel Kozlowski

