Re: [jgit-dev] Writing directory list method using JGit

On Thu, Aug 18, 2011 at 05:19, ilya.ivanov@xxxxxxxxxxx
<ilya.ivanov@xxxxxxxxxxx> wrote:
> Can I do something to reduce disk usage in such slow operations if they are
> executed multiple times?
> Probably reading objects from hdd is the slowest part. Once the objects are
> read and parsed can they be cached somehow?
> Will it help retaining Repository/TreeWalk/RevWalk objects in memory?

It helps to reuse the same ObjectReader, which implies reusing the
same Repository. Internally TreeWalk and RevWalk use an ObjectReader,
so reusing one of those will reuse the ObjectReader implicitly. They
also have a constructor that takes an ObjectReader, if you want more
explicit control over which reader instance is being reused. That
control matters because the reader is *not* thread-safe (but the
Repository is), so each thread needs its own reader.

JGit has its own in-memory cache of data from disk. Its default is 20
MiB, but you can increase it by creating a WindowCacheConfig object,
calling setPackedGitLimit() on it with a higher value, and then passing
the config object to WindowCache.reconfigure(). Ideally you would do
this before making any other JGit calls, as the cache is per-JVM and it
reconfigures itself by basically discarding the entire cache and
creating a new one from scratch.

The caching helps, but a lot of stuff still has to be computed on the
fly. If users want to see the same data often enough, it may be
worthwhile to add your own caching to avoid calling JGit. (E.g. like
GitHub does by caching this data in their huge Redis cluster.)
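
A minimal sketch of what such an application-level cache could look
like, using only the Java standard library. The class name
ListingCache, the loader callback, and the "commitId:path" key format
are all hypothetical and not part of JGit; the loader stands in for
whatever TreeWalk-based listing code you already have. Keying on the
commit id keeps invalidation simple, because a given commit always
refers to the same tree.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical LRU cache of directory listings, keyed by commit id and path. */
class ListingCache {
    private final int maxEntries;
    // Stand-in for the expensive JGit-backed listing: (commitId, path) -> entry names.
    private final BiFunction<String, String, List<String>> loader;
    private final Map<String, List<String>> entries;

    ListingCache(int maxEntries, BiFunction<String, String, List<String>> loader) {
        this.maxEntries = maxEntries;
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry() evicts in LRU order.
        this.entries = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > ListingCache.this.maxEntries;
            }
        };
    }

    /** Returns the cached listing, calling the loader only on a cache miss. */
    synchronized List<String> list(String commitId, String path) {
        String key = commitId + ":" + path;
        List<String> cached = entries.get(key);
        if (cached == null) {
            cached = Collections.unmodifiableList(loader.apply(commitId, path));
            entries.put(key, cached);
        }
        return cached;
    }

    /** Number of listings currently held in the cache. */
    synchronized int size() {
        return entries.size();
    }
}
```

You would create one ListingCache per Repository and pass it a loader
that runs your existing directory-listing code. The methods are
synchronized so the cache can be shared between request threads, but
the loader itself would still need to use a per-thread ObjectReader.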

-- 
Shawn.
