Re: [jgit-dev] jGit memory management and optimizations
>> I'm just starting to look at jGit but from my small tests, it is extremely
>> liberal with RAM.
> So is the C implementation of Git. The graph algorithms and data
> structures don't lend themselves to low memory processing. Both
> implementations trade RAM in order to reduce running time.
What I would like is some way to go the other direction: use less RAM at the cost of more running time.
I can display a loading message, but I can't ask for 400MB just to display some UI. I mostly care about desktop apps: I can easily trigger a memory error in NetBeans with large repositories.
>> The only advanced guide I could find about this mentions very few tricks:
>> I haven't yet analyzed the source code itself very much but I'll start with
>> a simple question: how does one efficiently count the commits?
> You don't. :-(
> This is one of the things that takes RAM and CPU time.
Any place I can learn more about the design that impacts this? Other than reading the source code, of course.
>> RevWalkUtils.count(...) calls find(walk, start, end).size() which basically
>> builds a huge ArrayList with all the commits.
> OK so that is ugly that count requires making the ArrayList.
Yeah, that's a minor bug.
>> Counting by hand is better,
>> but not by much, as it seems to consume lots of RAM even so (via the RevWalk
>> itself, I assume).
> Yes, RevWalk must maintain a map of all commits.
>> What am I missing?
> Have you tried setRetainBody(false) ?
Yes, this is also in the wiki but doesn't seem to help much.
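To make the memory cost concrete: as far as I understand the reply above, a walk has to remember which commits it has already visited, because a merge makes its ancestors reachable along more than one path. The sketch below is plain Java, not JGit's actual implementation. The class name `ReachableCount` and the `parents` map (commit id to parent ids) are hypothetical stand-ins for the object database.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachableCount {
    /**
     * Counts the commits reachable from 'start'.
     * 'parents' is a hypothetical in-memory model: commit id -> parent ids.
     */
    static int count(Map<String, List<String>> parents, String start) {
        // The 'seen' set is what costs memory: it grows with the number
        // of commits visited, even though we only want a single integer.
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        seen.add(start);
        pending.push(start);

        int n = 0;
        while (!pending.isEmpty()) {
            String commit = pending.pop();
            n++;
            for (String parent : parents.getOrDefault(commit, List.of())) {
                // Set.add returns false if the parent was already queued,
                // so a commit reached through two branches is counted once.
                if (seen.add(parent)) {
                    pending.push(parent);
                }
            }
        }
        return n;
    }
}
```

Without the `seen` set, a history where two branches share a root and are later merged would count the shared root twice, so dropping the set is not an option for a general DAG.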
>> I'm starting to believe that perhaps I should read more about the Git files
>> format (http://git-scm.com/book/en/Git-Internals-Packfiles ?) and parse that
>> somehow directly -- at least for the whole repository, counting should be
> Not much faster. You may be able to save some memory, but this is an
> odd question to try and accelerate an answer to. If you really need
> this commit count fast you may be better off to cache the value on the
> side. Store it as of some commit and refresh the cache when you notice
> the HEAD is no longer at that commit by doing a RevWalk between the
> two points and adding the difference to the counter.
Maybe it's an odd question because I'm looking at jGit for a desktop app. It's not just counting commits: most of the git interaction would need to be done within memory constraints, though I could let the user wait a bit longer.
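For the counting case specifically, the cache-on-the-side idea quoted above might look roughly like the sketch below. It is plain Java with no JGit dependency. `CommitCountCache` and its `rangeCounter` hook are hypothetical names; the hook stands in for whatever counts the commits reachable from the new HEAD but not from the cached one (in JGit, presumably a RevWalk started at the new HEAD with the cached commit marked uninteresting). The sketch assumes the cached commit is an ancestor of the new HEAD; after a history rewrite a full recount would be needed.

```java
import java.util.function.ToIntBiFunction;

/** Keeps a commit count as of one commit and refreshes it incrementally. */
class CommitCountCache {
    private String cachedHead;  // commit id the count was computed for
    private int cachedCount;    // commits reachable from cachedHead

    // Hypothetical hook: (from, to) -> number of commits reachable from
    // 'to' but not from 'from'. Only this part needs to walk history.
    private final ToIntBiFunction<String, String> rangeCounter;

    CommitCountCache(String head, int count,
                     ToIntBiFunction<String, String> rangeCounter) {
        this.cachedHead = head;
        this.cachedCount = count;
        this.rangeCounter = rangeCounter;
    }

    /**
     * Returns the commit count at 'newHead'.
     * Assumes cachedHead is an ancestor of newHead (no history rewrite).
     */
    int countAt(String newHead) {
        if (!newHead.equals(cachedHead)) {
            // Walk only the commits added since the cached point.
            cachedCount += rangeCounter.applyAsInt(cachedHead, newHead);
            cachedHead = newHead;
        }
        return cachedCount;
    }
}
```

The expensive full walk then happens once (or is seeded from a stored value), and each later refresh only pays for the commits added since the last one.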
>> Is there something inherent in the git design that makes this so RAM hungry?
>> I realize we are doing a topological sort on a DAG, but this seems to be a
>> rather particular kind of DAG (generally, each vertex has only one
>> incoming/outgoing edge) and I somehow expected operations on it to be much
>> more efficient in terms of both memory and time.
> Nope. :-(
This is sad. I read an article about the Eclipse Memory tool using a 'dominator tree' to speed up lookups on a heap dump graph. I'm hoping for something similar in jGit that would speed up some operations or let them use some sort of index.
>> Any low-hanging fruit remaining? Perhaps some ideas about building some
>> 'index' to speed up jgit operations?
> There is new work that uses compressed bitmaps to speed up counting
> operations during packing, which is primarily useful when JGit is used
> as a server. Unfortunately this doesn't generalize to all commit
> walking algorithms.
Any link for this work so I could read some more?