Re: [jgit-dev] jGit memory management and optimizations
On Fri, Nov 16, 2012 at 9:36 AM, Emilian Bold <emilian.bold@xxxxxxxxx> wrote:
> I'm just starting to look at jGit but from my small tests, it is extremely
> liberal with RAM.

So is the C implementation of Git. The graph algorithms and data structures don't lend themselves to low-memory processing. Both implementations trade RAM for reduced running time.

> The only advanced guide I could find about this mentions very few tricks:
> http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.egit.doc%2Fhelp%2FJGit%2FUser_Guide%2FAdvanced-Topics.html
>
> I haven't yet analyzed the source code itself very much but I'll start with
> a simple question: how does one efficiently count the commits?

You don't. :-( This is one of the things that takes RAM and CPU time.

> RevWalkUtils.count(...) calls find(walk, start, end).size() which basically
> builds a huge ArrayList with all the commits.

OK, it is ugly that count requires making the ArrayList.

> Counting by hand is better,
> but not by much, as it seems to consume lots of RAM even so (via the RevWalk
> itself, I assume).

Yes, RevWalk must maintain a map of all commits.

> What am I missing?

Have you tried setRetainBody(false)?

> I'm starting to believe that perhaps I should read more about the Git files
> format (http://git-scm.com/book/en/Git-Internals-Packfiles ?) and parse that
> somehow directly -- at least for the whole repository, counting should be
> fast.

Not much faster. You may be able to save some memory, but this is an odd question to try and accelerate an answer to. If you really need this commit count fast, you may be better off caching the value on the side. Store it as of some commit, and refresh the cache when you notice that HEAD is no longer at that commit by doing a RevWalk between the two points and adding the difference to the counter.

> Is there something inherent in the git design that makes this so RAM hungry?
> I realize we are doing a topological sort on a DAG, but this seems to be a
> rather particular kind of DAG (generally, each vertex has only one
> incoming/outgoing edge) and I somehow expected operations on it to be much
> more efficient in terms of both memory and time.

Nope. :-(

> Any low-hanging fruit remaining? Perhaps some ideas about building some
> 'index' to speed up jgit operations?

There is new work that uses compressed bitmaps to speed up counting operations during packing, which is primarily useful when JGit is used as a server. Unfortunately this doesn't generalize to all commit walking algorithms.
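
To make the "cache the value on the side" suggestion concrete, here is a minimal sketch in plain Java. It does not use JGit at all: the class CommitCountCache, its Commit type, and the methods add, countBetween and count are all hypothetical names invented for this illustration, and the toy in-memory graph stands in for a real repository. It assumes every commit has a strictly later timestamp than its parents, and that the new HEAD descends from the cached commit (no rewritten history). In JGit, the difference walk would roughly correspond to a RevWalk with markStart() on the new HEAD and markUninteresting() on the cached commit.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Toy stand-in for a repository: every name here is hypothetical, not JGit API.
class CommitCountCache {
    static final class Commit {
        final String id;
        final long time;        // assumed strictly greater than every parent's time
        final String[] parents;
        Commit(String id, long time, String... parents) {
            this.id = id;
            this.time = time;
            this.parents = parents;
        }
    }

    private final Map<String, Commit> commits = new HashMap<>();
    private String cachedHead;  // commit the cached count was taken at
    private long cachedCount;   // number of commits reachable from cachedHead

    void add(String id, long time, String... parents) {
        commits.put(id, new Commit(id, time, parents));
    }

    // Counts commits reachable from 'from' but not from 'exclude' (may be null).
    long countBetween(String from, String exclude) {
        if (from.equals(exclude)) return 0;
        // Visited commit id -> true if it is reachable from 'exclude'.
        Map<String, Boolean> uninteresting = new HashMap<>();
        // Newest commit first, so a commit is handled after all its children.
        PriorityQueue<Commit> queue = new PriorityQueue<>(
            Comparator.comparingLong((Commit c) -> c.time).reversed());
        uninteresting.put(from, false);
        queue.add(commits.get(from));
        if (exclude != null) {
            uninteresting.put(exclude, true);
            queue.add(commits.get(exclude));
        }
        int interestingQueued = 1;
        long count = 0;
        // Stop as soon as only excluded commits remain in the queue.
        while (interestingQueued > 0) {
            Commit c = queue.poll();
            boolean skip = uninteresting.get(c.id);
            if (!skip) {
                count++;
                interestingQueued--;
            }
            for (String p : c.parents) {
                Boolean seen = uninteresting.get(p);
                if (seen == null) {
                    // First visit: the parent inherits the child's flag.
                    uninteresting.put(p, skip);
                    queue.add(commits.get(p));
                    if (!skip) interestingQueued++;
                } else if (skip && !seen) {
                    // Parent is still queued; it is reachable from 'exclude' too.
                    uninteresting.put(p, true);
                    interestingQueued--;
                }
            }
        }
        return count;
    }

    // Returns the commit count at 'head', walking only what is new since the cache.
    long count(String head) {
        if (!head.equals(cachedHead)) {
            cachedCount += countBetween(head, cachedHead);
            cachedHead = head;
        }
        return cachedCount;
    }
}
```

The first call to count() walks the whole history once. Later calls visit only the commits added since the cached point, plus a small frontier of already counted commits needed to decide where to stop. In a real setup the cached pair (commit, count) would be persisted somewhere outside the repository, and a HEAD that no longer descends from the cached commit would call for a full recount.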