|[jgit-dev] jGit memory management and optimizations|
I'm just starting to look at JGit, but in my small tests it is extremely liberal with RAM.
The only advanced guide I could find about this mentions very few tricks: http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.egit.doc%2Fhelp%2FJGit%2FUser_Guide%2FAdvanced-Topics.html
I haven't yet analyzed the source code itself very much, but I'll start with a simple question: how does one efficiently count the commits?
RevWalkUtils.count(...) calls find(walk, start, end).size(), which basically builds a huge ArrayList with all the commits. Counting by hand is better, but not by much, as it still seems to consume lots of RAM (via the RevWalk itself, I assume).
What am I missing?
I'm starting to believe that perhaps I should read more about the Git file format (http://git-scm.com/book/en/Git-Internals-Packfiles ?) and parse that somehow directly -- at least for the whole repository, counting should be fast.
Is there something inherent in the Git design that makes this so RAM hungry? I realize we are doing a topological sort on a DAG, but this seems to be a rather particular kind of DAG (generally, each vertex has only one incoming and one outgoing edge) and I somehow expected operations on it to be much more efficient in terms of both memory and time.
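To make the expectation concrete, here is a minimal sketch in plain Java of counting reachable commits with no sorting and no list of commit objects. It does not use JGit's API: the CommitCounter class and the map from commit id to parent ids are hypothetical stand-ins for the real object database, used only to illustrate the idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a commit graph: commit id -> parent ids.
// This is NOT JGit's API; it only illustrates the counting technique.
final class CommitCounter {

    // Counts every commit reachable from 'start' by following parent links.
    // No ordering is computed and no commit objects are retained; the only
    // state that grows is the set of ids already seen, which is needed so
    // that a commit reachable through several merge paths is counted once.
    static int countReachable(Map<String, List<String>> parents, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        seen.add(start);
        pending.push(start);

        int count = 0;
        while (!pending.isEmpty()) {
            String id = pending.pop();
            count++;
            for (String parent : parents.getOrDefault(id, List.of())) {
                // Set.add returns false if the id was already present.
                if (seen.add(parent)) {
                    pending.push(parent);
                }
            }
        }
        return count;
    }
}
```

Even in this sketch the set of seen ids grows with the number of commits, so some per-commit memory looks unavoidable as soon as merges are involved; the question is how much more than that a real walk has to keep.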
Any low-hanging fruit remaining? Perhaps some ideas about building some 'index' to speed up jgit operations?