Re: [jgit-dev] jgit push on http perpetually running gc before push after jgit gc

On Thu, Mar 14, 2019 at 8:23 AM Thibault Kruse <tkruse@xxxxxxxxxx> wrote:

We are using jgit for push operations in a server application. We are seeing a performance problem when pushing over the network after running jgit gc.

So before running jgit gc, all our push operations take a normal amount of time. After running jgit gc (manually or via autogc), all jgit push operations for this clone take excessive time. It looks like the time is spent doing a gc before every push, or some similar activity.
For the same clone, though, running shell "git push" keeps having normal runtimes (git 2.21.0).

Once the local git clone has been "corrupted" by jgit gc to make pushes slow, all following jgit pushes remain slow (even when restarting the application). When running "git gc" on the shell for the clone, all following jgit pushes become fast again.

This suggests that jgit gc leaves the local repository in a state that causes jgit push to do unnecessary extra work before pushing, while shell git push is unaffected, whereas "git gc" leaves the repo in a different, "healthy" state.

The symptom that we observed:

On a fresh clone of our repository, such push commands take 3s to complete (which is ok). After a call to jgit gc, there is a sudden increase of this duration to 60s with jgit

this is a very ancient version of jgit
We also tried upgrading to jgit, and with that version, the same symptom appears, but with a worse degradation to 120s.

The logging during this time shows many lines (similar to gc) like:

    Counting objects:       398385
    Counting objects:       409359

Profiling with YourKit 2019.1 reveals this stack for a push command (for running with

* PushCommand.java:170 org.eclipse.jgit.transport.Transport.push(ProgressMonitor, Collection, OutputStream) 122767ms
* BasePackPushConnection.java:219 org.eclipse.jgit.transport.BasePackPushConnection.writePack(Map, ProgressMonitor) 120640ms
* BasePackPushConnection.java:356 org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(ProgressMonitor, Set, Set) 120640ms
* ...
** BitmapWalker.java:228 org.eclipse.jgit.revwalk.ObjectWalk.nextObject() 119505ms
*** ObjectWalk.java:355 <...> org.eclipse.jgit.revwalk.BitmapWalker$BitmapObjectFilter.include(ObjectWalk, AnyObjectId) 55163ms
*** ObjectWalk.java:388 <...> org.eclipse.jgit.revwalk.ObjectWalk.enterTree(RevObject) 43887ms

So it seems that on our server, the preparePack operation is taking up excessive time. At the same time, we do not observe excessive CPU usage (it stays below 5% on average, 20% max).

Some other context: the git repository we work on has ~50K commits; each commit changes only a few lines in 1 or 2 JSON files. Each push pushes 1 new commit to a remote repository on the network. The bare repo size is ~1.4 GB; the checked-out files at HEAD total ~50MB across roughly 10K files.

Any advice on how to proceed? Currently our workaround plan is to disable all jgit gc (and autogc), and try to use shell git for gc.

  • what's the effective git / jgit configuration you are using? If you didn't tweak the SystemReader, jgit should combine the settings
    from the system-level git config, the global config of the OS user running your application, and the repository-level git config of each repository.
  • how large is the max. Java heap size?
  • did you increase the jgit window cache size (core.packedGitLimit) above the (very small) default?
  • how did you configure gc.pruneexpire and gc.prunepackexpire?
  • what's the output of "git count-objects -vH" (number of loose and packed objects) and "git show-ref | wc -l" (number of refs) for each of the states you described?
  • don't use jgit 3.4.1; it's very old and we won't fix it. We have made a lot of fixes since then, e.g. jgit used to not clean up the fan-out directories used to store loose objects when repacking them into a pack file
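
To make the before/after comparison of the "git count-objects" output easier, the fields can be parsed and diffed. Below is a minimal sketch in plain Java (no JGit dependency). The class name CountObjectsReport and its two methods are hypothetical helpers, not part of git or jgit. The sketch only assumes that the command prints one "key: value" pair per line, as "git count-objects -v" does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for comparing two "git count-objects -v" outputs. */
final class CountObjectsReport {

    /** Parses "key: value" lines into an insertion-ordered map. */
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank lines or lines without a key
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        return fields;
    }

    /** Returns the fields whose values differ, formatted as "before -> after". */
    static Map<String, String> diff(Map<String, String> before,
                                    Map<String, String> after) {
        Map<String, String> changed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String old = before.getOrDefault(entry.getKey(), "(absent)");
            if (!entry.getValue().equals(old)) {
                changed.put(entry.getKey(), old + " -> " + entry.getValue());
            }
        }
        return changed;
    }
}
```

Capture the output once on a fresh clone and once after jgit gc, pass both strings to parse(), and print diff(before, after) to see which fields (e.g. count, in-pack, packs) changed between the two states.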