[jgit-dev] jgit push on http perpetually running gc before push after jgit gc

Hello,

we are using jgit for push actions in a server application. We are seeing a performance issue for pushing over the network after running jgit gc.

Before running jgit gc, all our push operations take a normal amount of time. After running jgit gc (manually or via autogc), all jgit push operations for that clone take excessive time. It looks like the time is spent doing a gc, or some similar activity, before every push.
For the same clone, however, running "git push" from the shell (git 2.21.0) continues to complete in normal time.

Once the local git clone has been "corrupted" by jgit gc to make pushes slow, all following jgit pushes remain slow (even when restarting the application). When running "git gc" on the shell for the clone, all following jgit pushes become fast again.

This indicates that jgit gc leaves the local repository in a state that causes jgit push to do unnecessary extra work before each push, while shell git push is unaffected. In contrast, "git gc" leaves the repository in a different, "healthy" state.

The symptom that we observed:

On a fresh clone of our repository, a push takes about 3s to complete (which is ok). After a call to jgit gc, this duration suddenly increases to 60s with jgit 3.4.1.201406201815-r.
We also tried upgrading to jgit 5.2.0.201812061821-r. The same symptom appears with that version, but the degradation is worse: 120s.
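
For anyone trying to reproduce these numbers, here is a minimal sketch of a wall-clock timer that could wrap the push call before and after gc. The class name `PushTimer` and the method `timeMillis` are hypothetical helpers, not part of jgit, and the `Thread.sleep` in `main` only stands in for the real push.

```java
import java.util.concurrent.Callable;

class PushTimer {
    // Runs the given operation once and returns the elapsed wall-clock time in milliseconds.
    static <T> long timeMillis(Callable<T> op) throws Exception {
        long start = System.nanoTime();
        op.call();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; in the application this would be the push call.
        long ms = timeMillis(() -> {
            Thread.sleep(50);
            return null;
        });
        System.out.println("operation took " + ms + " ms");
    }
}
```

Since CPU usage is low during the slow pushes, wall-clock time rather than CPU time is the relevant measure here.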

The logging during this time shows many lines (similar to gc) like:

    Counting objects:       398385
    Counting objects:       409359

Profiling with YourKit 2019.1 shows this stack for a push command (running with 5.2.0.201812061821-r):

* PushCommand.java:170 org.eclipse.jgit.transport.Transport.push(ProgressMonitor, Collection, OutputStream) 122767ms
* BasePackPushConnection.java:219 org.eclipse.jgit.transport.BasePackPushConnection.writePack(Map, ProgressMonitor) 120640ms
* BasePackPushConnection.java:356 org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(ProgressMonitor, Set, Set) 120640ms
* ...
** BitmapWalker.java:228 org.eclipse.jgit.revwalk.ObjectWalk.nextObject() 119505ms
*** ObjectWalk.java:355 <...> org.eclipse.jgit.revwalk.BitmapWalker$BitmapObjectFilter.include(ObjectWalk, AnyObjectId) 55163ms
*** ObjectWalk.java:388 <...> org.eclipse.jgit.revwalk.ObjectWalk.enterTree(RevObject) 43887ms

So it seems that on our server, the preparePack operation is taking up excessive time. At the same time, we do not observe excessive CPU usage (it stays below 5% on average, 20% max).


Some other context: the git repository we work on has ~50K commits, and each commit changes only a few lines in 1 or 2 json files. Each push sends 1 new commit to a remote repository on the network. The bare repo size is ~1.4 GB. The checked-out tree at HEAD is ~50MB and contains roughly 10K files.

Any advice on how to proceed? Currently our workaround plan is to disable all jgit gc (and autogc), and try to use shell git for gc.
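
A minimal sketch of the shell-git part of that workaround, assuming the `git` executable is on the PATH of the server process. The class name `ShellGitGc`, the helper names, and the 600 second timeout are hypothetical choices for illustration, not anything provided by jgit.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

class ShellGitGc {
    // Prepares "git gc" to run inside the given repository directory.
    // Assumption: "git" resolves via the PATH of this process.
    static ProcessBuilder gcProcess(File repoDir) {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList("git", "gc"));
        pb.directory(repoDir);
        pb.inheritIO(); // forward git's output to this process's stdout/stderr
        return pb;
    }

    // Runs "git gc" and returns its exit code; fails if it exceeds the timeout.
    static int runGc(File repoDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = gcProcess(repoDir).start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("git gc timed out after " + timeoutSeconds + "s");
        }
        return p.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Repository path is taken from the first argument, defaulting to the current directory.
        File repo = new File(args.length > 0 ? args[0] : ".");
        System.out.println("git gc exit code: " + runGc(repo, 600));
    }
}
```

This only covers running gc through shell git; disabling jgit's own gc and autogc would still have to be done separately in the application.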