Re: [jgit-dev] Running Git Garbage Collection in parallel / locking the repository?


GC was designed so that it can run while the repo is being modified by other threads or processes. But I am not sure whether running several gc's in parallel on the same repo could lead to problems in certain cases. To be honest, I would avoid parallel gc's.
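
If you want to rule out parallel gc's from within one JVM, a small guard around the gc call is enough. The following is only a minimal sketch and not part of JGit: the class GcSerializer, the method runExclusive and the Runnable gcAction are hypothetical names, and gcAction stands for whatever actually triggers the gc. It only serializes threads inside a single JVM and does nothing about gc's started by other processes.

```java
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not JGit API: allows at most one gc per repository per JVM.
final class GcSerializer {
    // Repositories that currently have a gc in progress.
    private static final Set<Path> IN_PROGRESS = ConcurrentHashMap.newKeySet();

    private GcSerializer() {
    }

    /**
     * Runs gcAction unless a gc for the same repository is already running.
     *
     * @return true if gcAction ran, false if it was skipped
     */
    static boolean runExclusive(Path repoDir, Runnable gcAction) {
        // Normalize so that different spellings of the same path share one entry.
        Path key = repoDir.toAbsolutePath().normalize();
        if (!IN_PROGRESS.add(key)) {
            return false; // someone else is collecting this repository
        }
        try {
            gcAction.run();
            return true;
        } finally {
            IN_PROGRESS.remove(key);
        }
    }
}
```

A caller would check the return value and simply retry later (or not at all) when the gc was skipped.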

But in theory the problem is not as difficult as it looks at first. The JGit GC operation does two things:
1) repack: find all objects referenced (recursively) by all existing refs/reflogs/index and try to write them all into a single new packfile. The new packfile gets a name which is derived from the ids of all the objects contained in the pack. If two repacks run in parallel, but the second one was started on a different repository state than the first, then they will very likely write into two independent packfiles. The previous packfiles can be deleted afterwards.
2) prune: find all loose objects which are now unreferenced and which are old enough (default is 2 weeks, see config param gc.pruneExpire) and delete them.
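
To illustrate the two steps: the sketch below shows why repacks over different object sets end up in differently named packfiles, and what the age check in prune amounts to. It is not JGit's code. The class GcStepsSketch and its methods are made up for illustration, and hashing the sorted hex ids with SHA-1 is only an assumption about what "derived from the ids" could look like.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.time.Instant;
import java.util.Collection;
import java.util.TreeSet;

// Illustration only; these names are hypothetical and not JGit API.
final class GcStepsSketch {
    // Mirrors the default mentioned above (gc.pruneExpire = 2 weeks).
    static final Duration DEFAULT_PRUNE_EXPIRE = Duration.ofDays(14);

    private GcStepsSketch() {
    }

    /** Derives a pack file name from the ids of the contained objects (assumed scheme). */
    static String packName(Collection<String> objectIds) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // Sort (and de-duplicate) first so the name does not depend on iteration order.
            for (String id : new TreeSet<>(objectIds)) {
                md.update(id.getBytes(StandardCharsets.US_ASCII));
            }
            StringBuilder name = new StringBuilder("pack-");
            for (byte b : md.digest()) {
                name.append(String.format("%02x", b));
            }
            return name.append(".pack").toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** A loose object may be pruned only if it is unreferenced and older than the expiry. */
    static boolean isPrunable(boolean referenced, Instant lastModified, Instant now, Duration expire) {
        return !referenced && lastModified.isBefore(now.minus(expire));
    }
}
```

Two repacks that see the same set of objects compute the same name, while a repack started after a push sees additional objects and therefore gets a different name.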

At first glance these steps could run in parallel, couldn't they?




On Mon, May 6, 2013 at 9:12 PM, Alexander Riss <ariss@xxxxxxxxx> wrote:
Hi all,

I was wondering if it is currently supported/handled to have multiple Git
Garbage Collections triggered from multiple threads on the same repository?

Looking at the GC command, I did not get the impression that parallel
actions on the repository are allowed while one GC is running? (e.g.
Packed files are requested multiple times - although the list could change

I ran into some repository corruption issues (missing trees/blobs/links)
lately where GC could have been running in parallel - and there also were
frequent pushes to the repository.

Best regards,
 Alexander Riss
