Re: [jgit-dev] JGit backup & synchronization

On Thu, May 3, 2012 at 7:30 AM,  <epeters1111@xxxxxxxxx> wrote:
> (1) What's the recommended approach to backing up a repository that's being used by JGit?

Same as backing up any other Git repository.

>  Based on some threads from Stack Overflow [1], it seems like the "right" way is to use "bundle" to create snapshots,
> [1] http://stackoverflow.com/questions/2129214/backup-a-local-git-repository

This is one valid way to do it. Another way is to use git clone, which
that thread discourages. git clone is a fine backup method; the issue
raised on that thread is how you then back up that directory of
files. If your backup system has trouble with a directory, then you
might need to e.g. tar the clone first, at which point bundle might
just be a good approach.

Another way is to store the repository on a filesystem that supports
snapshots. If you snapshot the POSIX filesystem, the repository will
be consistent as of that snapshot, and then you can back up the
snapshot. E.g. btrfs on Linux or ZFS on Solaris/FreeBSD. JGit (and
normal Git) always perform updates in a safe ordered way to support
this sort of approach.

It depends on how much data you are talking about. bundle/clone will
produce a complete copy of the repository. For one repository done
nightly, this isn't really a problem. For 1600 repositories that weigh
in over 200G total, it is. It's that latter case where snapshotting
filesystems can be useful with Git.

> which seems to be supported in JGit through org.eclipse.jgit.transport.BundleWriter.  Does this sound about right?

Yes.
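
For reference, a minimal sketch of what a BundleWriter-based snapshot
could look like. It assumes JGit 5.x or later on the classpath (for
RefDatabase.getRefsByPrefix); the class name BundleBackup and the two
command-line arguments are made up for illustration, and it writes a
complete bundle of everything under refs/ rather than an incremental
one.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.Ref;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.storage.file.FileRepositoryBuilder;
import org.eclipse.jgit.transport.BundleWriter;

/** Hypothetical helper: writes every ref under refs/ into one bundle file. */
public class BundleBackup {
    public static void main(String[] args) throws IOException {
        File gitDir = new File(args[0]);      // e.g. /srv/git/project.git (assumed path)
        File bundleFile = new File(args[1]);  // e.g. /backup/project.bundle (assumed path)

        try (Repository repo = new FileRepositoryBuilder()
                     .setGitDir(gitDir)
                     .setMustExist(true)      // fail instead of opening a missing repository
                     .build();
             OutputStream out = new FileOutputStream(bundleFile)) {

            BundleWriter writer = new BundleWriter(repo);

            // Include every branch, tag and other ref that points at an object.
            for (Ref ref : repo.getRefDatabase().getRefsByPrefix(Constants.R_REFS)) {
                if (ref.getObjectId() != null) {
                    writer.include(ref);
                }
            }

            // Pack all objects reachable from the included refs into the stream.
            writer.writeBundle(NullProgressMonitor.INSTANCE, out);
        }
    }
}
```

The resulting file should be usable with "git bundle verify" and as a
clone source, the same as a bundle produced by git-core.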

> (2) The JGit Repository is thread-safe, so it seems like our app could support multiple threads interacting with the same repository.

This is correct. Gerrit Code Review runs the JGit Repository as a
singleton across multiple concurrent threads, under very high user
loads, and has been doing that for ~3 years in production at a lot of
companies. It's thread-safe. :-)

>  But it seems like there's still just one working directory

It's thread-safe... until you touch the working directory.

A bare repository is thread-safe.

I don't think we provide any assurances the working directory is
thread-safe. Some internal structures might be thread-safe enough that
you can perform different working directory operations sequentially on
different threads. But the working directory is not strictly
thread-safe on its own. It's too hard to provide the right consistency
guarantees across all of the files in the working tree.
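
If the application has to touch a single working directory from
several threads, one option is to serialize those operations itself.
The sketch below is not part of JGit: WorkingTreeGate is a
hypothetical helper built only on java.util.concurrent, and it assumes
every working directory operation in the process is routed through the
same gate instance. Operations that only touch the bare repository
would not need to go through it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical helper (not a JGit class): lets only one thread at a time
 * run an operation against a given working directory.
 */
final class WorkingTreeGate {
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the operation while holding the lock and returns its result. */
    <T> T run(Callable<T> workingTreeOperation) throws Exception {
        lock.lock();
        try {
            return workingTreeOperation.call();
        } finally {
            // Always release, even if the operation throws.
            lock.unlock();
        }
    }
}
```

A caller would keep one gate per working directory and wrap each
checkout, add or similar call in gate.run(...). This only coordinates
threads inside one JVM; it does nothing about other processes.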

You can run multiple working directories, but you would need to do
things a bit more yourself, by tracking your own DirCache objects and
their locations on disk, and tracking your own directory root for each
user, and this may mean you can't use all of the JGit API classes
because they assume the single working directory relationship with a
Repository.

> (3) Is it ever safe to have multiple *processes* using the same repository?

Yes, this is explicitly supported to permit end-users to use JGit from
within an embedded application (e.g. Eclipse IDE) and still use
git-core on the command line.

>  I'm thinking of two scenarios: (a) a user using command-line git at the same time our app is using JGit, and/or

Yes, like I just said, this was an explicit design goal with JGit.

> (b) two app processes with a shared disk.

Also works.

