Re: [jgit-dev] Storage interface.

From: Shawn Pearce <spearce@xxxxxxxxxxx>
Date: Tue, 12 Oct 2010 11:36:10 -0700

On Tue, Oct 12, 2010 at 5:34 AM, Jérôme Blanchard <jayblanc@xxxxxxxxx> wrote:
> Is the storage/file package portable to another type of storage?

It's *almost* portable.

> I mean, does the storage package provide a complete abstraction layer
> or is it only organizational?

The goal is a complete abstraction layer.  We have almost done that.

> Recent updates make me think it is now possible to develop another storage
> system (database, ejb, etc...) but I'd like to be sure before trying to
> develop this.

Almost true.  :-)

I do have a closed source code base that allows JGit to sit on top of
a database rather than a local filesystem.  The abstraction works
well enough that I can run JGit's daemon and clone a repository that
is stored in the database over any of the Git transport protocols
(git:// or smart http://).  It also works well enough that you can do
simple operations like log.  It's a fully open source JGit, with the
closed source code base just extending classes like Repository,
RefDatabase, and ObjectDatabase (no JGit hacks required; I've already
upstreamed everything needed).

I have not yet implemented writing to refs in this implementation.
Consequently I can't say for certain that the RefUpdate API is
sufficiently abstracted.  I know the RefLog API is *NOT* abstracted
yet.  Ref writing isn't done simply because I'm swamped and ran out
of time on this project.  I suspect you would need to duplicate a lot
of code from storage.file's RefUpdate implementation, and that we may
be able to share more of that code.

The major part that is missing from a complete abstraction is the
transport.IndexPack class.  This class is crucial for fetching into a
repository, or being on the receiving end of a push into a repository.
It is completely dependent upon local file IO and is *NOT* abstracted
onto an arbitrary ObjectDatabase implementation.  That means I can't
fetch into my database, nor can I push into it.  So the way I get a
Git repository into the database is through a hacked-up program I
wrote that manually injects objects.  (It's not pretty.  At all.)

The closed source implementation is still closed source because it
sits on top of a database API that isn't public, and the code is
horrid.  I would be embarrassed to show it... especially that importer
program that injects objects.  I do plan to open source this, but only
once I have it cleaned up enough that I am willing to put my name on
it and call it my work.  :-)

I had hoped to spend some of my time the past month cleaning up that
code and getting it open sourced before the end of this month.  But
then my son came 5 weeks early and I discovered life had other plans
for me right now.  So that just didn't happen.

I have learned that writing a new storage implementation is a lot of
work.  You can do something really naive in about a day or two of
work... it's a lot of typing to implement the various classes that
JGit requires.  But performance will be so bad it's unusable on
anything beyond a toy repository.  Then you need to spend a lot of
time implementing the rest of those APIs (like the async reading
methods in ObjectReader) in order to work back towards something even
half-way acceptable.
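
To make the cost of the naive approach concrete, here is a minimal
Python sketch (a toy model, not JGit code).  CountingStore,
read_one_by_one, and read_batched are hypothetical names invented for
this illustration; the store only counts how many round trips a caller
makes.  It is meant to suggest why bulk or async read paths matter
once storage is remote, not to describe how ObjectReader works.

```python
class CountingStore:
    """Hypothetical remote object store that counts round trips."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.round_trips = 0

    def get(self, object_id):
        # One request, one object: a full round trip each time.
        self.round_trips += 1
        return self._objects[object_id]

    def get_many(self, object_ids):
        # One request, many objects: still a single round trip.
        self.round_trips += 1
        return {oid: self._objects[oid] for oid in object_ids}


def read_one_by_one(store, ids):
    """Naive reader: one round trip per object."""
    return {oid: store.get(oid) for oid in ids}


def read_batched(store, ids, batch_size):
    """Batched reader: one round trip per batch of ids."""
    ids = list(ids)
    result = {}
    for start in range(0, len(ids), batch_size):
        result.update(store.get_many(ids[start:start + batch_size]))
    return result


# Assumed data set: 1,000 small objects with made-up ids.
objects = {f"obj{i}": b"payload" for i in range(1000)}

naive = CountingStore(objects)
read_one_by_one(naive, objects.keys())
print(naive.round_trips)    # 1000

batched = CountingStore(objects)
read_batched(batched, objects.keys(), batch_size=100)
print(batched.round_trips)  # 10
```

In this model the batched reader returns the same objects with 10
round trips instead of 1,000, but only because all the ids were known
up front.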

Replacing the storage layer in JGit isn't like swapping out MySQL for
PostgreSQL in a SQL-based application.  It's more like trying to build
a rocket and fly to the moon and back, using some twine and paperclips
you found in the office supply cabinet.  The fundamental problem is
that most of the algorithms in Git assume that object access is
performed in very small constant time, and most of them have very
little lookahead available to them.  This means your implementation's
performance is determined entirely by the round-trip time to talk to
your storage system.  If that storage system isn't local and mapped
into memory the way storage.file is, it's going to be a lot slower.
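
The lack of lookahead can be illustrated with a small Python sketch (a
toy model under assumed numbers, not JGit code).  walk_history, the
parents dictionary, and the 1 microsecond and 1 millisecond latencies
are all assumptions made up for this example.  Each commit's parent is
only known after the commit itself has been read, so the reads cannot
be batched or issued in parallel.

```python
def walk_history(read_parent, tip):
    """Follow parent links from tip back to the root.

    read_parent(commit_id) stands in for one storage round trip and
    returns the parent id, or None at the root.
    Returns (ids in visit order, number of round trips).
    """
    visited = []
    round_trips = 0
    current = tip
    while current is not None:
        parent = read_parent(current)  # next id is unknown until this returns
        round_trips += 1
        visited.append(current)
        current = parent
    return visited, round_trips


# Hypothetical linear history of 10,000 commits: c0 <- c1 <- ... <- c9999.
parents = {f"c{i}": (f"c{i - 1}" if i > 0 else None) for i in range(10_000)}
visited, trips = walk_history(parents.__getitem__, "c9999")

# Assumed per-read latencies, in microseconds (illustrative, not measured).
LOCAL_US = 1        # memory-mapped local pack file
REMOTE_US = 1_000   # one network round trip to a database

print(trips)              # 10000
print(trips * LOCAL_US)   # 10000 microseconds, about 10 ms
print(trips * REMOTE_US)  # 10000000 microseconds, about 10 s
```

Under these assumed latencies the same walk goes from roughly 10 ms to
roughly 10 s, and the difference comes entirely from round-trip time.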

-- 
Shawn.
