Re: [jgit-dev] Storage interface.
On Tue, Oct 12, 2010 at 5:34 AM, Jérôme Blanchard <jayblanc@xxxxxxxxx> wrote:
> Is the storage/file package portable to another type of storage?

It's *almost* portable.

> I mean, does the storage package ensure a complete abstraction layer, or is
> it only organizational?

The goal is a complete abstraction layer. We have almost done that.

> Recent updates make me think it is now possible to develop another storage
> system (database, EJB, etc.), but I'd like to be sure before trying to
> develop this.

Almost true. :-)

I do have a closed source code base that allows JGit to sit on top of a database rather than a local filesystem. The abstraction works well enough that I can run JGit's daemon and clone a repository stored in the database over any of the Git transport protocols (git:// or smart http://). It also works well enough to support simple operations like log. It's a fully open source JGit, with the closed source base just extending classes like Repository, RefDatabase, and ObjectDatabase (no JGit hacks required; I've already upstreamed everything needed).

I have not yet implemented ref writing in this implementation. Consequently, I can't say for certain that the RefUpdate API is sufficiently abstracted. I know the RefLog API is *NOT* abstracted yet. The reason ref writing isn't done is that I'm just swamped and ran out of time on this project. I suspect you would need to duplicate a lot of code from storage.file's RefUpdate implementation, and that we may be able to share more of it.

The major piece missing from a complete abstraction is the transport.IndexPack class. This class is crucial for fetching into a repository, or for being on the receiving end of a push into a repository. It is completely dependent upon local file IO and is *NOT* abstracted onto an arbitrary ObjectDatabase implementation. That means I can't fetch into my database, nor can I push into it.
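To illustrate the shape of that layering, here is a rough, self-contained sketch. The interface and class names below (ObjectStore, RefStore, AbstractedRepo, etc.) are invented for illustration and are NOT JGit's actual API; the point is only that a repository can delegate all object and ref access to pluggable back ends, so a database-backed implementation drops in where the filesystem one was:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not JGit's API): storage behind small interfaces.
interface ObjectStore {
    byte[] read(String objectId);            // load an object by its id
    void insert(String objectId, byte[] raw);
}

interface RefStore {
    String resolve(String refName);          // ref name -> object id, or null
    void update(String refName, String newId);
}

// An in-memory "database" back end standing in for a real database layer.
final class MapObjectStore implements ObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    public byte[] read(String id) { return objects.get(id); }
    public void insert(String id, byte[] raw) { objects.put(id, raw); }
}

final class MapRefStore implements RefStore {
    private final Map<String, String> refs = new HashMap<>();
    public String resolve(String name) { return refs.get(name); }
    public void update(String name, String id) { refs.put(name, id); }
}

// The repository never touches storage directly; it only talks to the
// two abstractions, so swapping back ends requires no repository changes.
final class AbstractedRepo {
    final ObjectStore objects;
    final RefStore refs;
    AbstractedRepo(ObjectStore o, RefStore r) { objects = o; refs = r; }

    // A "log"-style lookup: resolve a ref, then fetch the object it names.
    byte[] readViaRef(String refName) {
        String id = refs.resolve(refName);
        return id == null ? null : objects.read(id);
    }
}

public class Demo {
    public static void main(String[] args) {
        AbstractedRepo repo = new AbstractedRepo(new MapObjectStore(), new MapRefStore());
        repo.objects.insert("c0ffee", "commit payload".getBytes());
        repo.refs.update("refs/heads/master", "c0ffee");
        System.out.println(new String(repo.readViaRef("refs/heads/master")));
    }
}
```

The gap described above maps onto this sketch directly: reading (clone, log) only needs the read side of the interfaces, while fetch/push also needs a fully abstracted write path, which is the part that is still incomplete.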
So the way I get a Git repository into the database is through a hacked-up program I wrote that manually injects objects. (It's not pretty. At all.)

The closed source implementation is still closed source because it sits on top of a database API that isn't public, and the code is horrid. I would be embarrassed to show it... especially that importer program that injects objects. I do plan to open source this, but only once I have cleaned it up enough that I am willing to put my name on it and call it my work. :-)

I had hoped to spend some of my time this past month cleaning up that code and getting it open sourced before the end of this month. But then my son came 5 weeks early, and I discovered life had other plans for me right now. So that just didn't happen.

I have learned that writing a new storage implementation is a lot of work. You can do something really naive in about a day or two of work... it's a lot of typing to implement the various classes that JGit requires. But performance will be so bad it's unusable on anything beyond a toy repository. Then you need to spend a lot of time implementing the rest of those APIs (like the async reading methods in ObjectReader) to work back toward something even halfway acceptable.

Replacing the storage layer in JGit isn't like swapping out MySQL for PostgreSQL in a SQL-based application. It's more like trying to build a rocket and fly to the moon and back using some twine and paperclips you found in the office supply cabinet. The fundamental problem is that most of the algorithms in Git assume object access is performed in very small constant time, and most of them have very little lookahead available to them. This means your implementation's performance is determined almost entirely by the round-trip time to your storage system. If that storage system isn't mapped into local memory the way storage.file is, it is going to be a lot slower.

-- Shawn.
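The round-trip point is the crux, and a tiny self-contained sketch makes it concrete. RemoteStore below is an invented stand-in for a remote database (not JGit code): a per-object read costs one round trip each, while a batched read, which is what lookahead-friendly APIs like the async reading methods enable, amortizes the same work into a single trip:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a remote store that counts round trips, to show
// why algorithms with no lookahead are dominated by storage latency.
final class RemoteStore {
    int roundTrips = 0;
    private final Map<String, byte[]> data = new HashMap<>();

    void put(String id, byte[] raw) { data.put(id, raw); }

    byte[] readOne(String id) {                 // one round trip per object
        roundTrips++;
        return data.get(id);
    }

    Map<String, byte[]> readBatch(Collection<String> ids) { // one trip total
        roundTrips++;
        Map<String, byte[]> out = new HashMap<>();
        for (String id : ids) out.put(id, data.get(id));
        return out;
    }
}

public class RoundTrips {
    public static void main(String[] args) {
        RemoteStore store = new RemoteStore();
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            ids.add("obj" + i);
            store.put("obj" + i, new byte[] { (byte) i });
        }

        // Naive walk with no lookahead: one round trip per object.
        for (String id : ids) store.readOne(id);
        int naive = store.roundTrips;

        // Batched read with lookahead: the same objects, one round trip.
        store.roundTrips = 0;
        store.readBatch(ids);
        System.out.println(naive + " trips vs " + store.roundTrips);
    }
}
```

With local, memory-mapped storage both paths are cheap, which is why the naive approach survives on storage.file; multiply each trip by a network latency and only the batched path remains usable.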