[egit-dev] Re: [JGit-io-RFC-PATCH v2 2/4] Add JGit IO SPI and default implementation
imyousuf@xxxxxxxxx wrote:
> The SPI mainly focuses on providing an API that lets JGit perform
> operations similar to those of java.io.File. All direct I/O is based on the
> java.io.Input/OutputStream classes.
>
> The JGit IO SPI providers are designed to be URI-scheme based, and thus
> the default implementation is for the "file" scheme. SPI providers will be
> integrated by their respective users in a manner similar to JDBC
> driver registration. There is a SystemStorageManager that has similar
> registration capabilities, and the system storage providers should be
> registered with the manager in one of the provided ways.

I think this may be a bit in the wrong direction for what we are trying to accomplish.

A number of people really want to map Git onto what is essentially Google's BigTable schema. Aside from Google's own BigTable product (which I want to use Git on at work, because it would vastly simplify my system administration duties at $DAYJOB), there are Cassandra and Hadoop HBase, which implement the same schema semantics.

None of those systems implement file streams; they implement cell storage in a non-transactional system with a semi-dynamic schema. Some people have built transactional semantics on top of these storage layers, e.g. Google AppEngine provides multi-row transactions through some magic sauce layered on top of BigTable. I'm sure people will build similar tools on top of Cassandra and HBase.

Where I'm going with this is that things stored as files on the filesystem in traditional Git wouldn't normally be mapped into "byte streams" in a BigTable-ish system, or even in the JDBC-ish system you were describing. For .git/config we might want to map config variable names (e.g. remote.origin.url) into keys in the table, with values stored in cells. This makes it easier to query or edit the data.
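The idea of storing config variable names as table keys, with each value in its own cell, might look roughly like the sketch below. It is hypothetical: `CellConfig` is not part of JGit and does not subclass JGit's `Config`, although its getter/setter names loosely mirror it, and a `java.util.TreeMap` stands in for a sorted, BigTable-style table (no Cassandra, HBase or JDBC API is used). Updating one variable touches one cell instead of rewriting a whole file, and listing a section becomes a range scan.

```java
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: each config variable is its own key/cell pair instead
// of a line inside a serialized .git/config byte stream. The TreeMap only
// stands in for a sorted table; it is NOT a Cassandra/HBase/JDBC client.
class CellConfig {
    // Key (e.g. "remote.origin.url") -> cell value.
    private final SortedMap<String, String> table = new TreeMap<>();

    // Git treats section and variable names as case-insensitive and
    // subsection names as case-sensitive, so only the former are lowercased.
    static String key(String section, String subsection, String name) {
        StringBuilder k = new StringBuilder(section.toLowerCase(Locale.ROOT));
        if (subsection != null) {
            k.append('.').append(subsection);
        }
        return k.append('.').append(name.toLowerCase(Locale.ROOT)).toString();
    }

    // Reads one cell; returns null when the variable is unset.
    String getString(String section, String subsection, String name) {
        return table.get(key(section, subsection, name));
    }

    // Writes (or overwrites) one cell; other variables are untouched.
    // Simplification: one value per variable, so multi-valued keys such as
    // remote.<name>.fetch are not modeled here.
    void setString(String section, String subsection, String name, String value) {
        table.put(key(section, subsection, name), value);
    }

    // Removes one cell.
    void unset(String section, String subsection, String name) {
        table.remove(key(section, subsection, name));
    }

    // Range scan over every variable of one section/subsection, the kind of
    // query a sorted table answers cheaply. Simplification: assumes the
    // subsection contains no '.', otherwise prefixes could collide.
    SortedMap<String, String> scan(String section, String subsection) {
        String prefix = key(section, subsection, ""); // e.g. "remote.origin."
        return table.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}

class CellConfigDemo {
    public static void main(String[] args) {
        CellConfig cfg = new CellConfig();
        cfg.setString("core", null, "bare", "true");
        cfg.setString("remote", "origin", "url", "git://example.com/repo.git");
        cfg.setString("Remote", "origin", "pushurl", "ssh://example.com/repo.git");
        System.out.println(cfg.getString("core", null, "Bare")); // true
        System.out.println(cfg.scan("remote", "origin").keySet());
        // [remote.origin.pushurl, remote.origin.url]
    }
}
```

Inside JGit the same logic would presumably live in a storage-specific subclass of `Config`, which is what makes an abstract API preferable to a file-stream SPI here.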
Fortunately, "Config" is abstract enough that we could subclass it with a CassandraConfig and simply use that instance when running on a Cassandra-based storage system. No file streams required. Ditto for a JdbcConfig.

For RefDatabase, we'd want to do the same and avoid the concept of packed-refs altogether. Each Ref should go into its own row in a Cassandra storage system, and essentially act as a loose object. Ditto with JDBC. We'd probably never need to read or write the info/refs or objects/info/packs listings.

And I think that's everything a bare repository needs, aside from ObjectDatabase, which is already mostly abstract anyway.

--
Shawn.