Re: [jgit-dev] [RFC] Cassandra based storage layer for JGit
On Fri, Jan 28, 2011 at 6:18 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> Specifically I'm exploring the idea of managing objects that are
>> values in a K/V store, rather than objects that are files, by breaking
>> the keyspace into trees.
> I'm not sure this is going to be very helpful.
> Last October-ish I tried a different NoSQL based storage
> implementation that stored each object into its own row. For
> linux-2.6 that meant 1.8 million rows, rather than ~407 as I described
> above. It also required range scan support from the NoSQL server,
> which reduced the number of systems that could be supported; it no
> longer was just a K/V store, it had to use a binary tree or sorted file
> as its underlying storage system.
> It was _way_ slower than what I'm doing now, and it took a lot more
> coding to get a lot less functionality.
I'm not sure I'm putting myself across correctly - my goal is to manage
a set of objects which are themselves persisted as entries in a K/V store,
rather than a bunch of source code files. So one K/V store is my working tree.
The second is for loose objects.
I think a K/V backend is important for
the loose objects because of the limitations inherent in trying to store
millions of objects as individual files in an ObjectDirectory implementation. I
completely agree that packed objects are much more efficient and have lower
overhead; the overhead of storing packs in a K/V store for a local-disk-only
implementation is probably not worth it. So what I'm driving at is
that for my purposes I'd like to try to get to a jgit stack where:
* The working "tree" is a K/V store
** With some extra objects for trees
** Requiring some persistence layer support
** For which I'd expect to have to write an alternate tree implementation
* The loose objects are in a K/V store
** Because millions of teensy files will stress most file systems adversely
* Packed objects are in standard pack files
** Why bother with the overhead of storing them in another container?
** Not doing this for the enterprise scaling or redundancy
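
To make the loose-object part of that stack concrete, here is a rough sketch of
what I have in mind. It is not JGit API: the class name KvLooseObjectStore and
the in-memory map standing in for the real K/V backend are made up for
illustration. The only thing taken from Git itself is the object id, which is
the SHA-1 of "<type> <size>\0" followed by the content. Unlike loose object
files on disk, the values here are stored uncompressed, which is an assumption
made purely to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: loose objects kept as entries in a K/V store. */
class KvLooseObjectStore {
    // Stand-in for the real K/V backend (assumption).
    // Key: 40-char hex SHA-1. Value: "<type> <size>\0" header plus content.
    private final Map<String, byte[]> kv = new ConcurrentHashMap<>();

    /** Stores one object and returns its id, hashed the way Git hashes objects. */
    String put(String type, byte[] content) {
        byte[] header = (type + " " + content.length + "\0")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] raw = new byte[header.length + content.length];
        System.arraycopy(header, 0, raw, 0, header.length);
        System.arraycopy(content, 0, raw, header.length, content.length);
        String id = sha1Hex(raw);
        // Content-addressed: writing the same object twice is a no-op.
        kv.putIfAbsent(id, raw);
        return id;
    }

    /** Returns the object content without its header, or null if absent. */
    byte[] get(String id) {
        byte[] raw = kv.get(id);
        if (raw == null) {
            return null;
        }
        int nul = 0;
        while (raw[nul] != 0) {
            nul++; // skip over "<type> <size>" up to the NUL terminator
        }
        return Arrays.copyOfRange(raw, nul + 1, raw.length);
    }

    boolean contains(String id) {
        return kv.containsKey(id);
    }

    private static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder sb = new StringBuilder(40);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

A lookup in the stack above would then try contains()/get() on the K/V store
first and fall back to the ordinary pack files on a miss. Only point lookups by
key are needed, so no range scan support is required from the backend.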