Re: [jgit-dev] [RFC] Cassandra based storage layer for JGit

On Fri, Jan 28, 2011 at 6:18 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
>> Specifically I'm exploring the idea of managing objects that are
>> values in a K/V store, rather than objects that are files, by breaking
>> the keyspace into trees.
>
> I'm not sure this is going to be very helpful.
>
> Last October-ish I tried a different NoSQL based storage
> implementation that stored each object into its own row.  For
> linux-2.6 that meant 1.8 million rows, rather than ~407 as I described
> above.  It also required range scan support from the NoSQL server,
> which reduced the number of systems that could be supported; it no
> longer was just a K/V store, it had to use a binary tree or sorted file
> as its underlying storage system.
>
> It was _way_ slower than what I'm doing now, and it took a lot more
> coding to get a lot less functionality.

I'm not sure I'm putting myself across correctly: my goal is to version
control a set of objects which are themselves persisted as entries in a
K/V store, rather than a bunch of source code files. So one K/V store is
my working tree; the second is for loose objects.
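To make "one K/V store is my working tree" concrete, here is a minimal sketch of how a flat K/V working tree could be snapshotted into Git objects. It assumes ASCII keys with no '/' (so no subtrees) and treats every value as a regular file (mode 100644). `KvTreeSnapshot` and its methods are hypothetical names for illustration, not JGit API; only the object encoding (`"<type> <size>\0<content>"` hashed with SHA-1, tree entries as `"<mode> <name>\0<20-byte id>"`) follows Git's format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: snapshot a flat K/V "working tree" (ASCII keys without '/')
 * into a single Git tree object whose entries point at one blob per value.
 */
class KvTreeSnapshot {
    /** Raw 20-byte SHA-1 over Git's canonical "<type> <size>\0<content>" encoding. */
    static byte[] hashObject(String type, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            return sha1.digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    /** Renders an id as 40 lowercase hex digits. */
    static String hex(byte[] id) {
        StringBuilder sb = new StringBuilder();
        for (byte b : id) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    /** Builds the raw tree body: "100644 <name>\0<20-byte blob id>" per entry, sorted by name. */
    static byte[] treeBody(Map<String, byte[]> workingTree) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        // TreeMap orders keys by UTF-16 code units, which matches Git's byte order for ASCII names.
        for (Map.Entry<String, byte[]> e : new TreeMap<>(workingTree).entrySet()) {
            byte[] header = ("100644 " + e.getKey() + "\0").getBytes(StandardCharsets.US_ASCII);
            body.write(header, 0, header.length);
            byte[] blobId = hashObject("blob", e.getValue());
            body.write(blobId, 0, blobId.length);
        }
        return body.toByteArray();
    }

    /** Returns the tree's object id; the blobs themselves would go to the loose-object store. */
    static String snapshotId(Map<String, byte[]> workingTree) {
        return hex(hashObject("tree", treeBody(workingTree)));
    }
}
```

Because entries are sorted before hashing, the same set of keys and values always yields the same tree id regardless of iteration order in the underlying store. Nested keys would need recursive subtrees (mode 40000) and Git's ordering rule in which directory names sort as if followed by '/'.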

I think a K/V backend is important for the loose objects because of the
limitations inherent in trying to store several million objects as
individual files in an ObjectDirectory implementation. I completely agree
that packed objects are much more efficient and have lower overhead; the
overhead of storing packs in a K/V store for a local-disk-only
implementation is probably not worth it. So what I'm driving at is that,
for my purposes, I'd like to try to get to a JGit stack where:

* The working "tree" is a K/V store
** With some extra objects for trees
** Requiring some persistence layer support
** For which I'd expect to have to write an alternate tree implementation

* The loose objects are in a K/V store
** Because millions of tiny files put a heavy load on most file systems

* Packed objects are in standard pack files
** Why bother with the overhead of storing them in another container?
** I'm not doing this for enterprise scaling or redundancy
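A minimal sketch of the "loose objects in a K/V store" layer, assuming a store that offers only point `get`/`put` on string keys. `KeyValueStore`, `InMemoryStore` and `KvLooseObjectStore` are hypothetical names, not JGit classes; the in-memory map stands in for a real backend. What does follow Git is the object format: the id is the SHA-1 of `"<type> <size>\0<content>"`, and the stored bytes are zlib-deflated, just as a loose object file would be.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical minimal K/V contract: point reads and writes only, no range scans. */
interface KeyValueStore {
    byte[] get(String key);
    void put(String key, byte[] value);
}

/** In-memory stand-in for a real K/V backend, for illustration only. */
class InMemoryStore implements KeyValueStore {
    private final Map<String, byte[]> map = new HashMap<>();
    public byte[] get(String key) { return map.get(key); }
    public void put(String key, byte[] value) { map.put(key, value); }
}

/** Sketch of a loose-object store: one K/V entry per object, keyed by hex object id. */
class KvLooseObjectStore {
    private final KeyValueStore kv;

    KvLooseObjectStore(KeyValueStore kv) { this.kv = kv; }

    /** Git's canonical encoding: "<type> <size>\0" followed by the raw content. */
    static byte[] encode(String type, byte[] content) {
        byte[] header = (type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + content.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(content, 0, out, header.length, content.length);
        return out;
    }

    /** The object id is the SHA-1 of the encoded form, as 40 lowercase hex digits. */
    static String objectId(byte[] encoded) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(encoded)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    /** Deflates the encoding with zlib (as loose object files are) and stores it once. */
    String insert(String type, byte[] content) {
        byte[] encoded = encode(type, content);
        String id = objectId(encoded);
        if (kv.get(id) == null) { // objects are immutable, so an existing entry is already correct
            Deflater deflater = new Deflater();
            deflater.setInput(encoded);
            deflater.finish();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                buf.write(chunk, 0, n);
            }
            deflater.end();
            kv.put(id, buf.toByteArray());
        }
        return id;
    }

    /** Returns the inflated encoding (header plus content), or null if the id is absent. */
    byte[] read(String id) {
        byte[] stored = kv.get(id);
        if (stored == null) return null;
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated object " + id);
                }
                buf.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt object " + id, e);
        } finally {
            inflater.end();
        }
        return buf.toByteArray();
    }
}
```

Keying each entry by its full object id means storing and fetching a loose object needs only point lookups, which avoids the range-scan requirement raised above for this path. Resolving an abbreviated id would still need a prefix scan or a separate index, and packing would later copy these entries into ordinary pack files.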

