Re: [jgit-dev] [RFC] Cassandra based storage layer for JGit

From: Adrian <adrian.wilkins@xxxxxxxxx>
Date: Fri, 28 Jan 2011 21:34:03 +0000
Delivered-to: jgit-dev@xxxxxxxxxxx

On Fri, Jan 28, 2011 at 6:18 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
>> Specifically I'm exploring the idea of managing objects that are
>> values in a K/V store, rather than objects that are files, by breaking
>> the keyspace into trees.
>
> I'm not sure this is going to be very helpful.
>
> Last October-ish I tried a different NoSQL based storage
> implementation that stored each object into its own row.  For
> linux-2.6 that meant 1.8 million rows, rather than ~407 as I described
> above.  It also required range scan support from the NoSQL server,
> which reduced the number of systems that could be supported; it no
> longer was just a K/V store, it had to use a binary tree or sorted file
> as its underlying storage system.
>
> It was _way_ slower than what I'm doing now, and it took a lot more
> coding to get a lot less functionality.

I'm not sure I'm putting myself across correctly. My goal is to version
control a set of objects which are themselves persisted as entries in a
K/V store, rather than a bunch of source code files. So one K/V store is
my working tree. The second is for loose objects.

I think a K/V backend is important for the loose objects because of the
limitations inherent in trying to store several million objects as
individual files in an ObjectDirectory implementation. I completely agree
that packed objects are much more efficient and have lower overhead; the
overhead of storing packs in a K/V store for a local-disk-only
implementation is probably not worth it. So what I'm driving at is that,
for my purposes, I'd like to try to get to a JGit stack where:

* The working "tree" is a K/V store
** With some extra objects for trees
** Requiring some persistence layer support
** For which I'd expect to have to write an alternate tree implementation

* The loose objects are in a K/V store
** Because millions of teensy files will stress most file systems adversely

* Packed objects are in standard pack files
** Why bother with the overhead of storing them in another container?
** I'm not doing this for enterprise scaling or redundancy
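
To make the loose-object part of that list concrete, here is a minimal
sketch of loose objects living in a K/V store instead of one file each.
It is not a JGit API: the class name LooseObjectKvStore and its methods
are hypothetical, and a ConcurrentHashMap stands in for whatever K/V
backend would really be used. The only part taken from Git itself is the
object id, which is the SHA-1 of "<type> <length>\0" followed by the
content. Real loose objects are also zlib-deflated, which is left out here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical loose-object store backed by a key/value map (not part of JGit). */
final class LooseObjectKvStore {
    // Stand-in for a real K/V backend: key is the 40-character hex object id,
    // value is the uncompressed "<type> <length>\0<content>" byte sequence.
    private final Map<String, byte[]> kv = new ConcurrentHashMap<>();

    /** Stores one object and returns its Git object id. */
    String put(String type, byte[] content) {
        byte[] raw = frame(type, content);
        String id = sha1Hex(raw);
        // Content addressing makes writes idempotent: same bytes, same key.
        kv.putIfAbsent(id, raw);
        return id;
    }

    /** Returns the object's content without its header, or null if absent. */
    byte[] get(String id) {
        byte[] raw = kv.get(id);
        if (raw == null) {
            return null;
        }
        int nul = 0;
        while (raw[nul] != 0) {
            nul++; // skip the "<type> <length>" header up to the NUL byte
        }
        return Arrays.copyOfRange(raw, nul + 1, raw.length);
    }

    /** Number of distinct objects currently stored. */
    int size() {
        return kv.size();
    }

    // Prepends the Git object header to the content.
    private static byte[] frame(String type, byte[] content) {
        byte[] header = (type + " " + content.length + "\0")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] raw = new byte[header.length + content.length];
        System.arraycopy(header, 0, raw, 0, header.length);
        System.arraycopy(content, 0, raw, header.length, content.length);
        return raw;
    }

    // Hex-encoded SHA-1 of the framed object.
    private static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xf, 16));
                hex.append(Character.forDigit(b & 0xf, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }
}
```

Something like store.put("blob", bytes) followed by store.get(id) is the
whole interface I have in mind for loose objects; a plain get/put store is
enough here because lookups are always by exact object id, so no range
scan is needed.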
