Re: [jgit-dev] [RFC] Cassandra based storage layer for JGit


From: Shawn Pearce <spearce@xxxxxxxxxxx>
Date: Fri, 28 Jan 2011 14:56:42 -0800

On Fri, Jan 28, 2011 at 13:34, Adrian <adrian.wilkins@xxxxxxxxx> wrote:
>
> I'm not sure I'm putting myself across correctly - my goal is to version
> control a set of objects which are themselves persisted as entries in a
> K/V store, rather than a bunch of source code files. So one K/V store is
> my working tree. The second is for loose objects.

OK, now I'm following you.

> I think a K/V backend is important for the loose objects because of the
> limitations inherent in trying to store several million objects as
> individual files in an ObjectDirectory implementation.

Yes, but you would never store millions of loose objects; you would
switch to pack files long before you got that many.

> I completely agree that packed objects are much more efficient and have
> lower overhead; the overhead of storing packs in a K/V store for a
> local-disk-only implementation is probably not worth it. So what I'm
> driving at is that for my purposes I'd like to try to get to a jgit
> stack where
>
> * The working "tree" is a K/V store

This would be nice for a "client in a cloud" model, like what project
Orion is trying to do at Eclipse.  We already want to abstract the
working "tree" APIs so that JGit can more directly use the EGit
IResource APIs when making changes to the workspace... but we're not
there yet, and I don't think our plans would handle millions of
working "tree" items that are treated like a normal source code
checkout.  So you may be a bit more off in your own direction here.

> * The loose objects are in a K/V store
> ** Because millions of teensy files will stress most file systems adversely

Again, why not repack these?  Pack files are a K/V store, just more
limited.  They can only be updated by completely rewriting them, and
the keys must be SHA-1s.  But both Cassandra and Hadoop HBase
implement their backends by doing complete rewrites of segments of
their K/V store when there are sufficient changes to make a compaction
worthwhile.  Likewise... Git pack files.
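
To make the comparison concrete, here is a minimal sketch in Python
(not JGit code) of a store that takes new objects as "loose" entries
and periodically rewrites them into one immutable, sorted segment, the
way a repack or a compaction does. The names Segment, ObjectStore and
compact_threshold are hypothetical, and keeping everything in memory is
an assumption made to keep the example short; only the key format (a
SHA-1 over "blob <size>\0" plus the content) matches what Git does for
blobs.

```python
import bisect
import hashlib


def object_id(data: bytes) -> str:
    """Git-style blob id: SHA-1 over a 'blob <size>\\0' header plus the content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class Segment:
    """Immutable, sorted key/value segment (a stand-in for a pack and its index)."""

    def __init__(self, items: dict):
        self._keys = sorted(items)
        self._values = [items[k] for k in self._keys]

    def get(self, key):
        # Binary search over the sorted keys, as a pack index lookup would do.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def items(self):
        return zip(self._keys, self._values)


class ObjectStore:
    """Hypothetical store: loose objects first, rewritten into a segment later."""

    def __init__(self, compact_threshold=4):
        self.loose = {}          # cheap to add to, expensive in bulk
        self.segments = []       # never modified in place, only replaced
        self.compact_threshold = compact_threshold

    def put(self, data: bytes) -> str:
        key = object_id(data)
        if self.get(key) is None:
            self.loose[key] = data
            if len(self.loose) >= self.compact_threshold:
                self.repack()
        return key

    def get(self, key):
        if key in self.loose:
            return self.loose[key]
        for segment in self.segments:
            value = segment.get(key)
            if value is not None:
                return value
        return None

    def repack(self):
        # Complete rewrite: merge every existing segment and all loose
        # objects into one new segment, then drop the old ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment.items())
        merged.update(self.loose)
        self.segments = [Segment(merged)]
        self.loose = {}
```

With compact_threshold=3, storing seven distinct objects triggers two
rewrites and leaves one segment holding six objects plus one loose
object; the loose set never grows past the threshold.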

> * Packed objects are in standard pack files
> ** Why bother with the overhead of storing them in another container
> ** Not doing this for the enterprise scaling or redundancy

Why use a K/V store if you want standard pack files in your local filesystem?

-- 
Shawn.
