Re: [stellation-res] A database proposal, for fixing 22135.
On Saturday 10 August 2002 01:33 am, Florin Iucha wrote:
> On Fri, Aug 09, 2002 at 09:20:56PM +0000, Mark C. Chu-Carroll wrote:
> > > > Transactionality is one
> > > > of the most under-appreciated, under-estimated pieces of
> > > > functionality that you get from using a database. Transactionality is
> > > > *the* key to keeping your code safe in the repository.
> > > >
> > > > Databases give you transactionality for free.
> > >
> > > Not exactly "free". It costs more to insert rows in a database than to
> > > dump data from the network socket into a buffer and mmap that to the
> > > disk.
> >
> > I meant free in the sense of "not requiring additional code or design
> > work".
>
> Hrm... and stuffing all data in an XML document using DOM and squeezing
> that through a socket is free in the same sense. But BAD.
I disagree with the badness part here...
I like our messaging framework rather a lot. It's extremely flexible, and
respectably fast. It's very well suited to our long-term goals: ease
of extension is the single most important design goal of the system.
The message bus architecture that we use meets that goal
extremely well. And compared, in terms of efficiency, to many
of the alternatives (WebDAV, CVS wire protocol) it's practically
a speed-demon.
My only complaint against it is that we shouldn't always be
using DOM builders for assembling the message documents. For
many of the relatively short but structurally complex messages that we
use, DOM is a good tool. For sending zipped branch images over the wire,
it's very wasteful of memory to be bundling up base64'd
zip images in DOM objects. That's something that we should
change. Unfortunately, the list of things that we should change is
rather long.
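
To make the memory point concrete, here is a rough sketch of the streaming
alternative: encode the zipped image in small chunks and write each chunk
straight into the outgoing XML, so the whole base64 string never sits in a
DOM text node. This is not Stellation code. The element name "branch-image",
its attributes, and the function name are made up for illustration, and it
assumes the payload's read() returns full chunks until the end of the data
(true for files and in-memory buffers).

```python
import base64
import io
from xml.sax.saxutils import XMLGenerator

# A multiple of 3 bytes encodes to base64 with no padding, so the encoded
# chunks can simply be concatenated into one valid base64 string.
CHUNK = 3 * 1024


def write_image_message(payload, out, branch_name):
    """Stream a hypothetical <branch-image> message to the text stream `out`.

    `payload` is a binary file-like object holding the zipped image. Only
    one chunk of it is held in memory at a time.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("branch-image",
                     {"name": branch_name, "encoding": "base64"})
    while True:
        chunk = payload.read(CHUNK)
        if not chunk:
            break
        gen.characters(base64.b64encode(chunk).decode("ascii"))
    gen.endElement("branch-image")
    gen.endDocument()


if __name__ == "__main__":
    image = io.BytesIO(b"pretend this is a zipped branch image" * 1000)
    message = io.StringIO()
    write_image_message(image, message, "main")
    print(len(message.getvalue()), "characters written")
```

The short, structurally complex messages could keep using DOM. Only the
bulk payloads would need a path like this one.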
> > > > You *can* implement
> > > > transactionality in a filesystem. But to do it *right* is very hard.
> > > > To be *sure* that you've got it completely right - that you haven't
> > > > missed any end-cases, that there's no scenario where an interrupted
> > > > operation will corrupt data in your filesystem - that's very, very
> > > > hard. For every hundred people who've tried to implement
> > > > transactionality, more than 99 of them get it wrong. And you don't
> > > > know that you got it wrong until some data gets lost.
> > > >
> > > > I don't trust anyone involved in this project to really implement
> > > > transactionality *right*. That's not a dig at anyone involved in
> > > > the project. It's a reflection of how hard the problem is - and none
> > > > of us are experts on the subject.
> > >
> > > Except that you don't need to solve the transactionality problem in the
> > > general case, but in a special case:
> > >
> > >
> > > for a SCM, no information is ever _changed_ - information is only
> > > _added_ to a repository
> > >
> > > Adding stuff transactionally is much easier than updating/deleting it.
> > >
> > > Note, I am talking here about the artifact data, not metadata.
> >
> > Easi*er* can be very different from easy. Even when you're
> > assuming strict additivity - to make absolutely sure that things are
> > safe at all times is not a simple task.
> >
> > But even if we can implement additive transactionality - is that
> > really good enough? I don't really want to commit everyone
> > who ever extends Stellation to strict additivity. Additivity is fine
> > for code. But it's not fine for metadata. Metadata is very
> > important for the long-term goals of Stellation, and the line
> > between primary data and metadata can become extremely fuzzy.
>
> Please read my previous sentence again.
Read my last sentence again :-). The distinction between primary
data and metadata is going to be getting very fuzzy.
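
As a small illustration of why that matters, here is a sketch in which one
check-in both adds an artifact (purely additive) and moves a branch head
(an in-place metadata update). With a database, both changes sit in one
transaction and are undone together if anything fails. It uses SQLite only
because it needs no setup. The table names, column names, and functions are
invented for the example and are not a proposed Stellation schema.

```python
import sqlite3


def open_repository():
    """Create a throwaway in-memory repository with one branch, 'main'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artifact (id INTEGER PRIMARY KEY, content BLOB NOT NULL)")
    conn.execute(
        "CREATE TABLE branch_head (branch TEXT PRIMARY KEY, artifact_id INTEGER)")
    conn.execute("INSERT INTO branch_head VALUES ('main', NULL)")
    conn.commit()
    return conn


def check_in(conn, branch, content):
    """Add an artifact and repoint the branch head as one atomic unit."""
    with conn:  # commits on success, rolls back if the block raises
        cur = conn.execute(
            "INSERT INTO artifact (content) VALUES (?)", (content,))
        updated = conn.execute(
            "UPDATE branch_head SET artifact_id = ? WHERE branch = ?",
            (cur.lastrowid, branch))
        if updated.rowcount != 1:
            # The metadata update failed, so the artifact insert above
            # must not survive either.
            raise LookupError("unknown branch: " + branch)
    return cur.lastrowid


if __name__ == "__main__":
    repo = open_repository()
    print("checked in artifact", check_in(repo, "main", b"version 1"))
    try:
        check_in(repo, "no-such-branch", b"version 2")
    except LookupError as err:
        print("rolled back:", err)
    count = repo.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
    print("artifacts stored:", count)
```

The rollback costs no extra code or design work beyond the "with" block,
which is the sense of "free" I meant earlier.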
> > I strongly believe that the advantages of the relational database
> > are important enough that we should be using it. If a storage
> > format in the database turns out to be overly wasteful of space,
>
> ... and time...
Time is definitely a concern. But I think that performance can be
respectable using RDB storage. ClearCase uses an RDB for
the backend storage, and uses a storage format for text that's
very similar to what we're looking at, and it's no slouch on speed. Perforce
uses an RDB for storage, and has a reputation as being
lightning fast.
> > I'd rather spend the time to change the schema so that it isn't
> > so bad, than to give up on the database for storage.
>
> OK. Let's move on.
If you're convinced, sure. If not - I'm willing to continue discussing it,
and I'm willing to be convinced that I'm on the wrong path. I'm also
willing to respect the wishes of the rest of the team if everyone who
cares disagrees with me.
By the way, has your Eclipse commit password come through yet?
I haven't heard from them since I sent the vote results off to the
admins.
- Mark
--
Mark Craig Chu-Carroll, IBM T.J. Watson Research Center
*** The Stellation project: Advanced SCM for Collaboration
*** http://www.eclipse.org/stellation
*** Work Email: mcc@xxxxxxxxxxxxxx ------- Personal Email: markcc@xxxxxxxxxxx