Re: [stellation-res] A database proposal, for fixing 22135.

On Friday 09 August 2002 03:18 pm, Florin Iucha wrote:
> [Answer to the second half of Mark's reply. I have to think some more
> about the first half.]
>
> On Fri, Aug 09, 2002 at 10:44:29AM +0000, Mark C. Chu-Carroll wrote:
> [snip]
>
> > > If you don't decompose XML at the tag/text level when you store it in
> > > the database, you will have to parse it anyway when you want to
> > > diff/merge it.
> >
> > It's true on a naive level that you're either pushing things into a
> > very verbose tabular structure, or you need to do some parsing. That
> > doesn't mean that there is no happy medium point, where the XML
> > is somehow chunked and interlinked using the database structure.
> >
> > There are a lot of representations of ASTs (and XML is just an
> > AST syntax) that are really clever, and that make diffs and merges
> > extremely efficient. I'd like implementors to have the option of
> > selecting one of those representations and using the database to exploit
> > it.
>
> No matter how smartly you create an AST and how smart the
> serialization mechanism is, you still end up with a bucket of bits and
> you still need to parse (even if you call it "unserialization") that to
> get the AST back in memory.

That's what I meant by a "happy medium": a storage mechanism that maps
a significant amount of the artifact's structure onto the database
structure. Part of the structure is encoded in chunks that still need
to be parsed, and part of it is represented explicitly by the layout
of the database.
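
As a rough sketch of what such a happy medium could look like: the
top-level structure of an XML artifact lives in table rows, while each
row's body is a small XML fragment that is parsed only when needed. The
table layout, the function names, and the use of SQLite are all
assumptions made up for illustration, not a proposed schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per chunk. parent_id and position carry
# the tree structure; body holds a small XML fragment parsed on demand.
SCHEMA = """
CREATE TABLE chunk (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES chunk(id),
    position  INTEGER NOT NULL,
    body      TEXT NOT NULL
)
"""

def store_chunks(conn, xml_text):
    """Store the root tag as one row and each top-level child as a chunk."""
    root = ET.fromstring(xml_text)
    with conn:  # one transaction per stored version
        cur = conn.execute(
            "INSERT INTO chunk (parent_id, position, body) VALUES (NULL, 0, ?)",
            (root.tag,),
        )
        root_id = cur.lastrowid
        for pos, child in enumerate(root):
            conn.execute(
                "INSERT INTO chunk (parent_id, position, body) VALUES (?, ?, ?)",
                (root_id, pos, ET.tostring(child, encoding="unicode")),
            )
    return root_id

def changed_positions(conn, old_root, new_root):
    """Compare two stored versions chunk by chunk, without parsing any XML."""
    query = "SELECT position, body FROM chunk WHERE parent_id = ? ORDER BY position"
    old = dict(conn.execute(query, (old_root,)).fetchall())
    new = dict(conn.execute(query, (new_root,)).fetchall())
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Here the database layout answers "which chunks differ?", and only the
differing chunks would need to be parsed for a finer-grained diff.
Comparing chunk text is deliberately crude; a real representation
would want something smarter.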

> > > Why not store the artifact data in the file system and keep a cookie in
> > > the database?
> >
> > Because file systems are non-transactional.
>
> Yet. By year's end, reiserfs will have transactions implemented at the
> file system level.

But we don't want Stellation to be reliable only on ReiserFS. We
want a system that's safe, robust, and reliable on every version
of Unix and on Windows, regardless of which filesystem you choose.

> >                                             Transactionality is one
> > of the most under-appreciated, under-estimated pieces of functionality
> > that you get from using a database. Transactionality is *the* key to
> > keeping your code safe in the repository.
> >
> > Databases give you transactionality for free.
>
> Not exactly "free". It costs more to insert rows in a database than to
> dump data from the network socket into a buffer and mmap that to the
> disk.

I meant free in the sense of "not requiring additional code or design
work".
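
To illustrate, here is a minimal sketch using SQLite from Python. The
artifact table and the commit_version function are hypothetical, and
the failure is simulated. The point is only that the rollback logic
comes from the database rather than from code we write.

```python
import sqlite3

def commit_version(conn, artifact_rows, fail_midway=False):
    """Insert all rows of one checkin atomically; any exception rolls back."""
    with conn:  # commits on success, rolls back if the block raises
        for i, (name, content) in enumerate(artifact_rows):
            if fail_midway and i == 1:
                raise RuntimeError("simulated crash during checkin")
            conn.execute(
                "INSERT INTO artifact (name, content) VALUES (?, ?)",
                (name, content),
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (name TEXT, content TEXT)")
conn.commit()

rows = [("a.java", "class A {}"), ("b.java", "class B {}")]

# The first row is inserted, then the checkin fails; the database
# discards the partial work for us.
try:
    commit_version(conn, rows, fail_midway=True)
except RuntimeError:
    pass
count_after_failure = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]

# An uninterrupted checkin stores both rows.
commit_version(conn, rows)
count_after_success = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
```

After the failed checkin the table is still empty, and after the
successful one it holds both rows. No recovery code had to be written
to get that behavior.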

> >                                               You *can* implement
> > transactionality in a filesystem. But to do it *right* is very hard. To
> > be *sure* that you've got it completely right - that you haven't missed
> > any end-cases, that there's no scenario where an interrupted operation
> > will corrupt data in your filesystem - that's very, very hard. For every
> > hundred people who've tried to implement transactionality, more than
> > 99 of them get it wrong. And you don't know that you got it wrong until
> > some data gets lost.
> >
> > I don't trust anyone involved in this project to really implement
> > transactionality *right*. That's not a dig at anyone involved in
> > the project. It's a reflection of how hard the problem is - and none
> > of us are experts on the subject.
>
> Except that you don't need to solve the transactionality problem in the
> general case, but in a special case:
>
>
>    for a SCM, no information is ever _changed_ - information is only
> _added_ to a repository
>
> Adding stuff transactionally is much easier than updating/deleting it.
>
> Note, I am talking here about the artifact data, not metadata.

Easi*er* can be very different from easy. Even assuming strict
additivity, making absolutely sure that things are safe at all times
is not a simple task.
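
To give a feel for it, here is a sketch of what adding a single file
safely involves on a POSIX filesystem. The function name and directory
layout are made up for illustration, and it assumes the filesystem
supports hard links and fsync on directories.

```python
import os
import tempfile

def add_artifact(repo_dir, name, data):
    """Add a new file so readers see either no file or the complete file."""
    final_path = os.path.join(repo_dir, name)
    # Write to a temporary file in the same directory, so that the
    # link below stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=repo_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the contents to disk first
        # link() fails with FileExistsError if the name is taken, so an
        # existing artifact is never overwritten (add-only).
        os.link(tmp_path, final_path)
    finally:
        os.unlink(tmp_path)
    # Flush the directory entry as well, so the new name survives a crash.
    dir_fd = os.open(repo_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

And that covers only one file. A checkin that adds several files
atomically, cleanup of temporary files left behind by a crash, and
filesystems where these calls behave differently are all still
unhandled.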

But even if we can implement additive transactionality, is that
really good enough? I don't want to commit everyone who ever
extends Stellation to strict additivity. Additivity is fine
for code, but it's not fine for metadata. Metadata is very
important for the long-term goals of Stellation, and the line
between primary data and metadata can become extremely fuzzy.

I strongly believe that the advantages of the relational database
are important enough that we should be using it. If a storage
format in the database turns out to be overly wasteful of space,
I'd rather spend the time fixing the schema than give up on the
database for storage.

	-Mark


-- 
Mark Craig Chu-Carroll,  IBM T.J. Watson Research Center  
*** The Stellation project: Advanced SCM for Collaboration
***		http://www.eclipse.org/stellation
*** Work Email: mcc@xxxxxxxxxxxxxx  ------- Personal Email: markcc@xxxxxxxxxxx



