Re: [stellation-res] A database proposal, for fixing 22135.
[Answer to the second half of Mark's reply. I have to think some more
about the first half.]
On Fri, Aug 09, 2002 at 10:44:29AM +0000, Mark C. Chu-Carroll wrote:
[snip]
> > If you don't decompose XML at the tag/text level when you store it in
> > the database, you will have to parse it anyway when you want to
> > diff/merge it.
>
> It's true on a naive level that you're either pushing things into a
> very verbose tabular structure, or you need to do some parsing. That
> doesn't mean that there is no happy medium point, where the XML
> is somehow chunked and interlinked using the database structure.
>
> There are a lot of representations of ASTs (and XML is just an
> AST syntax) that are really clever, and that make diffs and merges
> extremely efficient. I'd like implementors to have the option of selecting
> one of those representations and using the database to exploit it.
No matter how smart the AST representation is and how smart the
serialization mechanism is, you still end up with a bucket of bits, and
you still need to parse that (even if you call it "unserialization") to
get the AST back in memory.
>
> > Why not store the artifact data in the file system and keep a cookie in
> > the database?
>
> Because file systems are non-transactional.
Yet. By year's end, reiserfs will have transactions implemented at the
file system level.
> Transactionality is one
> of the most under-appreciated, under-estimated pieces of functionality
> that you get from using a database. Transactionality is *the* key to keeping
> your code safe in the repository.
>
> Databases give you transactionality for free.
Not exactly "free". It costs more to insert rows in a database than to
dump data from the network socket into a buffer and mmap that to the
disk.
> You *can* implement
> transactionality in a filesystem. But to do it *right* is very hard. To
> be *sure* that you've got it completely right - that you haven't missed
> any end-cases, that there's no scenario where an interrupted operation
> will corrupt data in your filesystem - that's very, very hard. For every
> hundred people who've tried to implement transactionality, more than
> 99 of them get it wrong. And you don't know that you got it wrong until
> some data gets lost.
>
> I don't trust anyone involved in this project to really implement
> transactionality *right*. That's not a dig at anyone involved in
> the project. It's a reflection of how hard the problem is - and none
> of us are experts on the subject.
Except that you don't need to solve the transactionality problem in the
general case, only in a special case: in an SCM, no information is ever
_changed_; information is only _added_ to the repository.
Adding stuff transactionally is much easier than updating or deleting it.
Note that I am talking here about the artifact data, not the metadata.
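
To make the "add only" point concrete, here is a rough sketch of what I
have in mind. It is an illustration under assumptions, not a proposal for
the actual code: the function name add_artifact, the flat directory
layout, and the choice of a SHA-256 content hash as the file name are all
made up for this example. The data is written to a temporary file in the
same directory, flushed to disk, and then hard-linked under its final
name. The link step either creates the name or fails because it already
exists, so an existing artifact is never overwritten.

```python
import hashlib
import os
import tempfile


def add_artifact(root: str, data: bytes) -> str:
    """Add data to an append-only store and return its key.

    Hypothetical helper: the key is the SHA-256 of the content, so the
    same content always maps to the same name and nothing is ever updated.
    """
    key = hashlib.sha256(data).hexdigest()
    final_path = os.path.join(root, key)

    # Temporary file in the same directory, hence the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=root, prefix="tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before publishing
        try:
            # link() creates the final name or fails; it never replaces
            # an existing file.
            os.link(tmp_path, final_path)
        except FileExistsError:
            pass  # same content was already added; nothing to do
    finally:
        os.unlink(tmp_path)  # drop the temporary name in every case
    return key
```

An interrupted add leaves at most a stray tmp-* file behind, which can be
swept up later; the published artifacts are untouched.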
>
> > Checkin would mean
> > - file upload into a staging area - should not fail
> > - moving files into the permanent structure and create database
> > records for the artifacts - should not fail
>
> "mv" is not guaranteed to be an atomic operation in Unix. How do you
> make sure that when you move it into the permanent structure that you didn't
> mess up?
"mv" within a filesystem (not across mount points) is/should be atomic,
since it comes down to a single rename() call.
This is how countless MDAs deliver e-mail daily into maildir/mh folders.
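
Roughly, the maildir-style delivery looks like the sketch below. This is
a simplification under assumptions: the function name deliver is made up,
and I use a random UUID for the file name, whereas real MDAs build the
unique name from other ingredients. The message is written completely
under tmp/, synced, and only then renamed into new/. Both directories
live under the same parent, so the rename stays within one filesystem and
a reader of new/ sees either no file or the whole file.

```python
import os
import uuid


def deliver(maildir: str, message: bytes) -> str:
    """Write message under tmp/, then rename it into new/ in one step.

    Hypothetical helper illustrating the staging-then-rename pattern.
    """
    tmp_dir = os.path.join(maildir, "tmp")
    new_dir = os.path.join(maildir, "new")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)

    name = uuid.uuid4().hex  # assumption: any unique name will do here
    tmp_path = os.path.join(tmp_dir, name)

    # "x" mode refuses to reuse an existing name in the staging area.
    with open(tmp_path, "xb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before it becomes visible

    final_path = os.path.join(new_dir, name)
    os.rename(tmp_path, final_path)  # same filesystem: a single rename
    return final_path
```

The same shape fits the checkin steps quoted above: upload into the
staging area first, then rename into the permanent structure.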
florin
--
"If it's not broken, let's fix it till it is."
41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4