Re: [stellation-res] A database proposal, for fixing 22135.
On Fri, Aug 09, 2002 at 08:11:12AM +0000, Mark C. Chu-Carroll wrote:
> On Friday 09 August 2002 02:45 am, Florin Iucha wrote:
> > On Sun, Aug 04, 2002 at 08:10:13PM -0400, Mark C. Chu-Carroll wrote:
> > [snip]
> >
> > > OK. Enough background. Here's what I'd like to do.
> > >
> > > I'd like to separate the tags from the lines in the database. And I'd
> > > also like to burst the tags out, so that instead of bundling the
> > > information about what versions a given line is part of into a tag
> > > string, it gets broken out into a bunch of lines in a different table.
> > >
> > > So... The Texts table becomes two tables: TextLines, and TextVersions.
> > > Each row in TextLines is an artifactID, and a line ID, and the line
> > > string.
> > >
> > > TextVersions is a list of (linenumber, LineID, VersionIDs). There is an
> > > entry in TextVersions for a LineID,VersionID if the line is a member
> > > of the version.
> >
> > There is something that bugs me for a couple of days...
> >
> > Stellation is plugin based: each artifact is [supposed to be] manipulated
> > by a plugin that knows how to parse/merge/diff it.
> >
> > If that's the case, why is the database schema assuming that text files
> > can be _meaningfully_ sliced into lines of text?
>
> That's the database schema for *text* artifacts.
Then I think it should be marked as such. It is not easy to see which
tables belong to the core and which to the plugins, and the distinction
is not documented either.
> The idea is that as you add artifact types to the system, you provide
> a database scheme for that artifact type. The ArtifactAgent (the
> plugin type that provides a new artifact type implementation)
> has methods to create the database tables needed to support
> the artifact type, to retrieve and store artifacts of the type from and
> to the database, etc.
>
> So what we're talking about is the implementation of TextArtifactAgent.
>
> If you didn't want to use the table layout that we're discussing, you'd
> implement a new agent that created tables to store things however
> you choose.
>
> For instance, we've been discussing supporting XML schemas
> with another IBM group. They don't want the kind of storage that
> we do for simple text; they want sliced-and-diced storage of meaningful
> chunks of the schema. So they'd use something more like the list of
> LOBs that you suggest.
>
> > Shouldn't the stellation core store the artifact data in whatever format
> > it pleases (simple linked list of blobs containing data/delta or files
> > stored at filesystem lever or ...) and serve it to the appropriate plugin
> > as a big chunk'o'bits?
>
> But then we're dictating to the artifact type implementors that they
> have to use that BLOB implementation, even if there's a reasonable,
> efficient, easy to read and manipulate representation of their type
> in a set of tables. Our approach is to not dictate storage
> at all, except that it needs to be in the database.
Nope. The core will have an interface to the raw data, either

    appendData(artifact, data)
    getDataSize(artifact)
    readData(artifact, offset, size)

or

    getChunkIterator(artifact)
    readChunk(iterator)
    appendChunk(artifact, data)

and all the plugins would use this interface to store their data.
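In Java terms, such an interface might look roughly like the sketch
below (the names and signatures are mine, not anything in the
Stellation sources):

    import java.io.IOException;
    import java.util.Iterator;

    /*
     * Sketch of a raw-data interface the core could expose to plugins.
     * All identifiers here are illustrative only.
     */
    public interface RawArtifactStore {
        /* Append a chunk of opaque bytes to the data stored for an artifact. */
        void appendData(String artifactId, byte[] data) throws IOException;

        /* Total size, in bytes, of the data stored for an artifact. */
        long getDataSize(String artifactId) throws IOException;

        /* Read up to 'size' bytes starting at 'offset'. */
        byte[] readData(String artifactId, long offset, int size) throws IOException;

        /* Chunk-oriented alternative: iterate over the stored chunks in order. */
        Iterator<byte[]> getChunkIterator(String artifactId) throws IOException;
    }

The point is that plugins would see the data only as opaque bytes; how
the core keeps those bytes would be its own business.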
I am not really thrilled about storing the artifact data in the database
either. Having a 10K-line file create 10K * versions rows in the Text
table is quite wasteful. Having a 10K-line XML document create
25K * versions rows (parsed, with each <tag> taking one row) is even worse.
And if you don't decompose the XML at the tag/text level when you store it
in the database, you will have to parse it anyway when you want to
diff/merge it.
Why not store the artifact data in the file system and keep a cookie in
the database?
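One way to picture the cookie (the table and column names below are my
own sketch, not an existing schema) is a single table mapping an
artifact version to a path under the repository root:

    import java.sql.Connection;
    import java.sql.Statement;

    /*
     * Sketch: the database keeps only a "cookie" (file path plus checksum)
     * per artifact version; the actual bytes live on the file system.
     */
    public final class ArtifactCookies {
        public static void createTable(Connection db) throws Exception {
            try (Statement s = db.createStatement()) {
                s.executeUpdate(
                    "CREATE TABLE ArtifactData (" +
                    "  artifactId VARCHAR(64) NOT NULL," +
                    "  versionId  VARCHAR(64) NOT NULL," +
                    "  dataPath   VARCHAR(255) NOT NULL," + // file under the repository root
                    "  checksum   CHAR(40)," +              // e.g. SHA-1 of the stored bytes
                    "  PRIMARY KEY (artifactId, versionId))");
            }
        }
    }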
Checkin would mean:
- uploading the files into a staging area - should not fail
- moving the files into the permanent structure and creating the
  database records for the artifacts - should not fail
- creating the database metadata records - renames, deletions,
  additions
- queueing the new data additions for archival/deltification...
If the third step fails, a scrubber process can remove the orphaned
files from the file system and their database identification records
(a rough sketch of this flow follows).
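Here is the staged checkin as a rough Java sketch; the Repository
interface and all method names are hypothetical:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    /*
     * Illustrative sketch of the staged checkin described above; the
     * repository API it calls is made up for the example.
     */
    public final class StagedCheckin {

        /* Hypothetical facade standing in for the database layer. */
        interface Repository {
            void recordCookie(String artifactId, String versionId, Path dataPath) throws Exception;
            void recordMetadata(String artifactId, String versionId) throws Exception;
            void queueForDeltification(String artifactId, String versionId) throws Exception;
        }

        public static void checkin(Path staging, Path permanent, Repository repo,
                                   String artifactId, String versionId) throws Exception {
            // 1. The upload has already landed the bytes in the staging area.
            Path staged = staging.resolve(artifactId + "-" + versionId);

            // 2. Move the file into the permanent structure and record the cookie.
            Path dest = permanent.resolve(artifactId).resolve(versionId);
            Files.createDirectories(dest.getParent());
            Files.move(staged, dest, StandardCopyOption.ATOMIC_MOVE);
            repo.recordCookie(artifactId, versionId, dest);

            // 3. Create the metadata records (renames, deletions, additions).
            //    If this step fails, the scrubber can later delete 'dest' and
            //    the cookie row, since nothing else references them yet.
            repo.recordMetadata(artifactId, versionId);

            // 4. Queue the new data for archival/deltification.
            repo.queueForDeltification(artifactId, versionId);
        }
    }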
florin
--
"If it's not broken, let's fix it till it is."
41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4