Re: [stellation-res] A database proposal, for fixing 22135.

From: florin@xxxxxxxxx (Florin Iucha)
Date: Fri, 9 Aug 2002 08:35:36 -0500

On Fri, Aug 09, 2002 at 08:11:12AM +0000, Mark C. Chu-Carroll wrote:
> On Friday 09 August 2002 02:45 am, Florin Iucha wrote:
> > On Sun, Aug 04, 2002 at 08:10:13PM -0400, Mark C. Chu-Carroll wrote:
> > [snip]
> >
> > > OK. Enough background. Here's what I'd like to do.
> > >
> > > I'd like to separate the tags from the lines in the database. And I'd
> > > also like to burst the tags out, so that instead of bundling the
> > > information about what versions a given line is part of into a tag
> > > string, it gets broken out into a bunch of lines in a different table.
> > >
> > > So... The Texts table becomes two tables: TextLines, and TextVersions.
> > > Each row in TextLines is an artifactID, and a line ID, and the line
> > > string.
> > >
> > > TextVersions is a list of (linenumber, LineID, VersionIDs). There is an
> > > entry in TextVersions for a LineID,VersionID if the line is a member
> > > of the version.
> >
> > Something has been bugging me for a couple of days...
> >
> > Stellation is plugin based: each artifact is [supposed to be] manipulated
> > by a plugin that knows how to parse/merge/diff it.
> >
> > If that's the case, why is the database schema assuming that text files
> > can be _meaningfully_ sliced into lines of text?
> 
> That's the database schema for *text* artifacts. 

Then I think it should be marked as such. It is not easy to see which
tables are for the core and which for the plugins. And it is not
documented either.

> The idea is that as you add artifact types to the system, you provide
> a database schema for that artifact type. The ArtifactAgent (the
> plugin type that provides a new artifact type implementation)
> has methods to create the database tables needed to support
> the artifact type, to retrieve and store artifacts of the type from and
> to the database, etc.
> 
> So what we're talking about is the implementation of TextArtifactAgent. 
> 
> If you didn't want to use table layout that we're discussing, you'd
> implement a new agent that created tables to store things however
> you choose.
> 
> For instance, we've been discussing supporting XML schemas
> with another IBM group. They don't want the kind of storage that
> we do for simple text; they want sliced-and-diced storage of meaningful
> chunks of the schema. So they'd use something more like the list of
> LOBs that you suggest. 
> 
> > Shouldn't the stellation core store the artifact data in whatever format
> > it pleases (simple linked list of blobs containing data/delta or files
> > stored at filesystem level or ...) and serve it to the appropriate plugin
> > as a big chunk'o'bits?
> 
> But then we're dictating to the artifact type implementors that they
> have to use that BLOB implementation, even if there's a reasonable,
> efficient, easy to read and manipulate representation of their type
> in a set of tables. Our approach is to not dictate storage 
> at all, except that it needs to be in the database.

Nope. The core will have an interface to the raw data:
   appendData(artifact, data),
   getDataSize(artifact) and
   readData(artifact, offset, size)
or
   getChunkIterator(artifact)
   readChunk(iterator)
   appendChunk(artifact, data)
and all the plugins would use this interface to store their data.
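
A minimal sketch of the first variant of that interface, in Java, to make
the idea concrete. The names RawArtifactStore and InMemoryRawStore, the use
of a String as the artifact handle, and the in-memory backing are all
assumptions made for illustration; none of this is existing Stellation code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: the core exposes raw bytes, plugins interpret them.
interface RawArtifactStore {
    void appendData(String artifactId, byte[] data);
    long getDataSize(String artifactId);
    byte[] readData(String artifactId, long offset, int size);
}

// Hypothetical in-memory backing, standing in for a filesystem or blob store.
class InMemoryRawStore implements RawArtifactStore {
    private final Map<String, ByteArrayOutputStream> blobs = new HashMap<>();

    @Override
    public void appendData(String artifactId, byte[] data) {
        // Create the blob on first use, then append to its end.
        blobs.computeIfAbsent(artifactId, id -> new ByteArrayOutputStream())
             .write(data, 0, data.length);
    }

    @Override
    public long getDataSize(String artifactId) {
        ByteArrayOutputStream blob = blobs.get(artifactId);
        return blob == null ? 0 : blob.size();
    }

    @Override
    public byte[] readData(String artifactId, long offset, int size) {
        if (offset < 0 || size < 0) {
            throw new IllegalArgumentException("offset and size must be >= 0");
        }
        ByteArrayOutputStream blob = blobs.get(artifactId);
        if (blob == null) {
            return new byte[0];
        }
        byte[] all = blob.toByteArray();
        // Clamp the requested range to the data that actually exists.
        int from = (int) Math.min(offset, all.length);
        int to = (int) Math.min((long) from + size, all.length);
        return Arrays.copyOfRange(all, from, to);
    }
}
```

A text plugin would then call readData() to get its bytes and split them into
lines itself, instead of relying on the core to have stored lines as rows.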

I am not really thrilled about storing the artifact data in the database
either. Having a 10K-line file create 10K * versions rows in the Texts
table is quite wasteful. Having a 10K-line XML document create 25K
(parsed, each <tag> takes one row) * versions rows is even worse.

If you don't decompose XML at the tag/text level when you store it in
the database, you will have to parse it anyway when you want to
diff/merge it.

Why not store the artifact data in the file system and keep a cookie in
the database?

Checkin would mean
   - file upload into a staging area - should not fail
   - moving the files into the permanent structure and creating database
     records for the artifacts - should not fail
   - creating the database metadata records - renames, deletions,
     additions
   - queueing the new data additions for archival/deltification...

If the third step fails, a scrubber process can remove the orphaned
files from the file system and their database identification records.
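
The checkin steps and the scrubber could be sketched roughly as follows, in
Java. Everything here is an assumption for illustration: the class name
FileBackedCheckin, the use of a random UUID as the cookie, and the two
in-memory collections that stand in for the database identification and
metadata records.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: files live on disk, the "database" keeps only cookies.
class FileBackedCheckin {
    private final Path staging;
    private final Path permanent;
    // Stand-ins for database tables (assumptions, not a real schema).
    final Map<String, Path> identification = new HashMap<>(); // cookie -> file
    final Set<String> metadata = new HashSet<>();             // cookies with metadata
    final Deque<String> archivalQueue = new ArrayDeque<>();   // pending deltification

    FileBackedCheckin(Path root) throws IOException {
        staging = Files.createDirectories(root.resolve("staging"));
        permanent = Files.createDirectories(root.resolve("permanent"));
    }

    // Step 1: upload the file into the staging area.
    Path upload(String name, byte[] content) throws IOException {
        return Files.write(staging.resolve(name), content);
    }

    // Step 2: move the file into the permanent structure and record its cookie.
    String promote(Path staged) throws IOException {
        String cookie = UUID.randomUUID().toString();
        Path target = permanent.resolve(cookie);
        Files.move(staged, target);
        identification.put(cookie, target);
        return cookie;
    }

    // Steps 3 and 4: record the metadata, then queue the data for archival.
    void recordMetadata(String cookie) {
        metadata.add(cookie);
        archivalQueue.add(cookie);
    }

    // Scrubber: drop files and identification records that have no metadata.
    int scrub() throws IOException {
        int removed = 0;
        Iterator<Map.Entry<String, Path>> it = identification.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Path> entry = it.next();
            if (!metadata.contains(entry.getKey())) {
                Files.deleteIfExists(entry.getValue());
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

In a real server the scrubber would have to skip checkins that are still in
progress, for instance by only considering identification records older than
some timeout.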

florin

-- 

"If it's not broken, let's fix it till it is."

41A9 2BDE 8E11 F1C5 87A6  03EE 34B3 E075 3B90 DFE4
