Re: [stellation-res] Databases, artifacts and all the rest...

On Friday 16 August 2002 10:34 am, Ringo De Smet wrote:
> Hello,
>
> It took me some time to catch up on all the mails. Although I'm not
> very experienced with database back-ends, I do have some ideas on the
> cooperation between the artifact agents and the database access layer.
>
> Before I go into the artifact area, I would like to go back to the
> requirements of the database layer: to abstract away the differences
> between the RDBM systems that can be used together with Stellation. As
> a result, it's wrong to go into the direction of a one-size-fits-all
> solution. In this context, the template mechanism is a good example of
> doing something the *wrong* way. On the other hand, using an XML
> description of the database structure is not flexible. First reason to
> avoid XML: the XML description should be kept in sync with the code:
> error prone! Second reason: we will end up with an XML for the core
> Stellation and an XML document for each of the artifact types. One
> question I ask myself is if the XML descriptor can be written database
> independent.
> I think that we should be able to derive our database layout from our
> code model. It's already hard enough to keep one model clean... :)

I'm definitely opposed to the XML idea, but I'm increasingly convinced that 
a well-designed abstraction is the right way to go.

> About artifacts:
> <statement>
> We don't go far enough in using artifact types!
> </statement>
> I wondered why things like versions, projects, branches are not
> modelled as artifact types. There are tables for it, why not model it
> as artifacts then? I could start modelling artifacts like a solution
> map (is it clear that I'm a VisualAge for Java user? :), bug reports
> (why split bug tracking from development?), ...
> Having nothing but artifacts would remove the need for
> DBAccessPoint.createRepository. At this point in time, we already have
> the SQL in one place: the ArtifactAgents and no longer in
> DBAccessPoint.

I think this is a bit misguided, and would ultimately result in a confusing
non-change.

There's a certain amount of intrinsic data that defines the fundamental
behavior of a repository. In Stellation, that's projects, branches, and
artifacts. The definition of an Artifact, to Stellation, is a versioned
entity that can be stored in a project. A branch is a thread of history
in a project that maps artifacts onto versions. And an agent is a piece of
extension code that knows how to manipulate a particular type of artifact.

As such, the Project and Branch objects are inevitably special to the
system: you can't build a meaningful Stellation repository without them,
because they define the fundamental semantics of what a Stellation
repository *is*.
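
The definitions above can be pictured roughly as follows. This is only an
illustrative sketch: the class and method names (RepositoryModelSketch,
Project, Branch, Artifact, Agent, setVersion, versionOf) are assumptions
made for the example, not the actual Stellation classes.

```java
import java.util.HashMap;
import java.util.Map;

public class RepositoryModelSketch {
    /** A versioned entity that can be stored in a project (hypothetical shape). */
    static final class Artifact {
        final String id;
        Artifact(String id) { this.id = id; }
    }

    /** A thread of history in a project: maps each artifact onto a version. */
    static final class Branch {
        final String name;
        private final Map<String, Integer> versions = new HashMap<>();
        Branch(String name) { this.name = name; }
        void setVersion(Artifact artifact, int version) { versions.put(artifact.id, version); }
        /** Returns null when the artifact is not present on this branch. */
        Integer versionOf(Artifact artifact) { return versions.get(artifact.id); }
    }

    /** A project owns its branches; branches are created on first use here. */
    static final class Project {
        final String name;
        private final Map<String, Branch> branches = new HashMap<>();
        Project(String name) { this.name = name; }
        Branch branch(String branchName) {
            return branches.computeIfAbsent(branchName, Branch::new);
        }
    }

    /** Extension code that knows how to handle one artifact type. */
    interface Agent {
        String artifactType();
    }

    public static void main(String[] args) {
        Project project = new Project("demo");
        Artifact source = new Artifact("Main.java");
        project.branch("main").setVersion(source, 3);
        project.branch("experimental").setVersion(source, 5);
        System.out.println(project.branch("main").versionOf(source));
        System.out.println(project.branch("experimental").versionOf(source));
    }
}
```

In this sketch Project and Branch carry the repository semantics directly,
while Agent is the only extension point.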

We *could* move the branches etc. into Agents. But the end result would be
taking a bunch of code that's currently in the DBAccessPoint and LocalHandle
classes, moving it out into one or more separate agent classes, and then
putting hard-coded references to those classes into DBAccessPoint and
LocalHandle. I don't see what advantage that would have, except for adding
another layer of scatter to the code.

(For definition's sake, I'm using scatter in a sort of technical way. We work
with the aspect oriented programming (AOP) folks here at IBM, and it's
terminology that comes up in that community. What I mean is: when you write
object-oriented code, you get a bunch of good properties: better
encapsulation, safety, behavioral integration, and so on. But in exchange,
you get scatter: things that are strongly related in data and control flow
get separated. Scatter isn't always bad, but the more scatter you have, the
more confusing it is for a new developer to learn to understand your code. My
biggest criticism of AOP is that it violates encapsulation/hiding, thus
reducing one of the most important benefits of OO, while at the same time
increasing scatter.)

Anyway... I think that moving code from the DBAP and LocalHandle into special
case agents is an example of undesirable scatter: it moves things away from 
the single unique point where they're used, and wraps them into a general
purpose abstraction whose semantics don't really match them.

> About database abstractions:
> <statement>
> The database abstraction layer should be the only place that constructs
> SQL statements.
> </statement>
> Having nothing but artifact types, I came up with having the SQL in the
> ArtifactAgents, so away from the DBAccessPoint. It's true that it seems
> to conflict the statement I make in this section, and true you are! I
> haven't had time enough to come up with a clear answer or direction,
> but the only thing I can say is that artifact agents should be able to
> speak to the database access layer in a standardised API on how to
> store and retrieve the information belonging to the artifact type. The
> specific implementation of the database layer can then generate SQL
> statements optimized for the RDBMS in use. Maybe OJB(1) can provide us
> with a lot of support in this area. On the other hand, my view on this
> matter could be oversimplified, but then I would like to hear from you
> all.

My intuition, based on all of the database books and manuals that I've been
reading, is that we don't want to go too far with this.

SQL DDL varies enormously between different databases. For the most part,
the variations are shallow, but pervasive. This is why we really need an
abstraction layer here. There just doesn't seem to be any good way to
make DDL work across the full spectrum of DBs that we'd like to support.
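
A minimal sketch of what a DDL abstraction of this kind might look like,
assuming each database only needs to spell the column types differently.
Everything named here (DdlSketch, Dialect, ColumnType, createTable, and the
table and column names) is hypothetical and not taken from the Stellation
code. The two type mappings shown are just examples of the shallow
differences involved: Postgres stores binary data as BYTEA, Oracle as BLOB.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DdlSketch {
    /** Logical column types understood by this hypothetical layer. */
    enum ColumnType { ID, NAME, CONTENT }

    /** One implementation per database; only the type spelling differs. */
    interface Dialect {
        String sqlType(ColumnType type);
    }

    static final Dialect POSTGRES = type -> {
        switch (type) {
            case ID: return "INTEGER";
            case NAME: return "VARCHAR(255)";
            default: return "BYTEA";
        }
    };

    static final Dialect ORACLE = type -> {
        switch (type) {
            case ID: return "NUMBER(10)";
            case NAME: return "VARCHAR2(255)";
            default: return "BLOB";
        }
    };

    /** Builds a CREATE TABLE statement from a dialect-neutral description. */
    static String createTable(String table, Map<String, ColumnType> columns, Dialect dialect) {
        StringJoiner sql = new StringJoiner(", ", "CREATE TABLE " + table + " (", ")");
        for (Map.Entry<String, ColumnType> column : columns.entrySet()) {
            sql.add(column.getKey() + " " + dialect.sqlType(column.getValue()));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in declaration order.
        Map<String, ColumnType> columns = new LinkedHashMap<>();
        columns.put("artifact_id", ColumnType.ID);
        columns.put("name", ColumnType.NAME);
        columns.put("content", ColumnType.CONTENT);
        System.out.println(createTable("artifacts", columns, POSTGRES));
        System.out.println(createTable("artifacts", columns, ORACLE));
    }
}
```

The table layout is described once, and only the Dialect implementation
changes per database.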

SQL DML varies *much* less. In fact, SQL DML seems to be almost entirely 
portable between databases except for LOBs. And JDBC abstracts the LOBs into
a portable form. So, it looks to me like if you carefully write SQL DML in the 
standard dialect, limiting yourself to standard functions and avoiding DB 
specific extensions or stored procedures, you'll be portable across the 
entire spectrum of databases that have been proposed for Stellation. (From
what I can tell, I think that the DML that we've written for Stellation can 
easily be made portable between Postgres, HSQL, Firebird, DB2, Oracle, MySQL,
and SAPDB.) On the other hand, the moment stored procedures enter the picture,
all hope of portability goes right out the window. So any abstraction layer
we were to write here would, I think, not capture anything particularly
interesting. The impact of obscuring the particular mechanics of the query 
would outweigh the benefits of the abstraction.
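
To illustrate the point about DML, here is a rough sketch of plain JDBC
code that sticks to standard SQL and lets JDBC handle the LOB column. The
class name, table name, and column names (PortableDmlSketch,
artifact_content, artifact_id, version, content) are assumptions made for
the example and are not the actual Stellation schema.

```java
import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PortableDmlSketch {
    // Standard SQL only: no vendor functions, no stored procedures.
    static final String INSERT_SQL =
        "INSERT INTO artifact_content (artifact_id, version, content) VALUES (?, ?, ?)";
    static final String SELECT_SQL =
        "SELECT content FROM artifact_content WHERE artifact_id = ? AND version = ?";

    /** Stores one version of an artifact; the LOB goes through JDBC's stream API. */
    static void store(Connection conn, int artifactId, int version, byte[] content)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            stmt.setInt(1, artifactId);
            stmt.setInt(2, version);
            stmt.setBinaryStream(3, new ByteArrayInputStream(content), content.length);
            stmt.executeUpdate();
        }
    }

    /** Loads the stored bytes, or returns null when no such version exists. */
    static byte[] load(Connection conn, int artifactId, int version) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SELECT_SQL)) {
            stmt.setInt(1, artifactId);
            stmt.setInt(2, version);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getBytes(1) : null;
            }
        }
    }
}
```

The query text stays visible at the point of use, which is the property an
extra abstraction layer over DML would obscure.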

I'm going to read over the OJB stuff that you suggested before I go any 
further with the code I'm writing; I'll let you know what I think.

	-Mark
-- 
Mark Craig Chu-Carroll,  IBM T.J. Watson Research Center  
*** The Stellation project: Advanced SCM for Collaboration
***		http://www.eclipse.org/stellation
*** Work Email: mcc@xxxxxxxxxxxxxx  ------- Personal Email: markcc@xxxxxxxxxxx



