Re: [stellation-res] A database proposal, for fixing 22135.

On Fri, Aug 09, 2002 at 10:44:29AM +0000, Mark C. Chu-Carroll wrote:
> 
> I really hate the idea of using a universal BLOB mechanism
> like you're suggesting, for several reasons.
> 
> (1) A lot of data types are highly structured, and we can capture that
> structure in database tables. It would be a shame to lose that by
> pushing everything into BLOBs, and then reparsing the structured
> data out of the BLOBs. 
> 
> (2) Keeping things in tables means that the whole database is
> readable both by humans, and by other programs. For debugging,
> it's been unbelievably valuable to go into the database and look at
> it. For other stuff, we've got people interested in doing things like
> bug management, change impact prediction, and various kinds of
> change history analysis. Things like that don't work particularly well
> through the repository interface. But if the data is in tables, they
> can go ahead and mine the database for all sorts of interesting data.
> 
> (3) Table storage gives us exploitable structure. This sounds like
> a repeat of point one, but I think it's getting at something different:
> it applies even when the underlying data doesn't have a strong structure
> (like, for instance, text; text structure is just a list of lines, not a
> particularly big deal structure-wise). By capturing things in database
> tables, we can structure them so that the database query engine itself
> can do things like efficient version retrieval. Instead of trying to optimize
> that ourselves for each new data type in terms of BLOB operations, we
> can exploit all of the work that goes on in the database community on
> optimization of database operations.
> 
> (4) BLOB operations are often not truly transactional. (For example,
> LOB operations are not fully transactional in DB2.) The importance
> of transactionality cannot be overstated. If the system isn't fully
> transactional, your data is not safe.
> 
> (5) BLOB operations are not very portable. We've already suffered
> greatly using BLOBs for data artifacts. Postgres BLOBs don't implement
> the JDBC blob interface. (They used to; it was dropped in 7.2. Now,
> there's a dreadful "bytea" type that implements JDBC blob operations
> extremely poorly, and true BLOBs can only be handled through a 
> postgres specific LOB API.) Different databases handle BLOBs in
> subtly different ways. Building our entire storage mechanism around
> BLOBs is begging for incredible pain and suffering. 
> 

I think it is crucial to keep all project data in a relational database,
and to minimize the use of BLOBs (or CLOBs). The only need for a BLOB is for
an unstructured data artifact, and a case can be made that even these artifacts
should be stored in character form, say using a hex encoding, to avoid problems
transferring data between different database backends, though I am not
suggesting we do this now.
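
A minimal sketch of the hex-encoding idea, assuming a hypothetical
artifact_hex table and using Python's built-in sqlite3 module purely for
illustration (none of these names come from the actual Stellation schema):

```python
import sqlite3

def store_artifact(conn, name, data):
    # Encode the raw bytes as hex text so only a plain character
    # column is needed, with no backend-specific BLOB API.
    conn.execute(
        "INSERT INTO artifact_hex (name, hex_body) VALUES (?, ?)",
        (name, data.hex()),
    )

def load_artifact(conn, name):
    # Decode the hex text back into the original bytes.
    row = conn.execute(
        "SELECT hex_body FROM artifact_hex WHERE name = ?", (name,)
    ).fetchone()
    return bytes.fromhex(row[0])

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per unstructured artifact.
conn.execute(
    "CREATE TABLE artifact_hex (name TEXT PRIMARY KEY, hex_body TEXT NOT NULL)"
)
payload = bytes([0, 255, 16, 128])
store_artifact(conn, "icon.png", payload)
```

The cost is that the stored form is twice the size of the raw data, which
is one reason to treat this as an option rather than a default.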

We always want to maximize the amount of information in the database that can be retrieved and
examined using SQL.
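
To illustrate what SQL-level retrieval could look like for text, here is a
rough sketch that stores each line as a row tagged with the range of
versions in which it is live. The text_line table, its columns, and the
lines_at helper are assumptions made up for this example, not the real
repository schema; sqlite3 stands in for whatever backend is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a line is live from first_version through
# last_version; a NULL last_version means it is still present.
conn.execute("""
    CREATE TABLE text_line (
        artifact      TEXT    NOT NULL,
        position      INTEGER NOT NULL,
        content       TEXT    NOT NULL,
        first_version INTEGER NOT NULL,
        last_version  INTEGER
    )
""")
conn.executemany(
    "INSERT INTO text_line VALUES (?, ?, ?, ?, ?)",
    [
        ("Foo.java", 10, "int x;", 1, None),  # present since version 1
        ("Foo.java", 20, "int y;", 1, 1),     # removed after version 1
        ("Foo.java", 15, "int z;", 2, None),  # added in version 2
    ],
)

def lines_at(conn, artifact, version):
    # The query engine selects and orders the lines of one version;
    # no application-side parsing of a stored blob is involved.
    rows = conn.execute(
        """
        SELECT content FROM text_line
        WHERE artifact = ?
          AND first_version <= ?
          AND (last_version IS NULL OR last_version >= ?)
        ORDER BY position
        """,
        (artifact, version, version),
    )
    return [content for (content,) in rows]
```

Calling lines_at(conn, "Foo.java", 2) returns the lines of version 2, and
the same table can be queried directly by other tools, for example to find
every version in which a given line was live.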

dave
-- 
Dave Shields, IBM Research, shields@xxxxxxxxxxxxxx. 

