Re: [stellation-res] Oracle backend!
On Wednesday 24 July 2002 05:48 am, Ringo De Smet wrote:
> Hello,
>
> Based on Mark's explanation that is also copied to the Eclipse Wiki, I
> started writing an Oracle backend. I'm now at the point where I have
> been able to run the command
>
> svc --location=oracle:svc configure database
>
> successfully. There are some remarks however:
>
> 1) For the long string, I have used Oracle's VARCHAR(2000). If my
> colleague has informed me well, this seems to be the largest size
> possible. In the DB2 version, it was put on 16000, so I will have to
> look into this further.
They really don't support a varchar longer than 2K?
That's potentially a serious problem. Even if things like comments
never exceed 2K, the VARCHAR gets used in the text artifact representation,
and a 2K limit there could be a real problem.
Internally, Stellation does its own form of diff computation using a
longest common subsequence (LCS) algorithm. LCS is a sparse
dynamic programming problem. The naive implementation ends up
being O(n*m) space, where n is the number of lines in the original
document, and m is the number of lines in the modified doc.
We use a variant algorithm where the space usage is, roughly,
O(n * r), where r is a recurrence factor representing the
number of places where a line from the delta could
match a line from the original. In source files, r gets to be very
high, because you have many recurring lines like indented close
braces. It's possible for r to end up exceeding m.
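To make the space argument concrete, here is a minimal sketch in Python. It is not Stellation's code. It assumes that the work of a sparse LCS is driven by the number of (original line, modified line) pairs with identical text, and it compares that count with the n*m cells of the naive table. The name match_points and the sample lines are made up for illustration.

```python
from collections import defaultdict

def match_points(original, modified):
    """Count the (i, j) pairs where original[i] == modified[j]."""
    # Index every line of the original by its text.
    positions = defaultdict(list)
    for i, line in enumerate(original):
        positions[line].append(i)
    # Each modified line contributes one match point per identical original line.
    return sum(len(positions.get(line, ())) for line in modified)

# Hypothetical sample input: the close brace recurs in both versions.
original = ["if (a) {", "    x();", "}", "if (b) {", "    y();", "}"]
modified = ["if (a) {", "    x();", "}", "if (c) {", "    z();", "}"]

dense_cells = len(original) * len(modified)       # cells in the naive n*m table
sparse_points = match_points(original, modified)  # pairs a sparse method tracks
print(dense_cells, sparse_points)
```

Every extra copy of a recurring line such as the close brace adds match points in every place it occurs, which is how the sparse count can climb back toward the dense one.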
But text files, and particularly source files, have an interesting property
that high recurrence lines are usually the least useful lines to consider
in generating a useful delta.
So we take advantage of that, and clump high recurrence lines together
in ways that reduce r, without affecting the overall quality of the
delta. 2K for maximum text size means that the clumping we can do
is fairly limited: we've seen clumping create chunks of 16K. And limiting
clumping can dramatically increase the memory usage of delta computation.
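A rough sketch of the clumping idea and of how a column size cap constrains it, again in Python and again not the real implementation. The rule used here is an assumption: a line that occurs more than threshold times is treated as high recurrence and is glued onto the chunk before it, unless the chunk would grow past max_chars. The names clump, threshold and max_chars are hypothetical.

```python
from collections import Counter

def clump(lines, threshold=1, max_chars=2000):
    """Merge high-recurrence lines into the preceding chunk, up to max_chars."""
    counts = Counter(lines)
    chunks = []
    for line in lines:
        high_recurrence = counts[line] > threshold
        # +1 accounts for the newline that joins the line to the chunk.
        fits = bool(chunks) and len(chunks[-1]) + 1 + len(line) <= max_chars
        if high_recurrence and fits:
            chunks[-1] += "\n" + line
        else:
            chunks.append(line)
    return chunks

# Hypothetical sample input with a recurring close brace.
source = ["if (a) {", "    x();", "}", "if (b) {", "    y();", "}"]

roomy = clump(source)               # close braces are absorbed into chunks
tight = clump(source, max_chars=5)  # cap too small: nothing can be merged
print(roomy)
print(tight)
```

With a generous cap the recurring close brace never stands alone, so it stops generating match candidates of its own. With a cap that is too small the merge is refused and every copy of the brace stays a separate unit, which is the situation a 2K limit pushes toward.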
> 2) Some of the column names that are specified in the
> DBAccessPoint/ArtifactAgents are Oracle reserved words:
>
> a) [DBAccessPoint] Comments table: the column 'comment' should be
> renamed for the statement to work in Oracle.
> b) [DataArtifactAgent] the 'size' column should be renamed for the
> statement to work in Oracle.
That's definitely a problem. I think your suggestion for prefixing
is likely a good one. Could you put it into bugzilla?
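For the bugzilla entry, the prefixing could look roughly like the sketch below. It is written in Python only to show the shape of the idea. The st_ prefix, the helper name create_table_sql and the column list are all made up for illustration and are not the actual Stellation schema.

```python
def create_table_sql(table, columns, prefix="st_"):
    """Build a CREATE TABLE statement with every identifier prefixed."""
    # Prefixing every name means a bare reserved word such as COMMENT or SIZE
    # never reaches the database as an identifier.
    cols = ", ".join(f"{prefix}{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {prefix}{table} ({cols})"

# Hypothetical columns standing in for the Comments table.
statement = create_table_sql(
    "comments",
    [("id", "INTEGER"), ("comment", "VARCHAR(2000)")],
)
print(statement)
```

Applying the prefix in one place, where statements are generated, would keep the individual agents from each having to know which words a given database reserves.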
> 3) I haven't run anything but configure database. I consider this a
> first milestone for the Oracle backend. In OracleAccessPoint, one will
> notice that it uses the Thin Oracle JDBC driver hardcoded to my own
> machine (p_ringo). I want Stellation to run first before I will delve
> deeper into DB connection configuration.
Very cool.
> 4) Concerning column and table names: should we prefix the names to
> prevent conflicts with *any* RDBM system?
As I said above, probably yes.
> Oracle backend code attached as patch.
> Please check in this code so I can continue working from CVS HEAD.
I'll look it over, and see about getting it checked in later today. Thanks
for the patch!
-Mark
--
Mark Craig Chu-Carroll, IBM T.J. Watson Research Center
*** The Stellation project: Advanced SCM for Collaboration
*** http://www.eclipse.org/stellation
*** Work Email: mcc@xxxxxxxxxxxxxx ------- Personal Email: markcc@xxxxxxxxxxx