[
Date Prev][
Date Next][
Thread Prev][
Thread Next][
Date Index][
Thread Index]
[
List Home]
[egit-dev] Re: JGit merge algorithm (fwd)
|
Hi all,
Christian asked that I make my response to his wonderful merge work more
public, so here we go:
---------- Forwarded message ----------
Date: Thu, 3 Dec 2009 18:37:59 +0100 (CET)
From: Johannes Schindelin <Johannes.Schindelin@xxxxxx>
To: "Halstrick, Christian" <christian.halstrick@xxxxxxx>
Cc: Shawn O. Pearce <spearce@xxxxxxxxxxx>
Subject: Re: JGit merge algorithm
Hi,
On Thu, 3 Dec 2009, Halstrick, Christian wrote:
> how about reviewing merge algorithms? Before I work too long in the dark
> on JGits merge algorithm I now proposed the first version of it to our
> gerrit (see http://egit.eclipse.org/r/#change,123) . It has nearly no
> special capabilities like merge levels, correct whitespace handling and
> it will perform suboptimal. But before I need the experts that I am
> heading in the right direction.
>
> Hey Jo, when leaving Berlin after the GitTogether you convinced me that
> merging files is an easy thing when you have the diffs in hand. After
> investigating xmerge.c and looking at ZEALOUS-levels and all that
> special whitspace handling I vote that this is the understatement of the
> month :-)
Hahaha!
Actually, there are two reasons why I said it was easy:
1) you would probably have asked me to do it if I would have said that it
is difficult ;-)
2) The special whitespace handling is a big PITA. Without it, the
algorithm is actually pretty easy, and I think that we _can_ factor out
the whitespace handling by overloading RawText with an appropriate
version providing appropriate equality tests.
A comment about the ExpandableContent: I would try to stay away from the
notion that the content is an addressable byte array, but offload the job
to output lines to the Sequence instance (that means adding methods to
the interface, something like public void write(OutputStream out, int
idx); and public void write(OutputStream out, int beginIdx, int endIdx);).
Maybe it would be better to name it MergeResult, too?
Then your class could actually refer back to the original
sequences by something like this:
class Chunk {
int beginBase, beginBase;
enum { OURS, THEIRS } source;
int begin, end;
}
class Conflict extends Chunk {
int beginOurs, endOurs, beginTheirs, endTheirs;
}
List<Chunk> chunks;
Sequence ours, theirs;
It could have a method that returns a Sequence with the conflict markers:
class MergedSequence implements Sequence {
Sequence ours, theirs;
IntList lines; // contains line numbers, either in ours,
-1 for "<<<", -2 for "===", -3 for ">>>"
or -4 - line number in theirs
public MergeResult(IntLines lines,
Sequence ours, Sequence theirs) {
this.lines = lines;
this.ours = ours;
this.theirs = theirs;
}
public int size() { return original.size(); }
public boolean equals(int thisIdx,
Sequence other, int otherIdx) {
throw new UnsupportedOperationException();
}
public void write(OutputStream out, int idx) {
int origIdx = lines.get[idx];
if (origIdx >= 0)
ours.write(out, origIdx);
else if (origIdx <= -4)
theirs.write(out, -4 - origIdx);
else if (origIdx == -1)
out.println("<<<<<<<");
else if (origIdx == -2)
out.println("=======");
else if (origIdx == -3)
out.println(">>>>>>>");
}
public Sequence getMergedSequence() {
IntList lines;
for (Chunk chunk : chunks)
if (chunk instanceof Conflict) {
Conflict conflict = (Conflict)chunk;
lines.add(-1);
for (int i = conflict.beginOurs;
i < conflict.endOurs; i++)
lines.add(i);
lines.add(-2);
for (int i = conflict.beginTheirs;
i < conflict.endTheirs; i++)
lines.add(-4 - i);
lines.add(-3);
}
else if (chunk.source == OURS)
for (int i = chunk.begin; i < chunk.end; i++)
lines.add(i);
else
for (int i = chunk.begin; i < chunk.end; i++)
lines.add(-4 - i);
// construct MergedSequence from lines, ours and theirs
return new MergedSequence(lines, ours, theirs);
}
Now on to the meat:
Rather than using a stopEdit, I'd do something like this:
if (!baseToOurs.hasNext())
return new MergeResult(theirs);
if (!baseToTheirs.hasNext())
return new MergeResult(ours);
The real problem left is then only to split the edits so far that only
non-conflicting and conflicting parts remain.
The rest of the merge() function looks quite straight-forward, except that
I would prefer a name such as "current" to "actBase". I'm afraid with my
idea to reuse the "ours" and "theirs" sequences, the cases -1 and 1 cannot
be consolidated into a common function, but for readability, I would
definitely do that with case 0.
Oh, and I suggest using beginOurs, endOurs, beginTheirs and endTheirs
instead of creating Edit instances whenever widening or melting edits.
Thereby you can also avoid the need for nextOursEdit and nextTheirsEdit.
BTW the zealous mode, which is quite useful IMHO, is relatively easy to
implement at the end of case 0, before adding the hunk:
/* avoid showing identically added lines */
while (beginOurs < endOurs && beginTheirs < endTheirs &&
ours.equals(beginOurs, theirs, beginTheirs)) {
beginOurs++;
beginTheirs++;
}
while (beginOurs < endOurs && beginTheirs < endTheirs &&
ours.equals(endOurs - 1, theirs, endTheirs - 1)) {
endOurs--;
endTheirs--;
}
Unfortunately, for this to work, the common lines have to be output
_after_ this adjustment.
Other than that: great work!
Ciao,
Dscho