Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
[egit-dev] Re: JGit merge algorithm (fwd)

Hi all,

Christian asked that I make my response to his wonderful merge work more 
public, so here we go:

---------- Forwarded message ----------
Date: Thu, 3 Dec 2009 18:37:59 +0100 (CET)
From: Johannes Schindelin <Johannes.Schindelin@xxxxxx>
To: "Halstrick, Christian" <christian.halstrick@xxxxxxx>
Cc: Shawn O. Pearce <spearce@xxxxxxxxxxx>
Subject: Re: JGit merge algorithm


On Thu, 3 Dec 2009, Halstrick, Christian wrote:

> how about reviewing merge algorithms? Before I work too long in the dark 
> on JGits merge algorithm I now proposed the first version of it to our 
> gerrit (see,123) . It has nearly no 
> special capabilities like merge levels, correct whitespace handling and 
> it will perform suboptimal. But before I need the experts that I am 
> heading in the right direction.
> Hey Jo, when leaving Berlin after the GitTogether you convinced me that 
> merging files is an easy thing when you have the diffs in hand. After 
> investigating xmerge.c and looking at ZEALOUS-levels and all that 
> special whitspace handling I vote that this is the understatement of the 
> month :-)


Actually, there are two reasons why I said it was easy:

1) you would probably have asked me to do it if I would have said that it 
   is difficult ;-)

2) The special whitespace handling is a big PITA.  Without it, the 
   algorithm is actually pretty easy, and I think that we _can_ factor out 
   the whitespace handling by overloading RawText with an appropriate 
   version providing appropriate equality tests.

A comment about the ExpandableContent: I would try to stay away from the 
notion that the content is an addressable byte array, but offload the job 
to output lines to the Sequence instance (that means adding methods to 
the interface, something like public void write(OutputStream out, int 
idx); and public void write(OutputStream out, int beginIdx, int endIdx);).

Maybe it would be better to name it MergeResult, too?

Then your class could actually refer back to the original 
sequences by something like this:

	class Chunk {
		int beginBase, beginBase;
		enum { OURS, THEIRS } source;
		int begin, end;

	class Conflict extends Chunk {
		int beginOurs, endOurs, beginTheirs, endTheirs;

	List<Chunk> chunks;
	Sequence ours, theirs;

It could have a method that returns a Sequence with the conflict markers:

	class MergedSequence implements Sequence {
		Sequence ours, theirs;
		IntList lines; // contains line numbers, either in ours,
				   -1 for "<<<", -2 for "===", -3 for ">>>"
				   or -4 - line number in theirs

		public MergeResult(IntLines lines,
				Sequence ours, Sequence theirs) {
			this.lines = lines;
			this.ours = ours;
			this.theirs = theirs;

		public int size() { return original.size(); }

		public boolean equals(int thisIdx,
				Sequence other, int otherIdx) {
			throw new UnsupportedOperationException();

		public void write(OutputStream out, int idx) {
			int origIdx = lines.get[idx];
			if (origIdx >= 0)
				ours.write(out, origIdx);
			else if (origIdx <= -4)
				theirs.write(out, -4 - origIdx);
			else if (origIdx == -1)
			else if (origIdx == -2)
			else if (origIdx == -3)

	public Sequence getMergedSequence() {
		IntList lines;
		for (Chunk chunk : chunks)
			if (chunk instanceof Conflict) {
				Conflict conflict = (Conflict)chunk;
				for (int i = conflict.beginOurs;
						i < conflict.endOurs; i++)
				for (int i = conflict.beginTheirs;
						i < conflict.endTheirs; i++)
					lines.add(-4 - i);
			else if (chunk.source == OURS)
				for (int i = chunk.begin; i < chunk.end; i++)
				for (int i = chunk.begin; i < chunk.end; i++)
					lines.add(-4 - i);

		// construct MergedSequence from lines, ours and theirs
		return new MergedSequence(lines, ours, theirs);

Now on to the meat:

Rather than using a stopEdit, I'd do something like this:

	if (!baseToOurs.hasNext())
		return new MergeResult(theirs);
	if (!baseToTheirs.hasNext())
		return new MergeResult(ours);

The real problem left is then only to split the edits so far that only 
non-conflicting and conflicting parts remain.

The rest of the merge() function looks quite straight-forward, except that 
I would prefer a name such as "current" to "actBase".  I'm afraid with my 
idea to reuse the "ours" and "theirs" sequences, the cases -1 and 1 cannot 
be consolidated into a common function, but for readability, I would 
definitely do that with case 0.

Oh, and I suggest using beginOurs, endOurs, beginTheirs and endTheirs 
instead of creating Edit instances whenever widening or melting edits.  
Thereby you can also avoid the need for nextOursEdit and nextTheirsEdit.

BTW the zealous mode, which is quite useful IMHO, is relatively easy to 
implement at the end of case 0, before adding the hunk:

	/* avoid showing identically added lines */
	while (beginOurs < endOurs && beginTheirs < endTheirs &&
			ours.equals(beginOurs, theirs, beginTheirs)) {

	while (beginOurs < endOurs && beginTheirs < endTheirs &&
			ours.equals(endOurs - 1, theirs, endTheirs - 1)) {

Unfortunately, for this to work, the common lines have to be output 
_after_ this adjustment.

Other than that: great work!


Back to the top