Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [egit-dev] How a jgit merge could work - examine StrategySimpleTwoWayInCore

Sorry for the lag...

"Halstrick, Christian" <christian.halstrick@xxxxxxx> wrote:
> It would be great to get some comments of the jgit-experts (Hi Shawn, Robin)
> whether I got the idea of the merge. Should I continue posting strange pseudo
> code explaining jgit internals or is there no interest?
> # this merge will merge two(?) commits.

Yes, two.  Though technically the generic Merger API supports N
commits (to do an octopus merge) any sane N-way merge strategy is
going to perform the merge as a successive set of pair-wise merges,
probably by calling this particular Merger and using its result as
input to the next merge in the set.  So this class should work with
exactly 2 commits.

> # It will already do all the work
> # unless we have conflicts in the file content - here the content diff/merge
> # is missing.


> # E.g. if a file was only touched in one of the commits it will
> # be there in the merge result. Files untouched in both commits are of course 
> # also in the merge result


> StrategySimpleTwoWayInCore InCoreMerger {
> ...
> Treewalk walk;
> ...
> merge(AnyObjectID[] tips)
> 	RevObject[] sourceObjects = walk.parseEntry(t) foreach t in tips
> 	RevCommit[] sourcecommits = walk.parseEntry(o) foreach o in sourceObjects 
> 	RevCommit[] sourceTrees = walk.parseTree(o) foreach o in sourceObjects
> 	return mergeImpl()

At first glance these loops suck since we can only handle 2 trees
and a common base in this particular implementation.  But we want to
leave it generalized for an octopus strategy and this is inherited
down from a common base class.

> mergeImpl()
> 	# get a special TreeWalk which ensures that can detect file/folder
> 	# conflicts. E.g. commit A has added file d/alpha and commit B has
> 	# added folder d/alpha/
> 	tw = new NameConflictTreeWalk()

Yes.  This is an area where JGit is ahead of C Git in terms of
correctness.  C Git still struggles with D/F conflicts.

JGit takes a performance hit here by using this class over TreeWalk,
but its a fairly small one and that penalty should really only
trigger when there is some ambiguity in the stream of names.
Most projects don't have D/F ambiguity when stored in Git, which
is why its take so many years for the C Git folks to identify D/F
problems and try to solve them.

> 	tw.add(mergeBase(0,1), sourceTrees)
> 	hasConflict = false
> 	builder = cache.builder()
> 	foreach t in tw
> 		if t.OURS.mode == t.THEIRS.mode && t.equal(OURS, THEIRS)
> 			# ours & theirs are the same -> choose one
> 			builder.add(OURS, Stage0)
> 			continue
> 		elseif t.BASE.mode == t.OURS.mode && t.equal(BASE, OURS)
> 			# ours was not changed -> take theirs
> 			builder.add(THEIRS, Stage0)
> 		elseif t.BASE.mode == t.THEIRS.mode && t.equal(BASE, THEIRS)
> 			# theirs was not changed -> take ours
> 			builder.add(OURS, Stage0)
> 		elseif t.isSubtree()
> 			# at least for one of the trees we are processing a
> 			# folder. If for any other tree we are processing a 
> 			# file then this is a file/directory conflict
> 			if nontree(BASE)
> 				hasConflict = true
> 				builder.add(BASE, Stage1)
> 			if nontree(OURS)
> 				hasConflict = true
> 				builder.add(OURS, Stage2)
> 			if nontree(THEIRS)
> 				hasConflict = true
> 				builder.add(THEIRS, Stage3)
> 			tw.entersubtree()	# prepares tw to return childrens
> 						# in the next iteration
> 		else
> 			# here a file-content-based merge algorithm could start
> 			builder.add( (BASE, Stage1), (OURS, Stage2), (THEIRS, Stage3) )

These remarks are all correct.

Note that in the final else clause where the file-content-based
merge starts you might be here because:

 - conflicting mode change (e.g. both added the same file content,
   but OURS made it 644 and THEIRS made it 755)

 - add/add conflict (both added different content)

 - remove/modify conflict (e.g. OURS modified, THEIRS deleted)
   note this might be a rename (due to the remove).

 - type change (OURS changed content of existing symlink, THEIRS
   changed symlink to real file)

 - modify/modify conflict (both changed content)

C Git triggers a file content merge on add/add and modify/modify.
It also tries to fix a remove/modify to be a rename/modify so it
can fall into the modify/modify case.

> 	if (hasConflict)
> 		return false

A major bug in the Merger API is that we don't expose the conflicting
file paths to the caller.

Gerrit Code Review really wants these so I can give a list of the
conflicting files to the end-user when I send out a notification
letting them know their code can't be submitted.

Other applications, e.g. EGit, might want these so they can display
a merge resolution UI with the conflicts and allow the user to
provide resolution data to us.


Back to the top