[
Date Prev][
Date Next][
Thread Prev][
Thread Next][
Date Index][
Thread Index]
[
List Home]
Re: [egit-dev] How a jgit merge could work - examine StrategySimpleTwoWayInCore
|
Sorry for the lag...
"Halstrick, Christian" <christian.halstrick@xxxxxxx> wrote:
>
> It would be great to get some comments of the jgit-experts (Hi Shawn, Robin)
> whether I got the idea of the merge. Should I continue posting strange pseudo
> code explaining jgit internals or is there no interest?
>
> # this merge will merge two(?) commits.
Yes, two. Though technically the generic Merger API supports N
commits (to do an octopus merge) any sane N-way merge strategy is
going to perform the merge as a successive set of pair-wise merges,
probably by calling this particular Merger and using its result as
input to the next merge in the set. So this class should work with
exactly 2 commits.
> # It will already do all the work
> # unless we have conflicts in the file content - here the content diff/merge
> # is missing.
Correct.
> # E.g. if a file was only touched in one of the commits it will
> # be there in the merge result. Files untouched in both commits are of course
> # also in the merge result
Correct.
> StrategySimpleTwoWayInCore InCoreMerger {
> ...
> Treewalk walk;
> ...
> merge(AnyObjectID[] tips)
> RevObject[] sourceObjects = walk.parseEntry(t) foreach t in tips
> RevCommit[] sourcecommits = walk.parseEntry(o) foreach o in sourceObjects
> RevCommit[] sourceTrees = walk.parseTree(o) foreach o in sourceObjects
> return mergeImpl()
At first glance these loops suck since we can only handle 2 trees
and a common base in this particular implementation. But we want to
leave it generalized for an octopus strategy and this is inherited
down from a common base class.
> mergeImpl()
> # get a special TreeWalk which ensures that can detect file/folder
> # conflicts. E.g. commit A has added file d/alpha and commit B has
> # added folder d/alpha/
> tw = new NameConflictTreeWalk()
Yes. This is an area where JGit is ahead of C Git in terms of
correctness. C Git still struggles with D/F conflicts.
JGit takes a performance hit here by using this class over TreeWalk,
but its a fairly small one and that penalty should really only
trigger when there is some ambiguity in the stream of names.
Most projects don't have D/F ambiguity when stored in Git, which
is why its take so many years for the C Git folks to identify D/F
problems and try to solve them.
> tw.add(mergeBase(0,1), sourceTrees)
> hasConflict = false
> builder = cache.builder()
> foreach t in tw
> if t.OURS.mode == t.THEIRS.mode && t.equal(OURS, THEIRS)
> # ours & theirs are the same -> choose one
> builder.add(OURS, Stage0)
> continue
> elseif t.BASE.mode == t.OURS.mode && t.equal(BASE, OURS)
> # ours was not changed -> take theirs
> builder.add(THEIRS, Stage0)
> elseif t.BASE.mode == t.THEIRS.mode && t.equal(BASE, THEIRS)
> # theirs was not changed -> take ours
> builder.add(OURS, Stage0)
> elseif t.isSubtree()
> # at least for one of the trees we are processing a
> # folder. If for any other tree we are processing a
> # file then this is a file/directory conflict
> if nontree(BASE)
> hasConflict = true
> builder.add(BASE, Stage1)
> if nontree(OURS)
> hasConflict = true
> builder.add(OURS, Stage2)
> if nontree(THEIRS)
> hasConflict = true
> builder.add(THEIRS, Stage3)
> tw.entersubtree() # prepares tw to return childrens
> # in the next iteration
> else
> # here a file-content-based merge algorithm could start
> builder.add( (BASE, Stage1), (OURS, Stage2), (THEIRS, Stage3) )
These remarks are all correct.
Note that in the final else clause where the file-content-based
merge starts you might be here because:
- conflicting mode change (e.g. both added the same file content,
but OURS made it 644 and THEIRS made it 755)
- add/add conflict (both added different content)
- remove/modify conflict (e.g. OURS modified, THEIRS deleted)
note this might be a rename (due to the remove).
- type change (OURS changed content of existing symlink, THEIRS
changed symlink to real file)
- modify/modify conflict (both changed content)
C Git triggers a file content merge on add/add and modify/modify.
It also tries to fix a remove/modify to be a rename/modify so it
can fall into the modify/modify case.
> if (hasConflict)
> return false
A major bug in the Merger API is that we don't expose the conflicting
file paths to the caller.
Gerrit Code Review really wants these so I can give a list of the
conflicting files to the end-user when I send out a notification
letting them know their code can't be submitted.
Other applications, e.g. EGit, might want these so they can display
a merge resolution UI with the conflicts and allow the user to
provide resolution data to us.
--
Shawn.