Merging repositories with JGit + altering commit messages [message #1828570]
Fri, 12 June 2020 14:56
Eclipse User

Hello all,
Our team wants to merge several repositories (each with 500-5000 commits) into one. These repos are hosted in a private GitBucket instance. GitBucket has an issue tracking system similar to GitHub's, and almost all of our commit messages contain one or more issue references.
We would like to migrate all previous issues from the old repos to the new (merged) one. This of course means that the issue numbers will change, and therefore all commit messages must be updated as well.
I was wondering whether it is possible to write an application using JGit that combines the repositories and builds up a map of old and new issue numbers.
For example, consider the following repos:
Repo 1:               Repo 2:
#5 (HEAD)             #5 (HEAD)
 |                     |
Merge                 Merge
 |  \                  |  \
#3   #4               #4   #3
 |    |                |    |
#3    |                |   #3
 |   /                 |   /
#2                    #2
 |                     |
#1                    #1
The desired result would be:
Merged Repo:
Merge (HEAD)
 |  \
 |   ----------------
 |                   \
#5                    #10
 |                     |
Merge                 Merge
 |  \                  |  \
#3   #4               #9   #8
 |    |                |    |
#3    |                |   #8
 |   /                 |   /
#2                    #7
 |                     |
#1                    #6
Plus the issue number mapping for the old Repo 2:
1 -> 6
2 -> 7
3 -> 8
4 -> 9
5 -> 10
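A minimal sketch of how such a mapping could be computed, under a hypothetical assumption: Repo 1's issues keep their numbers, and Repo 2's issues are re-created in order right after Repo 1's highest issue number, so every old Repo 2 number is shifted by a constant offset. The function name `build_issue_mapping` is illustrative only.

```python
def build_issue_mapping(repo1_issues, repo2_issues):
    """Map each old Repo 2 issue number to its number in the merged repo.

    Hypothetical assumption: Repo 1's issues keep their numbers and Repo 2's
    issues are re-created in order right after Repo 1's highest number, so a
    constant offset applies (this only holds if the numbering has no gaps).
    """
    offset = max(repo1_issues, default=0)
    return {old: old + offset for old in repo2_issues}


print(build_issue_mapping(range(1, 6), range(1, 6)))
# {1: 6, 2: 7, 3: 8, 4: 9, 5: 10}
```

If issues were deleted in Repo 2 and the migration numbers them sequentially, a constant offset would no longer be correct and the mapping would have to be read from the migration tool instead.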
So the main concerns are to preserve all the history (including branch splits and merges) and the connected issues.
Is this possible using the JGit API, either high or low-level? The points I'm not sure about:
- Is it possible to walk through the commits of a repo and apply them (like a cherry-pick) to a different repo, altering the commit message along the way?
- Is it possible to preserve branch splits and merges? Obviously replaying all the merges would not be possible, because that would mean resolving every conflict ever encountered once more, I guess?
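On the second point, a history rewrite does not have to replay merges. Each commit can be recreated with its original tree (the snapshot already contains the result of any conflict resolution) and with its parents replaced by their already-rewritten counterparts, processing parents before children; this is how git filter-branch behaves when only a message filter is used. Below is a minimal in-memory model of that idea; `Commit`, `rewrite_history` and the toy history are hypothetical and not JGit or git API. In JGit it would roughly correspond to creating each new commit with a `CommitBuilder` (same tree id, remapped parent ids, new message) and writing it with an `ObjectInserter`.

```python
import hashlib
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a git commit object (not a JGit type)."""
    tree: str        # id of the snapshot; reused unchanged when rewriting
    parents: tuple   # parent commit ids; two or more for a merge commit
    message: str

    def commit_id(self) -> str:
        # Content-addressed like git: a new message or new parent ids give
        # a new id, which is why every descendant has to be rewritten too.
        data = "\n".join([self.tree, ",".join(self.parents), self.message])
        return hashlib.sha1(data.encode()).hexdigest()


def rewrite_history(commits, rewrite_message):
    """Recreate every commit with a new message, keeping the graph shape.

    commits maps old id -> Commit. Returns (old id -> new id, new id -> Commit).
    """
    new_ids, new_commits = {}, {}
    graph = {cid: c.parents for cid, c in commits.items()}
    # static_order() yields parents before children, so each parent's new id
    # is already known when its child is rewritten.
    for old_id in TopologicalSorter(graph).static_order():
        old = commits[old_id]
        new = Commit(
            tree=old.tree,  # same snapshot: merge results are reused, not re-resolved
            parents=tuple(new_ids[p] for p in old.parents),
            message=rewrite_message(old.message),
        )
        new_ids[old_id] = new.commit_id()
        new_commits[new.commit_id()] = new
    return new_ids, new_commits


# Toy version of "Repo 2" from the diagram; the tree ids are made up.
history = {}


def add(tree, parents, message):
    c = Commit(tree, tuple(parents), message)
    history[c.commit_id()] = c
    return c.commit_id()


c1 = add("t1", [], "#1")
c2 = add("t2", [c1], "#2")
c4 = add("t4", [c2], "#4")
c3a = add("t3a", [c2], "#3")
c3b = add("t3b", [c3a], "#3")
merge = add("t-merge", [c4, c3b], "Merge")
c5 = add("t5", [merge], "#5")

# Each toy message holds at most one reference, so a plain lookup suffices.
renumber = {"#1": "#6", "#2": "#7", "#3": "#8", "#4": "#9", "#5": "#10"}
new_ids, new_commits = rewrite_history(history, lambda m: renumber.get(m, m))

new_merge = new_commits[new_ids[merge]]
print(new_merge.parents == (new_ids[c4], new_ids[c3b]))  # True
print(new_commits[new_ids[c5]].message)                   # #10
```

The rewritten merge still has two parents and the same tree, so no conflict is ever resolved again; only the commit ids change.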
Thanks for any pointers in advance.

Re: Merging repositories with JGit + altering commit messages [message #1828665 is a reply to message #1828570]
Tue, 16 June 2020 08:54
Eclipse User

In order to do something like that, you first need to transfer all commits from both repositories into one repository.
You can clone one of them and then fetch the second one into the same clone, mapping the branches of the
second repository to a different set of branch names in the clone to avoid clashes between the branch names of the two repositories.
E.g. both repositories probably have a master branch, so you need to name them differently in the combined clone, since a single branch can
only refer to a single commit.
Alternatively, you can init an empty repository and then fetch the commits from both repositories into it:
mkdir mergedrepo
cd mergedrepo
git init
git fetch https://repo1 refs/heads/*:refs/heads/repo1/*
git fetch https://repo1 refs/tags/*:refs/tags/repo1/*
# If there are more custom refs, e.g. pull request refs, fetch them similarly.
git fetch https://repo2 refs/heads/*:refs/heads/repo2/*
git fetch https://repo2 refs/tags/*:refs/tags/repo2/*
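To illustrate why these refspecs avoid name clashes: the `*` in the source pattern matches the rest of the remote ref name (including any further `/`), and that same text is substituted for the `*` in the destination. The following is a simplified Python model of that substitution; `apply_refspec` is a hypothetical helper, and real refspecs support more than this (e.g. a leading `+` to allow non-fast-forward updates, which is simply ignored here).

```python
def apply_refspec(refspec, ref):
    """Return the local ref a fetched `ref` is stored under, or None.

    Simplified model of a fetch refspec "src:dst" with one '*' on each side;
    a leading '+' (force update) is accepted and ignored here.
    """
    src, dst = refspec.lstrip("+").split(":", 1)
    prefix, suffix = src.split("*", 1)
    if (len(ref) < len(prefix) + len(suffix)
            or not ref.startswith(prefix) or not ref.endswith(suffix)):
        return None  # this refspec does not select `ref`
    matched = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace("*", matched, 1)


# The same remote branch name ends up under a different local name per repo.
for repo in ("repo1", "repo2"):
    print(apply_refspec(f"refs/heads/*:refs/heads/{repo}/*", "refs/heads/master"))
# refs/heads/repo1/master
# refs/heads/repo2/master
```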
This will yield two disconnected commit graphs in this repository "mergedrepo".
Then you can merge the two disconnected commit graphs, e.g.
git checkout repo1/master
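git merge --allow-unrelated-histories repo2/master
# (since Git 2.9 this flag is required because the two graphs share no common ancestor)
# resolve conflicts, etc.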
For rewriting commit messages to change issue links while preserving the topology of the commit graph,
I think your best bet is git-filter-repo [1], which is also recommended by the documentation of the old git filter-branch [2].
It is implemented in Python 3 and has many advantages over the old filter-branch command.
It is also designed to be extended, so you can write your own tool that uses filter-repo as a library.
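As a sketch of the message rewriting itself: git-filter-repo's `--message-callback` option takes the body of a Python function that receives the commit message as `bytes` in a variable named `message` and returns the new message. The function below is written as a standalone, testable version of such a body; the `#N` reference pattern and `ISSUE_MAP` (copied from the Repo 2 mapping in the question) are assumptions. Since filter-repo rewrites every commit it processes, one option would be to run it on a fresh clone of Repo 2 before fetching that into the combined repository, so that Repo 1's references stay untouched.

```python
import re

# Issue number mapping for the old Repo 2, taken from the question.
ISSUE_MAP = {1: 6, 2: 7, 3: 8, 4: 9, 5: 10}

# Assumption: issue references appear as "#<number>" in commit messages.
ISSUE_REF = re.compile(rb"#(\d+)")


def rewrite_message(message: bytes) -> bytes:
    """Replace every known "#N" reference in a single pass.

    One re.sub with a callback means a replaced "#6" is never looked up
    again, so mappings like 1 -> 6 cannot cascade into each other.
    Unknown numbers are left unchanged.
    """
    def replace(match):
        new = ISSUE_MAP.get(int(match.group(1)))
        return b"#%d" % new if new is not None else match.group(0)

    return ISSUE_REF.sub(replace, message)


print(rewrite_message(b"Fix #3, see also #5 and #42"))
# b'Fix #8, see also #10 and #42'
```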
Implementing something like that using JGit is surely possible, but likely more work, since there are probably a couple
of features missing that you would need to implement on your own for such surgery.
[1] https://github.com/newren/git-filter-repo/
[2] https://git-scm.com/docs/git-filter-branch