Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [rdf4j-dev] Squashing before merging

We've been using Github squash and merge for a little while now and I must admit I still don't like the downsides described below. I am also conscious of the fact that in keeping our develop and master branch in sync, it is all too easy to make a mistake and use squash and merge. More generally: cleaning up a branch's commit history should be the responsibility of the branch author, not of whoever happens to do the merge.

I propose that we stop using squash and merge (I'll physically disable it), and instead each PR author makes sure to squash their feature branch locally where appropriate before merging the PR, through either a fast-forward (using github rebase) or (if that is not possible) a normal merge commit. It's a little more work, but it's a decent compromise between keeping the history clean/readable and still having a good workflow as well.

If we adopt this, I'll write a little blurb in the contributor guide about how to do squashing so we can point contributors to this.

A related request: before merging a PR, please make sure that both the PR title and each commit message start with the issue number: "GH-1234 my feature". The nice thing about this is that when browsing the code in Github, the issue number in the commit message clickable, which is incredibly useful when digging through when/why a particular change was made.

Squashing gives you the option not only to create fewer commits on your branch, but also to rewrite your commit messages - so I'm perfectly fine with "wip" as a commit message while you're working on something, as long as you clean it up when it's ready to be merged.

Let me know if you have any questions or concerns. If I don't hear anything, I'll assume everyone is ok with this and will disable squash and merge on the repo.

Cheers,

Jeen


On Wed, Oct 16, 2019 at 8:54 AM Jeen Broekstra <jeen.broekstra@xxxxxxxxx> wrote:
I've just discovered a bit of a downside to using Github's 'squash and merge': it loses the connection between your local feature branch and the upstream commit history. For example, I just merged https://github.com/eclipse/rdf4j/pull/1612 using 'squash and merge'. On Github itself this is nicely handled: the branch commits are squashed into a single commit, and that commit is then on the master branch as a single, sequential commit. Upstream feature branch is closed, all done.

However, my local downstream repo is now confused. Locally, I of course still have the feature branch. I normally clean these up when I'm done, using something like git branch -d <branch> . This time, however, that gave me a warning that the branch contained unmerged changes. I looked, and my local master branch is up to date, with the new commit that is the result of the PR merge. But because this is not a "proper" merge commit, my downstream repo doesn't recognize it as being the result of merging my feature branch, so it reports it as unmerged.

I can of course just ignore this, and use git branch -D <branch> to force-remove the local feature branch. And if I want to make sure I haven't missed something before doing that, that's easy enough: run a git merge <branch> --no-commit on master and if the diff is empty, you know you haven't missed something. But I thought it was worth pointing this out. Any thoughts on how big of a blocker this is in adopting 'squash and merge' ?

Just to be clear btw: this is specifically about using Github's "squash and merge" option as our default PR merge strategy, it's not about the general notion of squashing commits on a feature branch before merging a PR, which I think is a thing we should definitely do as long as we keep using 'normal' PR merge commits.

Jeen

On Tue, Oct 15, 2019 at 10:47 AM Jeen Broekstra <jeen.broekstra@xxxxxxxxx> wrote:

Fast-forwarding is almost never an option for us as we so often have multiple parallel feature branches on the go, each of which may get merged into master at any point. And I don't want to advocate rebasing feature branches, because it makes too many changes to the commit history that I'm uncomfortable with, and, if used inexpertly, can mess with proper commit authorship and attribution (not to mention cause major conflicts and even loss of work when branches are shared between authors). 
 
That being said, I'm giving some serious thought to looking at Github's "Squash and merge" technique more closely, since it gives us most of the benefits (clean, linear history) without the drawbacks of having to rebase (other than the squashing itself of course, which is a form of rebasing).

With this technique, when a PR gets merged, the commit history just shows a single commit on the target branch, that is a squashed version of the entire PR. This commit is not a real merge commit, so from the point of view of the git history it looks like a single, sequential commit on the target branch (see for example https://github.com/eclipse/rdf4j/commit/b8797d5e14af1d6a41063716f9f7875f3a308622 which is a PR by Havard that had two commits originally, which I merged to master using squash and merge). I was worried about authorship and signing, but it looks as if Github is smart enough to keep the original author on the commit (I'm the committer, but Havard is still the author). It also automatically adds the original commit messages (including sign-offs) on the squashed commit message. 

Jeen

On Mon, Oct 14, 2019 at 6:12 PM Andreas Schwarte <aschwarte10@xxxxxxxxx> wrote:
Hi,

I very much like this proposal and would definitely vote for applying PR cleanup as much as possible.


Internally in our company we also do this (potentially as a last step before a PR gets merged) to have a clean history. We mostly even try to go one step further and try to avoid merge commits entirely (i.e. doing fast forward merges wherever possible). In an ideal world this leads to an easy to follow linear history of changes. However, this might go to far for the RDF4J project.

Best,
 Andreas

Am So., 13. Okt. 2019 um 02:20 Uhr schrieb Jeen Broekstra <jeen.broekstra@xxxxxxxxx>:
How do people feel about making squashing before merging a PR a routine part of our dev workflow?

Reason this came up is that I am trawling through the git history to see if I can cherry-pick things for the 2.5.5 backport, and I have noticed that our history log (even with the help of a graphical UI like Gitkraken) is rather hard to read: we have numerous feature branches that contain dozens of tiny commits, often small things like editorial fixes, typos, etc. I'm very guilty of this myself by the way.

If we could manually squash our PRs into just one or a few meaningful commits before merging, it would make our history a lot easier to read.

As an aside: Github also has a "Squash and merge" option as the merge strategy choice for a PR. While this would be super-convenient and take a lot of the hassle out of squashing, I'm a bit wary of it because of various reports that it doesn't always properly preserve authorship and sign-offs (both of which are crucial for us to audit and validate IP).

So I suggest that for now, we stick to doing this the manual way, perhaps helping casual contributors by doing the squash in their place before hitting merge on a PR.

Cheers,

Jeen
_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/rdf4j-dev
_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/rdf4j-dev

Back to the top