One more time, my bro Hugo, helped me by translating my post “Commits atomiques - la bonne approche”. Many thanks to him for his translation and review!
One of the most difficult parts of our job is to divide our work into small units. Often overlooked, this challenge happens as much on an architectural level (it’s much easier to build a big-ass monolith than small joint projects) as on the code we ship every day through our commits.
In other words: it is hard to author small commits.
It is hard to author small commits because in a way it is easy not to commit. While this may sound stupid, it is usually easier, at the end of a long coding session, to ship a big huge commit rather than doing a lot of small ones. It’s even more tempting to do so when the benefits of small commits are not necessarily obvious.
It’s a bit like automated tests: we tend to find them useless when writing them. After all, proof-reading the code and going through a couple of reviews should be enough to make sure not to introduce a bug. However, the day there is a live bug because of a small thing that should have been Just Fine™, a proper test coverage would have made sure to catch that nasty problem before it hits the production server.
Small commits are somehow similar: they are a pain in the ass to do, they seem useless, but the one time shit gets real on the live server, we thank ourselves having made the efforts.
You might wonder what is the connection between the average commit size and a bug in production: the harder it is to find the source of bug and the way to patch it, the more expensive it will be to fix (be it in time or money). Therefore reducing the cost of a bug comes down to spotting it early (admittedly, before hitting live is the ideal scenario). This is where tests intervene. But it also comes down to reducing the amount of time needed by an engineer to fix the problem.
Hunting bugs down
Authoring small commits help finding the origin of a bug, the context in which it has been introduced and possibly the reason why it has been introduced. Therefore, it makes life easier for the person in charge of fixing it.
This is getting even faster with Git when performing dichotomous search in a large range of commits with
git bisect. For instance, on a huge project with over 31,000 commits (spread across 6 years), a good ol’
git bisect could make it possible to find the origin of a bug in about 15 operations.
If you’d like to learn more about tracking down bugs with
git bisect, be sure to read my articles about it in French and in English. :)
Other and more obvious benefit of small commits: they make reviewing code much easier. Without even talking about open source software here, performing code reviews before merging into the main repository present number of advantages.
- It reduces the amount of minor mistakes
- It enforces consistency across a code base
- It promotes knowledge sharing across a team
- It sparks discussions pushing the project further
Anyway, that’s awesome.
On the other hand, it is getting tedious to do consistently when code gets long, complex and dispatched.
When building a feature from start to finish —going through all the layers of an MVC application— a lot of files could be impacted: the view (usually split across templates and styles), the controller, the router, the application layer, the DAO… Plus unit tests for all or most of these chunks and possible integration tests… The pull-request for this feature is going to be huge. And that’s without counting for a bit of refactoring here and there, checkstyle, improvement of test coverage, clarification of documentation, and so on.
The risk is to end up with a huge beast that nobody will want to review. Best case scenario, the reviewers will be careful on the first few lines/files then quickly start botching. Worst case scenario, they will give up hoping someone with more time on their hands will tackle it (hint: nobody does).
Pull requests:— I Am Devloper (@iamdevloper) April 10, 2017
- 3 files have changed
> 25 comments in conversation
- 40 files have changed
I write, I write, but we still haven’t really tackled the technical content for this article. But I felt like I had to introduce the matter with a bit of background to explain the rational behind this whole “small commits” approach.
Alright, let’s roll up our sleeves and dig.
So, what’s up?
There is a ton of ways to organise commits with Git. The first one is not even a technical trick: stay focus on the goal and don’t perform any non-critical change anywhere else.
Let’s be real here, this is quite a terrible technique because it goes actively against improving the code. So unless you like covering your desk with post-its like “remove unnecessary white space line 17 of file Toto.class.php”, I suggest we find something else.
A good method should let the developer work the way they like only to finally reorganise their changes when committing.
A swarm of commits
The best piece of advice I can give you is to commit very often.
Reindenting some broken tabs? Commit it on its own. New tests for a function? Commit them on their own. Some necessary refactoring for your upcoming feature? Commit that on its own.
The thing that matters is that the critical commits for a feature, the ones which will need to be actively reviewed, are free from all this visual distraction.
The tricky part with this approach is to resist the temptation of doing
git add -A to ship all modified files in the commit. This is where knowing how to use
git add -p (or
--patch) helps, so we can divide multiple updates within a single file to commit them separately.
Let’s say we have the following PHP file:
We’re just about to add some code to this file when we realise that the
$listId parameter is not typed in the PHPDoc from
retrieveList(). So we add the type. It’s easy, it doesn’t cost much and it will help using this method in the IDE. Cool.
As thorough Atomic Commiters (here is your new Twitter bio, you’re welcome), we’ll want to commit this small addition on its own, without including the new (and possibly non-finished) code. When using
git add -p path/to/MyFile.php, Git will divide changes into blocks, making it possible for us to choose for each block whether we want to add it to the commit.
? displays all possible options:
Most of the time, we press
y to add the block or
n not to. Sometimes
s to ask git to be even more granular (until it can’t do anymore) and
e to manually choose which lines should be added.
In this scenario, we press
y to add the PHPDoc comment to the commit. Then git will present the next modification in the file:
Now we say
n since we only want to commit the documentation fix, not the rest. Once we’ve been going through all the modifications (probably setting
n for all the others), it’s only a matter of finalising the commit and pushing to the remote repository.
The benefit of this approach is that these little changes will likely be quickly reviewed and merged while working on the core feature.
Bundling the fancy edits
A variation of the aforementioned technique is to bundle these small changes in relatively concise commits rather than having them in tons of teeny tiny ones. Performing atomic commits does not mean each PHPDoc change has to live in its own commit; they can be bundled together as long as they are connected by a unique theme.
The possible issue with this comes up in code review. Even with 10 properly cut commits, the diff is likely to be quite important. This is where it might be helpful to tell reviewers to check commits individually rather than as a whole.
Other potential downside with committing everything at the end of a long working session: it can be time consuming. We might start a commit dedicated to a small thing (such as moving a repeated string to a constant), making us browse through the entire diff to make sure all changes related to this are added to the commit. Once done, we’ll walk through the whole diff to gather all related changes into another commit. And so on, and so on, making us go through the diff quite a few times.
Rockin’ the interactive rebase
Another approach that has proven to be successful is abusing interactive rebase. This git feature makes it possible to reorder commits, edit them and even merge them.
It means we can perform a lot of small commits that are not directly related to the feature we are currently working on. Once ready to submit the final pull-request for review, we can reorder and squash small commits together to reorganise the git history. For instance, we could create a first pull-request containing all little edits, and then another for the actual feature.
Let’s illustrate with an example.
We are currently working on story GRE-1234, whose goal is to allow users to edit their messages in a forum thread. While working on it, we fix a lot of small things here and there, and commit them on the fly. Once done with the actual feature, we end up with these 15 commits:
Some of these commits are very connected and could be grouped. Our feature is really just done through 2 commits, prefixed with the story number (
[GRE-1234]). And unless we have a good reason to split this in half, they probably should be merged into a single commit.
Alright, let’s try to tidy this mess.
git rebase -i HEAD~15 allows us to update the history of these last 15 commits. It will open a text editor displaying the list of commits, each of them prefixed with
pick. Beware! Unlike
git log, commits are actually displayed in chronological order: oldest first, newest last.
We usually stick to options
squash, as well as the reordering feature. Let’s start with that:
We grouped together similar commits and moved our feature to the very end. Now we can decide to merge some commits together. To do so, we replace the
pick keyword from the relevant lines with
For instance, the first 4 commits are somehow similar: they add missing descriptions to behat scenarios. We can then keep only the first one and merge the 3 next into the first:
Let’s repeat this process for all the other “groups” of commits, until we get something like this:
If we save and close our editor, it will reopen asking for the new commit message for the first pack of 4 commits to be merged:
We can empty the whole thing and replace it with a single message “Missing description in Behat scenario”. When closing the editor, it will open again to ask for the message for the next commit group, and so on until the last group is done after which we’re back on the terminal.
If we wanted to skip this process and keep the message of the first commit of each squashed group, we could have used
f) instead of
squash during the interactive rebase. This is the only difference between them two:
squash merges commits while giving the ability to edit comments while
fixup uses the message of the receiving commit.
That’s it! If we have a look at our commit history, we now have 7 commits in place of 15:
At this point, we can safely create a pull-request and ask our co-workers to only actively review the commit containing the actual feature without worrying too much about previous unrelated changes.
We could also create a branch on the second to last commit (
git checkout -b small-fixes HEAD~1) to send all these rather boring edits in a pull-request to free the feature pull-request from visual noise.
Note that interactive rebase works well when commits are not particularely connected to each others. Always be careful with the history order. For instance, the second commit from our feature (
7a9d433 Don't allow answer edition on closed thread) is likely to be highly dependant on the first as it fixes an edge case omitted from the first one. Swapping the order in the rebase might be quite a bad idea.
As a safety measure, feel free to create a temporary branch (
git branch tmp-save for instance) before jumping into a rebase, so you can abort and come back to a sane spot in case it turns badly.
Another nice thing with interactive rebase is the autosquash feature. Rather than reorganising the code and merging commits at the end, we can do it on the fly during development. When creating a commit, we can tell in which commit it will eventually have to be merged.
Let’s go back to our 15 commits from earlier:
We start with a first commit (
ab00199 Missing description in Behat scenario), then a second (
7bc77ef Dead code). When authoring the third commit (
8593186 Missing description in Behat scenario), we could have thought “hey, it’s the same as the first commit, I should merge this change with it”.
To do so, we only have to use the
--fixup parameter from the
git commit command, precising in which commit we want to merge these changes. We can do that by specifying the commit SHA1:
git commit --fixup ab00199. Or in a relative way:
git commit --fixup HEAD~1 (second to last commit). Or even with a search for the most recent commit matching a keyword:
git commit --fixup :/Behat. How cool is that?
Once done, the 3 last commits should look like this:
Now let’s say we did so on our 15 commits, we should have:
We only have to perform an interactive rebase with the
--autosquash option so all commits tagged
fixup! gets merged in their relevant commit. git will open the editor with the following content:
We can save and close the editor so the rebase gets completed.
Note how the order is different from the one we had when reorganising the history without the
--autosquash option. If it causes any issue, it’s still possible to perform another interactive rebase to reorder commits (
git rebase -i HEAD~7).
Rather than commiting with
git commit --fixup [SHA1], we can also use
git commit --squash [SHA1] which will work the same way. The only different is the ability to edit messages at the end for commits created with
Wrapping things up
At the beginning of this huge wall of text, I told you about
git bisect to perform dichotomic search. Let’s remember that for this command to work properly, it needs a rather clean git history. In theory, you should be able to roll back to any commit and be able to use your application (including running tests).
Therefore, there should be no “Fixing broken test”, “Fixing error seen in CI” or “Fixing layout” commits: if a commit breaks something (be it a test or a part of the app), the fix for this problem should be merged in the same commit, and not be a commit on its own. Otherwise it will be possible to end up in a broken state while performing a
git bisect. Not cool.
Alright, I hope all this will make you reconsider next time you want to use
git add -A. Remember to do small commits to ease readability, code review and debugging if there is ever the need!