Automate your commit messages

Clever scripting to the rescue!

Aye, aye, cap'n!

Git has a lot of built-in functionality that you might not use all the time, but which is oh-so-handy when you need it. One such feature is hooks. These can be used for things like formatting your code before committing or running tests before pushing to the remote. Today, though, let's look at how we can use them to automatically fill in part of your commit message.

How did this come up?

The team I'm on at work uses a system for commit messages and branch names where they should both start with the issue id of whatever task we're working on. Using GitLab, they end up looking something like #2 Adds sorting functionality to UI (we can argue about commit message tenses later) and #42/new-sorting-algorithm, respectively.

If you're like me, your mind should be poking you right about now, saying that that's a lot of duplication; surely we don't need all of that? Well, you could argue that with feature branches scoped to issues and no fast-forward merges, the branch name should be enough. But the branch name doesn't really show up in condensed commit logs, so it's nice to have the issue number on each commit for when you need it.

But not to worry; we can make the computer fill it in for us!

Picking the right hook

Git has quite a few hooks available; if you check the .git/hooks directory of a Git repository, you should find a bunch of files named <hook name>.sample. Most of the hook names are fairly descriptive, but you can always look them up if you want to know more.
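If you'd rather see just the names than scan the file list, a small loop can strip the .sample suffix. This is a convenience sketch: list_sample_hooks is a made-up helper name, and it defaults to .git/hooks, so it assumes you run it from the repository root.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the hooks that ship as samples.
# Defaults to .git/hooks, so run it from the top of the repository.
list_sample_hooks() {
  hooks_dir=${1:-.git/hooks}
  for sample in "$hooks_dir"/*.sample; do
    # If nothing matches, the glob stays literal; skip it
    [ -e "$sample" ] || continue
    basename "$sample" .sample
  done
}
```

In a freshly initialized repository, prepare-commit-msg should be among the names it prints.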

In our case we're interested in hooking into the system right before the commit message buffer appears on our screen, so we're going to go with the prepare-commit-msg hook.

Auto-filling the message

Now, my team has a template we use for commits, which starts with #[task number]. Using this knowledge, we can do some fancy shell magic and replace it with the task from the branch name.
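For context, git can pre-fill the editor from a file through the commit.template setting. The sketch below is an assumption about how such a setup might look, not the team's actual template; setup_commit_template and the template location are hypothetical. One thing to keep in mind: with git's default comment character, a line starting with # is treated as a comment and stripped from the final message, which is where changing core.commentChar (covered further down) comes in.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line commit template and point the
# current repository at it. $1 is where to store the template file.
setup_commit_template() {
  template=$1
  # The placeholder line that the prepare-commit-msg hook replaces
  printf '%s\n' '#[task number]' > "$template"
  # --local writes to the .git/config of the repository you are in
  git config --local commit.template "$template"
}

# Usage, from inside a repository:
# setup_commit_template "$HOME/.gitmessage"
```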

The whole script looks like this:

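    #!/bin/sh
    # $1 is the path to the commit message file (see below)

    # The part of the branch name before the first "/", e.g. "#42"
    TASK_NO=$(git rev-parse --abbrev-ref HEAD | cut -d '/' -f 1)

    # BSD sed (macOS): -i takes a backup suffix as a separate argument; '' means no backup
    sed -i '' -e "s/#\[task number\]/$TASK_NO/" "$1"

    # GNU sed (Linux)
    # sed -i -e "s/#\[task number\]/$TASK_NO/" "$1"

Depending on whether you use BSD sed (macOS) or GNU sed (Linux), the sed command behaves slightly differently: the two versions differ in how you make them write files in place. BSD sed expects a backup suffix after -i (an empty '' for no backup), while GNU sed takes none. For GNU sed, use the commented-out line instead of the first one.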

The git rev-parse --abbrev-ref HEAD command gives you the name of the current branch, which we then split at the first / character using cut, keeping only the part before it.
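To see what that cut step does with the branch names from earlier, here it is pulled out into a small function. task_from_branch is a hypothetical name used only for illustration; the hook itself gets the branch name from git rev-parse.

```shell
#!/bin/sh
# Hypothetical helper: keep everything before the first "/" in a branch name.
task_from_branch() {
  printf '%s\n' "$1" | cut -d '/' -f 1
}

task_from_branch '#42/new-sorting-algorithm'   # prints #42
task_from_branch 'master'                      # no "/", so prints master unchanged
```

That second case is exactly the master behaviour described under Caveats.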

The next (and last) step is a simple sed replacement, replacing the #[task number] string with the branch's task number.
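You can try the replacement without touching any file by piping a sample message through the same sed expression; without -i, the BSD/GNU difference doesn't matter. fill_task_placeholder is a hypothetical helper and the message text is made up. Because the task number ends up on the replacement side of the expression, characters sed treats specially there, such as & or \, would need escaping if your branch names can contain them.

```shell
#!/bin/sh
# Hypothetical helper: replace the template placeholder on stdin with $1.
fill_task_placeholder() {
  sed -e "s/#\[task number\]/$1/"
}

printf '%s\n' '#[task number] Adds sorting functionality to UI' |
  fill_task_placeholder '#42'
# prints: #42 Adds sorting functionality to UI
```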

The $1 variable at the very end of the last line is the name of the commit message file and is something the script receives automatically. We use this to tell sed what file to replace the text in.
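For git to actually run the script, it has to be saved as .git/hooks/prepare-commit-msg (without the .sample suffix) and be executable; git skips hooks that aren't. A minimal sketch of that step, where install_hook and the script path are hypothetical, and which assumes the default hooks directory (a core.hooksPath setting would point git elsewhere):

```shell
#!/bin/sh
# Hypothetical helper: install a prepare-commit-msg script into a repository.
# $1: the script file, $2: the repository root (defaults to the current directory)
install_hook() {
  hook="${2:-.}/.git/hooks/prepare-commit-msg"
  cp "$1" "$hook" && chmod +x "$hook"
}

# Usage, from the repository root:
# install_hook ~/scripts/prepare-commit-msg
```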

And for the curious, these are all the variables you get access to in the prepare-commit-msg hook (taken from the prepare-commit-msg.sample file):

    COMMIT_MSG_FILE=$1
    COMMIT_SOURCE=$2
    SHA1=$3
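COMMIT_SOURCE is the interesting one. Per the githooks documentation it is empty for a plain git commit, or one of message, template, merge, squash or commit (the last for -c, -C and --amend, in which case SHA1 holds the commit's hash). Here is a sketch of a guard, assuming you only want the hook to touch freshly started commits; should_fill_task is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical guard: decide from COMMIT_SOURCE ($2) whether to fill in the task.
should_fill_task() {
  case "$1" in
    merge|squash|commit) return 1 ;;  # merges, squashes, -c/-C/--amend: leave as is
    *) return 0 ;;                     # empty, "message" or "template"
  esac
}

# Near the top of the hook:
# should_fill_task "$2" || exit 0
```

Since sed only acts when the placeholder is present, this mostly makes the intent explicit rather than changing behaviour.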

Caveats

There is a pretty significant exception to when this script does what you want it to: when you're still on the master branch, where commit messages will start with the text master. I considered adding a special case for this, but then realized that it's actually quite helpful in that it helps me remember when I should branch off, so I decided to keep it for now.
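If you'd rather not have master (or any other branch without a slash) end up in the message, one way to write that special case is to leave the placeholder alone when the branch name contains no /. task_or_nothing is a hypothetical helper; it uses shell parameter expansion instead of cut so it can tell the two cases apart.

```shell
#!/bin/sh
# Hypothetical helper: print the task prefix, or nothing if the branch has no "/".
task_or_nothing() {
  case "$1" in
    */*) printf '%s\n' "${1%%/*}" ;;  # strip everything from the first "/"
    *) ;;                             # e.g. "master": no task id
  esac
}

# In the hook, before running sed:
# TASK_NO=$(task_or_nothing "$(git rev-parse --abbrev-ref HEAD)")
# [ -n "$TASK_NO" ] || exit 0
```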

Another drawback is that since the hook has already changed the message, git (presumably) no longer sees it as an untouched template, so it assumes you put an actual message in and completes the commit even if you quit without editing. At least that's what I found when using Vim. A simple workaround is deleting everything in the buffer (dae if you use the awesome text-obj-entire plugin) before exiting. Most of the time, though, I use Magit with Emacs, where canceling the commit works as expected.

Alternate approach

While I have opted for modifying the commit message before it's shown to the user, you could potentially also have the machine add the id automatically in the commit-msg hook, which runs after you've written the message but before the commit is created.

You would make the hook check the message for a pattern and prepend the expected issue id if it's not present.

This solution doesn't give you any way to verify whether it's correct or not, though, so you'd probably want some sort of way to bypass it.
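A rough sketch of that idea as a commit-msg hook, which receives the message file as $1 just like prepare-commit-msg does. It assumes ids look like # followed by digits and that you've switched to a comment character other than # (as described further down); otherwise git strips the prefixed first line as a comment. add_task_prefix is a hypothetical name.

```shell
#!/bin/sh
# Sketch of a commit-msg hook: prepend the task id unless the message has one.
# Assumes ids look like "#<digits>" and branches like "#42/some-name".
add_task_prefix() {
  msg_file=$1
  task=$2
  # Leave messages whose first line already starts with an id alone
  if head -n 1 "$msg_file" | grep -q '^#[0-9]'; then
    return 0
  fi
  # Write "<task> <original message>" to a temp file, then replace the original
  { printf '%s ' "$task"; cat "$msg_file"; } > "$msg_file.tmp" &&
    mv "$msg_file.tmp" "$msg_file"
}

# In .git/hooks/commit-msg:
# add_task_prefix "$1" "$(git rev-parse --abbrev-ref HEAD | cut -d '/' -f 1)"
```

As for a bypass: git commit --no-verify skips both the pre-commit and commit-msg hooks.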



It's the little things ...

You prepare to commit your files, having made sure that all the right changes are in, that they belong to one logical change, and that you have the id of the task they belong to. Your editor opens up and you see those familiar commit message instructions, but wait! You suddenly remember that you should start your commit message with the id of the task prefixed with a # character, but that would lead git to think the line is a comment.

How do we solve this?

Easy! We simply tell git not to use the # character for comments, but pick a different character instead.

In short

There are two ways to do it, both producing the same result: you can either run a command from the command line or edit your git config directly. In this case, imagine we want to use a semicolon (;).

Command line:

    git config --local core.commentChar ';'

Directly modifying your git config file:

    [core]
      commentChar = ";"
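To check the effect without making a commit, git stripspace --strip-comments removes comment lines using the configured comment character. In this sketch the setting is passed for a single command with git -c, so your real config is untouched; the message text is made up.

```shell
#!/bin/sh
# Strip comment lines from a sample message, treating ";" as the comment character.
printf '%s\n' '#42 Adds sorting functionality to UI' '; This line is a comment' |
  git -c core.commentChar=';' stripspace --strip-comments
# prints: #42 Adds sorting functionality to UI
```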

Motivation

So why might you want to do this? As mentioned in the introduction, maybe your team has a standard format for commit messages, where each commit should lead with the id of the task that the commit relates to.

We use GitLab at work, where if you put a number after a # in your commit message, it will create a link to the issue with that id, which also adds an entry to the history of said issue. In addition to this, if all commits are prefaced with the task id, it becomes very easy to find out what task a commit relates to when looking at the history.

What might go wrong?

This is such a simple modification that there's not much you need to be aware of, but if you try to use a comment character consisting of more than one character, git (at least before version 2.45, which relaxed this restriction) will fail when you try to run a command:

    error: core.commentChar should only be one character
    fatal: bad config variable 'core.commentchar' in file '.git/config' at line 6

Finishing thoughts

It's a simple thing, but it's one of those things that improves your life just that little bit. You can even combine it with the custom config we mentioned last time to have this apply to all repos in your work directory, new and old.

Now as a final thought, what should you pick as a comment character? It's really up to what you need for your specific use case, but I find that ; is very rarely something that you'll want to start your sentences with, so that's my go-to.



Small secrets, big wins.

Do you track your git config across machines or keep it in sync with your dotfiles? Is it full of settings that you want to share between all your different contexts? Are there certain things that you want to change depending on what the context is? Well, I've got news for you!

tl;dr:

If you want to use an additional config file for all subdirectories below a certain path, use the includeIf functionality that was introduced in git 2.13.0.

    [includeIf "gitdir:<path to top directory to use the config in>/"]
        path = <path to extra config>

That's all you need to get going, but if you want to know more about how this works and some caveats you might want to be aware of, keep reading.

include and includeIf

First off, I'd be remiss not to direct you to the official documentation for this feature, which is part of the git-config documentation. However, there's loads more info there than you might need (and want), so let's extract the important bits.

The include and includeIf sections work the same way, except that includeIf has a condition that must be satisfied. Both let you insert configuration from another file into your main one.

In my case, this meant that I could automatically change my email whenever the git directory was a subdirectory of ~/projects/work, allowing me a simple and efficient way to always use my work email for work, without changing my global git config or branching off from my main dotfiles repo.

Here's the relevant extract from my config:

    [includeIf "gitdir:~/projects/work/"]
        path = ~/projects/work/.gitconfig

I keep a separate config file in the ~/projects/work directory that overrides my email.
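If you'd rather not edit the files by hand, git config --file can write both pieces. A sketch, where setup_work_config, the directory and the email address are placeholders for your own values:

```shell
#!/bin/sh
# Hypothetical helper: set up a directory-scoped config with includeIf.
# $1: main config file, $2: top directory, $3: email to use below it
setup_work_config() {
  main_config=$1
  work_dir=${2%/}   # drop a trailing "/" so exactly one is added below
  work_email=$3

  # The extra config that should only apply inside $work_dir
  git config --file "$work_dir/.gitconfig" user.email "$work_email"

  # includeIf.<condition>.path; the trailing "/" makes the match recursive
  git config --file "$main_config" \
    "includeIf.gitdir:$work_dir/.path" "$work_dir/.gitconfig"
}

# Usage (hypothetical values):
# setup_work_config ~/.gitconfig ~/projects/work me@work.example
```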

This would also be useful if your team has specific rules about whitespace, merge strategies, comment characters, etc. that you want to enforce.

Things that might trip you up

Syntax
Note the trailing slash after the directory path in the includeIf line. It makes the pattern match all subdirectories of the specified directory. As the docs explain:
If the pattern ends with /, ** will be automatically added. For example, the pattern foo/ becomes foo/**. In other words, it matches "foo" and everything inside, recursively.

This also means that without the trailing slash, it'll only match a git directory at exactly that path.

File insertion
Here's another quote from the documentation:
The contents of the included file are *inserted immediately, as if they had been found at the location of the include directive*. If the value of the variable is a relative path, the path is considered to be relative to the configuration file in which the include directive was found.

So you'll probably want to put the includes at the bottom of your files to make sure the included config isn't overridden later on in the source file.
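You can see the ordering rule in action with git config --includes, which follows include directives when reading a single file (it's off by default with --file). The sketch uses a plain include, since includeIf inserts its file the same way; demo_include_order, the file names and the addresses are all made up.

```shell
#!/bin/sh
# Sketch: later settings override earlier ones, including included ones.
demo_include_order() {
  dir=$1
  # The file that gets included sets a work email
  printf '[user]\n\temail = work@example.com\n' > "$dir/extra.gitconfig"

  # Include first, personal email after it
  printf '[include]\n\tpath = extra.gitconfig\n[user]\n\temail = personal@example.com\n' \
    > "$dir/include-first.gitconfig"

  # Personal email first, include at the bottom
  printf '[user]\n\temail = personal@example.com\n[include]\n\tpath = extra.gitconfig\n' \
    > "$dir/include-last.gitconfig"

  for name in include-first include-last; do
    # --includes follows include.path (relative to the including file);
    # --get prints the last value git saw for the key
    printf '%s: %s\n' "$name" \
      "$(git config --file "$dir/$name.gitconfig" --includes --get user.email)"
  done
}

# demo_include_order "$(mktemp -d)"
# prints:
# include-first: personal@example.com
# include-last: work@example.com
```

The second file follows the layout suggested above: with the include at the bottom, the included value wins.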


I hope that cleared some things up for you and that you found it useful; I know I did. If you want more in-depth information, see the documentation.