Hakyll logo + GitLab logo with Nix Snowflake in lower right corner

Let's make these tools play nice together!

As I've mentioned before, one of the big pain points relating to this blogging business is deployment time: Haskell is slow to compile and Hakyll has multiple large dependencies, so the builds would initially take up to an hour. Yeah, you read that right. 60 minutes 😱. Something goes wrong towards the end of the build? Sucks to be you.

Thanks to Saksham Sharma and their post on speeding up Haskell CI builds, however, I have been able to bring it down to 7-8 minutes in GitLab's CI/CD systems (excluding time spent waiting for runners to spin up etc.). That said, it wasn't quite as easy as I'd hoped it would be (when is it ever?): Due to how Stack and Nix interact, building of the site would crash when it ran into UTF-8-encoded characters. Not cool.

Let's fix it.

Step 1: using an image with Hakyll pre-built

In Sharma's post, they mention that they've created an image that you can use for your build systems. The simplest version would look a little something like this (freely updated from their minimal configuration example):

  image: sakshamsharma/docker-hakyll:v3

  pages:
    script:
      - stack build
      - stack exec site build
    # ... rest of stage omitted

An important thing to note is that your stack config's resolver must match the one used in the Docker image, otherwise the build system would have to recompile Hakyll and its dependencies for your version, taking us back to the hour-long builds.

For v3, the resolver is lts-12.21, so make sure your project's stack.yaml contains the following line:

  resolver: lts-12.21
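
Since a mismatched resolver quietly brings back the long builds, it can be worth checking for it before the build starts. Here's a minimal sketch, assuming stack.yaml sits in the current directory and declares the resolver on a line of its own with no trailing comment. The helper name resolver_of is made up for illustration:

```shell
# Resolver used by sakshamsharma/docker-hakyll:v3, as stated above.
expected="lts-12.21"

# Hypothetical helper: print the value of the first top-level
# "resolver:" line in the given stack.yaml.
resolver_of() {
  sed -n 's/^resolver:[[:space:]]*//p' "$1" | head -n 1
}

# Only check when there is a stack.yaml to look at.
if [ -f stack.yaml ]; then
  actual=$(resolver_of stack.yaml)
  if [ "$actual" = "$expected" ]; then
    echo "resolver matches the image: $actual"
  else
    echo "resolver is '$actual', but the image expects '$expected'" >&2
  fi
fi
```

In a real pipeline you'd probably want the mismatch branch to exit with a non-zero status, so the job stops there instead of recompiling everything.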

If this works for you and is all you need: great! If it doesn't and you get errors talking about invalid byte sequences like the one below: don't panic. I'll sort you out.

  Compiling
    [ERROR] ./about.rst: hGetContents: invalid argument (invalid byte sequence)

Step 2: This one weird trick

As described in this GitHub issue, a fix for the above error is available in Stack's master branch and will be included with the tool as of Stack v2.1 (the release candidate for which was released while I was writing this post).

From the release notes for the release candidate: "Use en_US.UTF-8 locale by default in pure Nix mode so programs won't crash because of Unicode in their output".

So if you're using Stack v2.1 or later, the steps outlined in this section should not be necessary.

As evidenced by a fair few GitHub issues, this is something that a number of users run into, and it can be difficult to troubleshoot. What it boils down to is this: when running Stack in Nix mode, it defaults to building in pure mode. This isolates the build environment by removing environment variables and other things on your system that could influence the build and lead to a lack of reproducibility. This is usually a good thing, but it also unsets the LANG variable, which Stack relies on to know how it should handle encodings.
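
You can get a rough feel for the effect without Stack at all. The sketch below uses env -i, which starts a command with an empty environment. That's only an analogy for what pure mode does to LANG, not a description of how Stack implements it, and it assumes there's a shell at /bin/sh:

```shell
# A child process normally inherits LANG from its parent.
inherited=$(LANG=en_US.UTF-8 /bin/sh -c 'printf "%s" "${LANG:-unset}"')

# env -i wipes the environment before starting the child,
# so the LANG we set on the left never arrives.
isolated=$(LANG=en_US.UTF-8 env -i /bin/sh -c 'printf "%s" "${LANG:-unset}"')

echo "normal child:   LANG=$inherited"
echo "isolated child: LANG=$isolated"
```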

Ok. So all we gotta do is re-set that variable, then? Yes. But how to do that might not be immediately apparent. You might be used to running shell commands like this:

MY_VAR="my-value" ls -lah

But this won't work with Stack, because it'll still isolate the environment. What you can do, however, is to use the --no-nix-pure option. This tells Stack not to isolate the build environment, so you'll still be able to access external variables. Here's an extract from my current build file that does just that:

  image: sakshamsharma/docker-hakyll:v3
  script:
    - stack build
    - stack exec --no-nix-pure site build

This works perfectly on GitLab's CI runners, but if this still doesn't solve your issue, you might want to check what the locale is actually set to by using the locale shell command. The output should look something like this:

  $ locale
  LANG=en_US.UTF-8
  LC_CTYPE="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_COLLATE="en_US.UTF-8"
  LC_MONETARY="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_PAPER="en_US.UTF-8"
  LC_NAME="en_US.UTF-8"
  LC_ADDRESS="en_US.UTF-8"
  LC_TELEPHONE="en_US.UTF-8"
  LC_MEASUREMENT="en_US.UTF-8"
  LC_IDENTIFICATION="en_US.UTF-8"
  LC_ALL=

If the output doesn't show a UTF-8 locale, that seems like a good place to start (I'd try export LANG=en_US.UTF-8 before running the Stack commands), but now we're wading out past the scope of this post, so you're gonna have to go it on your own, I'm afraid. Sorry, kiddo.
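
For what it's worth, that "check, then export" step could be sketched like this. It's a guess at a reasonable starting point rather than something I've needed myself: the function name ensure_utf8_lang is made up, it only looks at LANG, and it assumes the en_US.UTF-8 locale actually exists on the machine.

```shell
# Hypothetical helper: leave LANG alone if it already names a UTF-8
# locale, otherwise fall back to en_US.UTF-8.
# Note: LC_ALL, if set, takes precedence over LANG and is not touched here.
ensure_utf8_lang() {
  case "${LANG:-}" in
    *.UTF-8 | *.utf8) ;;              # already UTF-8, nothing to do
    *) export LANG=en_US.UTF-8 ;;     # unset or something else
  esac
}

ensure_utf8_lang
echo "LANG is now $LANG"
```

You'd run this before the Stack commands in the build script, the same place the plain export would go.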

Wrapping up

And that's it! Simple, but not immediately obvious. It's likely that a similar approach---the prepared Nix container---would work for other Haskell projects as well, though I can't say for certain one way or the other.


xargs and the unruly tags

A tale of two commands

I thought I was really clever when I configured my CI/CD pipeline to tag commits that got deployed and push the tags back into the repo, but I'm rarely as clever as I like to think: I had forgotten to put the proper checks in place to avoid these tag pushes triggering subsequent runs of the pipeline, and things got a little ... out of hand.

I'd gone to bed just after pushing an update, and when I arose to check on it, I found that the deploy tagging stage had been running over and over and over and over and ... you get the point. Thankfully, it had failed after about 130 rounds, so it could have been a lot worse, but I was left with a large number of useless and unwanted tags in the remote repo.

So how do you fix something like this? Yup, xargs to the rescue!

Where there's a will ...

At first, I didn't really know how I'd go about it. I was hoping git would have some nice, built-in functionality for mass-deleting remote tags, but while I have found in retrospect that it does (see the postmortem), I couldn't find it at the time.

However, because all the tags were for a specific commit, I did know that I could list all the relevant tags separated by newlines, using git tag --contains <SHA>.

So, with some helpful advice from Stack Overflow and this guy, I constructed this little command which sorted me out just fine:

git tag --contains 9216e97ce7e66090f79eba4d1abe6548d72dd638 \
| xargs -I % git push origin :refs/tags/%

Now, I'd come across xargs before, even done the ol' copying and pasting from Stack Overflow trick, but it had always looked really complicated and no-one had ever told me why I'd need it or what it does; so I just carried on in blissful ignorance. Not this time, though. It was time to figure out what was going on.

Grokking xargs

The way xargs was sold to me was: "execute a command for each item in a list". It's actually more powerful than that, but that's a great place to start.

Let's use the man page to find out what that -I % bit means:

-I replace-str: "Replace occurrences of replace-str in the initial-arguments with names read from standard input"

Here, replace-str is the string to use to indicate where to place arguments in the command to run. In the command above, we chose to use %, but you're not limited to this.

Similar to printf and format strings in general, this places your arguments at your desired place in the command. In our case it both limits us to using one argument (tag) at a time, and it lets us append it to :refs/tags/ without being separated by a space.

That means that in the above snippet, xargs would, for each tag listed, run the command git push origin :refs/tags/<tag_name>, which pushes an empty source to that tag's ref on the remote, thereby deleting it.
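
If you want to see exactly what xargs is going to run before letting it loose on a remote, a handy trick is to put echo in front of the command. A minimal sketch, using three made-up tag names in place of the output of git tag --contains:

```shell
# Hypothetical tag names, one per line, standing in for the
# output of git tag --contains <SHA>.
tags='deploy-1
deploy-2
deploy-3'

# With echo in front, xargs prints each command instead of running it.
# -I % runs one command per input line and substitutes the line for %.
dry_run=$(printf '%s\n' "$tags" | xargs -I % echo git push origin :refs/tags/%)

printf '%s\n' "$dry_run"
```

Once the printed commands look right, drop the echo and pipe in the real tag list.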

If all you want is to put the argument at the end of the command, you can even do without the -I. Say you want to recursively delete all the .swp files in a directory:

find . -name "*.swp" | xargs rm

Be aware, though, that without either -I or -n (to limit the number of arguments to use for each command), xargs will split the list you give it into sizeable chunks and pass as many arguments to the command as it can each time. That means that in this case, it'd likely end up looking something like this:

rm a.swp b.swp c.swp ...

which is usually fine and what you want, but keep this in mind for when it isn't.
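
To make the difference concrete, here's a small sketch that prints the rm commands instead of running them. The three file names are made up and stand in for whatever find would have listed:

```shell
# Three hypothetical file names, one per line.
files='a.swp
b.swp
c.swp'

# Default: xargs packs as many arguments as fit into one command.
# The echo makes it print the rm command instead of running it.
batched=$(printf '%s\n' "$files" | xargs echo rm)

# -n 1: at most one argument per command, so rm would run three times.
one_each=$(printf '%s\n' "$files" | xargs -n 1 echo rm)

printf 'batched:\n%s\n' "$batched"
printf 'one at a time:\n%s\n' "$one_each"
```

One more thing to be aware of: by default, xargs splits its input on whitespace, so file names containing spaces get torn apart. If that's a risk, find's -print0 together with xargs -0 passes the names separated by null bytes instead.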

This is only scratching the surface of what xargs can do, but it's enough to make it do some pretty heavy lifting. It might not be something to reach for very often, but for when you do need it, it's a great tool to have in your belt.

Postmortem ⚰️

Now, you might have noticed that I did a git push for each tag that I was deleting, and you might be thinking that for over a hundred tags, it must have taken quite some time. You would be right. Luckily, I was working on something else, so I could happily let it run in the background. But we can do better!

xargs has an option -P or --max-procs, which you can use to decide how many processes to run in parallel. The default is 1, but if you set it to 0, it will run as many as it can. This could have saved us quite some time, assuming git would let us run multiple push operations from the same repo at the same time. But there is an even better way:

As outlined in this Stack Overflow response, you can use a whitespace-separated list of tag names (<tags>) with git push; so we could have run git push --delete origin <tags> to achieve the same outcome as deleting them one by one.

If we rewrite the command from earlier, we can both simplify it and do it all in a single push:

git tag --contains 9216e97ce7e66090f79eba4d1abe6548d72dd638 \
| xargs git push --delete origin

... yeah, that would have been a lot more efficient 😅
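
Here are the two options from this postmortem side by side, again with echo in front so nothing actually gets pushed. The tag names are made up, and the 4 passed to -P is an arbitrary choice:

```shell
# Hypothetical tag names, one per line.
tags='deploy-1
deploy-2
deploy-3'

# Option 1: still one push per tag, but up to 4 running at once.
# Parallel jobs can finish in any order, hence the sort.
parallel=$(printf '%s\n' "$tags" \
  | xargs -P 4 -I % echo git push origin :refs/tags/% \
  | sort)

# Option 2: no -I, so xargs appends every tag to a single command.
single=$(printf '%s\n' "$tags" | xargs echo git push --delete origin)

printf 'parallel:\n%s\n' "$parallel"
printf 'single push:\n%s\n' "$single"
```

The second option ends up as one git invocation with all the tags on the same line, which is the whole point.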


Docking pains

What to do when the whale is too big
Moby Dock, the Docker logo: A blue whale carrying eight containers on its back.

Big fish, big problems.

You know how some things are a lot more difficult than they seem? In an attempt to speed up deployments for this blog, I wanted to look into building a Docker image with Hakyll and all the required build dependencies available. To be able to do this effectively, I figured I'd need to have the ability to work with Docker locally. Turns out this was one of those things.

The goal: enable Docker virtualisation and development on a NixOS system.

Challenges: the root partition, which is where Docker stores data, keeps running out of space.

In theory, it's simple:

  1. enable Docker
  2. configure it to store data somewhere that is not /var/lib/docker

In practice, it turns out to be a bit more difficult than expected, but don't worry: We'll figure it out together!

Enabling

According to the NixOS manual and the Wiki article on Docker there's really not much to it: To enable Docker, all you need to do is update your configuration.nix to include

  virtualisation.docker.enable = true;

As pure and simple as Nix should be.

According to the manual: "/This option enables docker, a daemon that manages linux containers. Users in the "docker" group can interact with the daemon (e.g. to start or stop containers) using the docker command line tool./"

Read that last line carefully: "/Users in the "docker" group can interact with the daemon [...]/". Yup. That means we need to make sure our user is in the correct group:

  # replace thomas with the name of your user
  users.extraUsers.thomas = {
    extraGroups = [
      "docker"
      # ... other groups
    ];
  # ... remaining configuration
  };

What isn't immediately obvious is this: You must log out and back in before this setting change takes effect. Let's repeat that to make sure we understand:

/You must log out and back in before this setting change takes effect./

From what I can tell, this goes for any change to a user's groups, but it isn't particularly well documented anywhere. (Psst: I am not the only one to have run into this.)
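
If you're not sure whether your current session has picked up the new group yet, you can ask it. A small sketch, where has_group is a made-up helper and id -nG is the standard command that lists the groups of the running session:

```shell
# Hypothetical helper: succeed if group $1 appears in the
# space-separated group list $2.
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG prints the group names of the current process,
# which reflect the login session rather than the config file.
if has_group docker "$(id -nG)"; then
  echo "this session is in the docker group"
else
  echo "not in the docker group yet: log out and back in"
fi
```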

But wait; there's more! This is only vaguely referenced in the manual ("/using the docker command line tool/"), but to have access to the Docker CLI, you're going to have to install Docker (pkgs.docker) for your user, either by putting it in configuration.nix's systemPackages or by using a solution such as this.

And that's it. If all you wanted to do was set up docker to run with the default configuration, you're done now. Congrats! Have a donut. You've earned it.

The space race

Ah, yes, disk space ... We've got Docker in place now, and, assuming the group change setting has taken effect, we can start playing with it. That's what I did. For a day or so. As per the usual NixOS song and dance, I wanted to change some configuration settings, so I tried to rebuild my system and got this fateful message:

No space left on device

Now, this isn't anything new. I've realized since setting up the OS that I probably should have allocated more space for the root partition (someone once told me that NixOS "trades disk space for sanity"). "Oh, well," I thought. "Guess I have to delete some old generations again." So I ran the garbage collector. This usually frees up about 7--10GB of space, but now it was hardly removing two! I tried all the tricks that I knew of, but nothing seemed to make a difference. And then the thought struck me: "Docker is installed as a system service. That means it probably stores images system-wide too!" And indeed, after looking through the 'docks' (har har), I found that the default place Docker stores data is /var/lib/docker.

So I killed all of my containers, deleted all of my images, and lo and behold: My root partition had suddenly shed nearly 10GB! Superb!
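
If you want to keep an eye on this yourself, a before-and-after check of the root partition is easy enough to script. A rough sketch, where avail_kb is a made-up helper built on the portable output format of df:

```shell
# Hypothetical helper: print the available space, in KiB, on the
# file system that holds the given path.
# df -P forces one line per file system; column 4 is "Available".
avail_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

echo "root partition: $(avail_kb /) KiB available"

# Docker's default data directory usually needs root to read,
# so this one is left as a comment rather than run here:
#   sudo du -sh /var/lib/docker
```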

The next step, then, would be to figure out how to store the data somewhere else. Luckily, the docs (NixOS and Docker) are quite clear on this point: For your Docker configuration, you can specify an option, --data-root, and have the data stored there instead. In general, I prefer not to mess around with where things are stored too much, but in some cases it makes life easier (until I get around to repartitioning my drive, anyway), so I decided I'd put it under /home/docker for now. This is easily done like this:

    virtualisation.docker.extraOptions = "--data-root /home/docker";

This setting means that my /home partition carries some extra data, but it's got more than enough space to deal with it.

Putting it into practice

Now, having experienced first-hand how space-hungry Docker can be, and having read through the documentation, I found that there are other options that might come in handy. For now, I decided to have the system aggressively auto-prune on a weekly basis. This should keep me from running into space issues any time soon, and if it gets annoying I can always change the settings.

At the end of this little adventure, the resulting configuration.nix should look something like this:

  # Docker CLI (either put this here or in your user config)
  environment.systemPackages = with pkgs; [ docker ];

  # Put your user in the correct group
  users.extraUsers.thomas.extraGroups = [ "docker" ];

  # Set up the Docker daemon
  virtualisation.docker = {
    enable = true;
    autoPrune = {
      enable = true;
      flags = ["--all"];
    };
    extraOptions = "--data-root /home/docker";
  };

In summary, these are the steps needed:

  1. Enable virtualisation.docker
  2. Make sure your user is in the "docker" group. (Log out and back in!)
  3. Install Docker for your user
  4. (Optional) If, like me, you have issues with space, change the data-root to somewhere else, such as a different partition or an external drive.

So there you have it, folks! It really is quite simple ... once you figure out all the tricky parts.