Nix: override packages with overlays

A brief foray into nixpkgs customization
Text saying 'Nix: overlays; a practical introduction'. The Nix snowflake is visible in the lower right corner.

In general, I am exceedingly happy with the Nix package manager, but one issue that I occasionally run into is that some packages aren't quite up to date. Usually, this isn't a big deal, and I've just had to learn that I can't always live on the bleeding edge. However, when you require functionality that was introduced after the Nix package was last updated, you're suddenly out of luck.

Luckily, though, Nix provides something called overlays, which you can use "to extend and change nixpkgs" according to the NixOS documentation.

A brief summary of the problem

To make this easier to follow, I'll make the problem more concrete. One of my goals when working with Kubernetes is to have absolutely all configuration stored as code. Regardless of how you feel about YAML, there's no denying that it can easily get tedious and repetitive with no templating or programming language features baked in.

For a while now, I've been wanting to look into the Dhall language for configuration, and this seemed a perfect opportunity, considering it also has Kubernetes bindings. But here's the problem: the Dhall version in nixpkgs is 1.24.0, but the Kubernetes bindings require at least 1.27.0 to work.

What are overlays?

For a more thorough explanation, consult the NixOS wiki or watch /Nixpkgs Overlays --- A place for all excluded packages/ by Nicolas Pierron. In short, overlays are functions that transform package sets by adding or overriding keys. Notably, Mozilla has their own set of Nix packages, Mozilla nixpkgs, which contains a Rust overlay, allowing Nix users to stay up to date with the quick cadence of Rust releases. In our case, it allows us to replace the packages that are out of date with newer updated versions.

There's quite a bit of content out there on how to use overlays: the aforementioned resources are a great place to start, and blog posts like the DOs and DON'Ts of overlays by Flying Circus offer further insights. That said, I couldn't find anything that really fit my use case. Thankfully, the helpful people over at the NixOS Discourse forums pointed me in the right direction.

The solution

Now that we know what they are and how they work, we can take a stab at creating our own overlays.

Let's assume our folder structure looks like the following:

    ├── shell.nix
    └── nix-files
        └── overlay.nix

We can then write a very simple shell.nix file that just sets us up with the dhall package.

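    # Import nixpkgs with our overlay applied, so pkgs.dhall is the overridden package.
    { pkgs ? import <nixpkgs> { overlays = [ (import ./nix-files/overlay.nix) ]; }
    }:
    # stdenv is not in scope on its own here; it has to be taken from pkgs.
    pkgs.stdenv.mkDerivation {
      name = "dhall";
      buildInputs = [ pkgs.dhall ];
    }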

There's nothing terribly exciting going on here, but notice that when importing nixpkgs, we specify a list of overlays. In our case, it's only one, but we can use as many as we want.

With this all set up, let's have a look at the overlay:

    self: super: {
      dhall = builtins.fetchTarball
        "https://github.com/dhall-lang/dhall-haskell/releases/download/1.30.0/dhall-1.30.0-x86_64-linux.tar.bz2";
    }

The first line lists the expression's parameters. By convention, they're called self and super. In our case, we needn't concern ourselves with them, because we're replacing one of the keys entirely. The actual value of the expression is just a set with a single key, dhall, mapped to the result of fetching a tarball. This replaces that value in the original package set.

Some additional notes on the overlay: We're using the plain version of the builtins.fetchTarball function in this example. There's also a version that takes an attribute set with a URL and a hash (url and sha256), which will make sure that you get the same version every time. Furthermore, we're specifying which exact version we want (1.30.0). That's perfectly fine for a little example like this, and will probably be good enough for a little dev environment. It would be ideal, though, if we could make sure we're always up to date with the latest release. The aforementioned forum help thread has a suggestion for how to do this, but it seemed to have some unexpected issues, so I'll leave that as an exercise for the reader.

Now, when loading the shell.nix file, we should see the following:

    $ dhall version
    1.30.0

Boom.


Kubernetes first steps

Analysis paralysis
The kubernetes logo, next to which is some text saying "kubernetes: analysis paralysis". In the lower right corner is this website's logo.

You know, that's a pretty nice blue.

One of my stated goals for this year was to have a Kubernetes cluster running somewhere. As stated in the goals post, the deadline for this was April 1st. As of Saturday night, I've got one.

I ended up using Digital Ocean for this, taking advantage of the Changelog podcast's sponsored signup offer, giving me two months to spend a $100 credit. While I've heard a lot of good things about Digital Ocean being very affordable, their managed Kubernetes offering can easily rack up some costs if you don't take care.

But now that I've got the cluster up and running, I'm at a bit of a loss. Just where do I go from here? What's the next thing to do? What should I prioritize? And why didn't I just start off with Minikube?

Why didn't I just use Minikube for now?

Actually, let's start with the last question: Why didn't I just use Minikube? Honestly, it was mostly a moment of weakness. I was actually busy procrastinating when I thought, 'hmm, I wonder how long it'd take to get set up with Digital Ocean'.

In hindsight: sure, Minikube would have been cheaper (at least once my credit is up), but having the actual cluster feels more 'real'. Like it's something I need to take care of. It's also available from anywhere and doesn't use my machine's processing power.

What do I want to get out of this?

So I got the cluster. Now what? Well, why did I want to do this?

While the option to just spawn little demo applications and have them available from wherever is great, I'm more interested in looking at Kubernetes for the overarching architecture and operational side of things. Specifically, I'd like to use this as a way to familiarize myself with:

  • service meshes and API gateways
  • monitoring and telemetry
  • security best practices

Each of these topics requires a great deal of time and practice to master, but I'm not expecting to be a wizened monk any time soon. These are areas that I find fascinating and that I want to explore, but I cannot yet lay out any specific goals, as I need more time to research them.

Wait, didn't I have another goal for Kubernetes?

Sure did! In addition to just getting a cluster ready, my second Kubernetes goal was to expose a Haskell app with an API. This is still on the cards, but isn't due any time soon. This leans more towards the Nix and Haskell side of things, and less directly towards Kubernetes, so I can take some time to figure out how I want to do it.

Now what?

So where does this leave me? I think the biggest issue I have identified is my lack of knowledge. I've been working with Red Hat's OpenShift for the past year, and feel like I have a pretty good grasp of how that works, at least from the dev side. But when faced with that clean cluster, I froze. I didn't know where to go, didn't know what commands to run, or where to turn to for advice. So I need to formulate a plan.

I think I would like to familiarize myself with Kubernetes more or less from the ground up. First, that means reading documentation, doing tutorials, and making sure my understanding is correct. Second, I would like to have a look at basic security measures and best practices. At the very least, I want to enable and configure RBAC. When I get that far, I think it would be an appropriate time to take a step back and reevaluate where I'm headed.


Corecursion and anamorphisms

Unfolding what lies beneath
Some blurred out Haskell code, partially covered by the text 'corecursion', in the style of the 'Corecursive' podcast logo.

You know that feeling where you hear about something and you immediately need to look into it? I had that while listening to the most recent episode of Adam Gordon Bell's /Corecursive/ podcast today, where they were talking about where the name of the podcast came from. Up until then, I had just assumed that corecursive meant mutually recursive[1], but boy, was I wrong!

The C-word

According to Wikipedia, 'corecursion is a type of operation that is dual to recursion'. This quickly gets very theoretical, but the long and short of it is that corecursion can be seen as a kind of opposite of recursion: Where recursion allows you to operate on arbitrarily complex data as long as you can reduce it down to a set of base cases, corecursion allows you to generate arbitrarily complex data when given a base case.

Err ... what?

Think of it like this: Given a list of numbers (arbitrarily complex data), you can define a recursive function to sum all the numbers using simple base cases: is the list empty or are there more elements to add? Your language of choice may well have a sum function that does just this. If not, you can implement it with a fold or reduce.
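
To make that concrete, here is a minimal sketch of both variants. The names sumRec and sumFold are made up for this illustration; in practice you'd just use the sum function from the Prelude.

```haskell
-- Explicit recursion: one equation per case.
sumRec :: Num a => [a] -> a
sumRec []       = 0              -- base case: the list is empty
sumRec (x : xs) = x + sumRec xs  -- otherwise: add the head to the sum of the rest

-- The same function expressed as a fold.
sumFold :: Num a => [a] -> a
sumFold = foldr (+) 0
```

Both reduce a list of any length to a single number, which is exactly the 'arbitrarily complex data down to a base case' direction.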

However, given a number, can you create a list that when summed up would equal this number? This would be a form of corecursion, where we take simple data (a number), and generate arbitrarily complex data based on the input (the resulting list of numbers).

Let's talk about fold functions specifically. As we talked about in a previous post on folding, fold functions are catamorphisms. They take a data structure and reduce it to a 'lower' form. The opposite of a catamorphism is an anamorphism[2], and the opposite of a fold is an unfold (at least in Haskell). An anamorphism generates a sequence by repeatedly applying a function to its previous result.

According to Wikipedia, 'the anamorphism of a coinductive type denotes the assignment of a coalgebra to its unique morphism to the final coalgebra of an endofunctor.' Don't worry: you needn't understand that to understand unfolding and corecursion (I sure don't). Instead, let's try and get a feel of what we might use corecursion for.

Returning to our previous example of destructuring a number into a list of terms, let's look at a couple of ways to do it using unfold.

First, let's look at the unfoldr function itself. It is defined in the Data.List module, and its type signature is:

    unfoldr :: (b -> Maybe (a, b)) -> b -> [a]

Given a function from b to Maybe (a, b) and a b, it will produce a list of a. If the function (let's call it f) returns Just (x, y), x will be added to the result, and f will be called again with y. This continues until f returns Nothing, at which point unfoldr terminates, returning the list it has created.
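
If it helps to see that behaviour spelled out, here is a hand-rolled sketch. Note that unfoldr' is my own illustrative definition and not how the base library actually implements unfoldr, and countdown is a made-up example.

```haskell
import Data.List (unfoldr)

-- An illustrative re-implementation of unfoldr (not the library's definition).
unfoldr' :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr' f seed = case f seed of
  Nothing         -> []                  -- f signals that we're done
  Just (x, seed') -> x : unfoldr' f seed' -- emit x, continue with the new seed

-- A small usage example with the real unfoldr: count down from n to 1.
countdown :: Int -> [Int]
countdown = unfoldr step
  where
    step n
      | n <= 0    = Nothing
      | otherwise = Just (n, n - 1)
```

With this, countdown 3 evaluates to [3,2,1]: each step emits the current seed and passes n - 1 along as the next one.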

With this, we can create a function that takes an integral value and returns a list that, when summed, is equal to the value we passed in. We'll start with an easy variant that just destructures the input into ones. For simplicity's sake, we're ignoring negative numbers: they simply produce an empty list.

    module Unfold where

    import Data.List (unfoldr)

    unroll :: Integral a => a -> [a]
    unroll = unfoldr f
      where
        f n
          | n <= 0 = Nothing
          | otherwise = Just (1, n - 1)

Easy enough. The function passed to unfoldr returns Nothing if there is nothing left to sum. Otherwise, it adds 1 to the list and is called again with n - 1.

    unroll 5 -- [1,1,1,1,1]
    unroll 0 -- []

But we can have some more fun with this. How about we try and destructure a number into a list of the pieces we'd need to create a binary representation of it?

    binary :: Integral a => a -> [a]
    binary = unfoldr f
      where
        f n
          | n <= 0 = Nothing
          | otherwise =
            let powerOfTwo = 2 ^ floor (logBase 2 $ fromIntegral n)
             in Just (powerOfTwo, n - powerOfTwo)

This is a bit more complicated, but only because we need to map the input value to a power of two. Luckily, we can use logBase 2 to get the exponent to which 2 must be raised to get n, and then floor it to get the greatest integral exponent whose power of two still fits in n. That power of two becomes the next entry in the list. What's left gets passed in to the next application of the function.

    binary 0 -- []
    binary 5 -- [4,1]
    binary 255 -- [128,64,32,16,8,4,2,1]
    binary 256 -- [256]
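
One caveat with the logBase approach is that it goes through floating point, so very large inputs can get rounded on the way (a Double can't represent every integer above 2^53 exactly). As a sketch of an alternative, here is a variant that sticks to integer arithmetic. The names largestPowerOfTwo and binaryInt are my own, and I've fixed the type to Integer to sidestep overflow.

```haskell
import Data.List (unfoldr)

-- Largest power of two that is <= n, for n >= 1, using only integers.
largestPowerOfTwo :: Integer -> Integer
largestPowerOfTwo n = last (takeWhile (<= n) (iterate (* 2) 1))

-- Same idea as binary, but without the detour through floating point.
binaryInt :: Integer -> [Integer]
binaryInt = unfoldr f
  where
    f n
      | n <= 0    = Nothing
      | otherwise =
          let p = largestPowerOfTwo n
           in Just (p, n - p)
```

For the small inputs above it should behave the same as binary, for instance binaryInt 5 gives [4,1].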

Pretty neat, huh? What if we take it a step further and produce a number whose decimal digits spell out the binary representation instead?

    base2 :: Integral a => a -> a
    base2 = sum . unfoldr f
      where
        f n
          | n <= 0 = Nothing
          | otherwise =
            let exponent = floor (logBase 2 $ fromIntegral n)
             in Just (10 ^ exponent, n - 2 ^ exponent)

As you'd expect:

    base2 0 -- 0
    base2 5 -- 101
    base2 10 -- 1010
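
If you'd rather have the individual binary digits than a decimal number that merely looks binary, unfoldr works for that too. This is a different technique from the one above (repeated division by two rather than subtracting powers of two), and the names bitsLsb and bits are made up for this sketch.

```haskell
import Data.List (unfoldr)

-- Binary digits, least significant first: peel off n `mod` 2, continue with n `div` 2.
bitsLsb :: Integer -> [Integer]
bitsLsb = unfoldr f
  where
    f n
      | n <= 0    = Nothing
      | otherwise = let (q, r) = n `divMod` 2 in Just (r, q)

-- Most significant digit first, the way we'd normally write the number.
bits :: Integer -> [Integer]
bits = reverse . bitsLsb
```

Here bits 5 gives [1,0,1] and bits 10 gives [1,0,1,0], matching the digits of the base2 results above.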

Not too shabby at all.


Alright, I think we have had enough fun with corecursion for now. It's been a very unexpected, but very insightful little journey, and I thank you for taking it with me. Until next time!

Footnotes

[1] I'm probably not the only one to do this. Wikipedia has a note under disambiguation on its article on corecursion that says 'not to be confused with mutual recursion'.

[2] Similar to how your body can be in either anabolic or catabolic states, for all you fitness people out there.