This image really grows on you, you know?

Oh, hey, it's you again! Always a pleasure to have you along for the ride. This time, let's have a deeper look at one of the basic underpinnings of the Haskell language: types. Being an implementation of a typed lambda calculus, types are a big part of what makes Haskell Haskell, and a good understanding of them is essential to master the language.

The chapter opens with a couple of pages asking what types are good for and describing how they can help us enforce invariants, avoid (certain types of) bugs, enable compiler optimizations, and reason about our systems.

For this summary, we'll explore how to constrain argument types to typeclasses, have another dive into currying and partial application, and have another look at polymorphism.

Constraining your types

Let's start off by looking at how we can constrain a function's domain by using typeclasses and how that impacts the function.

By default, when you create a function and you don't annotate it, the compiler will give it as wide a definition as it can, using the least specific types, constraining it to a typeclass if necessary, but leaving it completely open if possible. Of course, if one or more of the parameters can only be of a concrete type, it will infer that too.

This generalization is why we can define a function for numbers and then use that function for any type that has an instance of Num.
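For instance (my own example, not from the book), if we define a simple function on numbers without annotating it, GHC will infer the most general type it can:

  -- no type annotation given; GHC infers addThree :: Num a => a -> a
  addThree x = x + 3

  -- so it works for any type with a Num instance:
  -- addThree 2   == 5
  -- addThree 2.5 == 5.5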

To add a typeclass constraint to a type signature, specify the constraint before the rest of the signature and separate the two with a fat arrow (=>). If you have multiple constraints, put them in parentheses and separate them with commas. The signatures below illustrate this:

  Num a => a -> a -> a

  (Num a, Num b) => a -> b -> b

  (Ord a, Num a) => a -> a -> Ordering

The type constraints only show up at the type level and are never visible at the term level.

As we'll see when we talk about polymorphism a bit later, constraining our types like this enables us to create more powerful and specialized functions at the cost of decreasing general applicability.

Currying

We've talked about currying before, but this time it's given an entire subchapter! If you can't quite remember what it is, let me give you a quick summary:

Because any function in a lambda calculus can only be applied to one argument, the way to construct functions that can be applied to multiple arguments is to nest multiple lambdas. Haskell abstracts this away for us and makes it seem like a function can be applied to an arbitrary number of arguments. The only place where this actually shines through is in function signatures, where the function arrow ((->)) separates the parameters, hinting that it's actually just a bunch of nested lambdas.
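As a quick sketch of what that nesting looks like (my own example), these two definitions are equivalent, and the arrow in the signature associates to the right:

    -- a -> a -> a really means a -> (a -> a)
    add :: Num a => a -> a -> a
    add x y = x + y

    -- the same function with the nested lambdas written out explicitly
    add' :: Num a => a -> a -> a
    add' = \x -> \y -> x + y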

This is handy because it allows partial application, allowing us to pass around functions that haven't been fully applied and add the remaining arguments when it suits us.
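A small illustration (again, my own example) of what partial application looks like in practice:

    -- (+) applied to only one argument gives back a function
    -- that is still waiting for the second one
    addFive :: Num a => a -> a
    addFive = (+) 5

    -- addFive 10 == 15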


Haskell is curried by default, but it's possible to uncurry functions. What does that mean? It means that if you have a function that you need to apply to two arguments, you can turn it into a function that only accepts a single argument: a tuple containing the two expected arguments. The opposite is also true: if you have a function that expects a tuple, you can curry it to turn it into a function expecting two separate arguments.

The type and possible implementation of uncurry is as follows:

    uncurry :: (a -> b -> c) -> (a, b) -> c
    uncurry f (x, y) = f x y

We can play around with it a bit:

    uncurriedPlus :: Num a => (a, a) -> a
    uncurriedPlus (x, y) = x + y

    -- uncurriedPlus could also be implemented using the uncurry function:
    uncurriedPlus' :: Num a => (a, a) -> a
    uncurriedPlus' = uncurry (+)

    -- and in the same way, we could redefine a plus function using curry:
    plus :: Num a => a -> a -> a
    plus = curry uncurriedPlus

Can you see how you can implement fst and const in terms of each other composed with either curry or uncurry? Look up the types in the REPL and play around with it a bit if not.

In short:

  • Curried functions: many nested functions, each taking one argument
  • Uncurried functions: one function, many arguments (grouped in a tuple)

Sectioning makes a little return in this chapter too. Quick reminder: sectioning is partial application of infix operators or functions, where we can leave out the term on either side and Haskell will cleverly figure out that the resulting function is still waiting for the missing value: (2^) 3 would give $8$, for instance, while (^2) 8 would give 64. Sectioning works for any binary, infix function.
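Here's a rough sketch of a couple of sections in use (my own examples):

    halve :: Double -> Double
    halve = (/ 2)           -- halve 10 == 5.0

    divideTenBy :: Double -> Double
    divideTenBy = (10 /)    -- divideTenBy 4 == 2.5

One caveat worth knowing: (-2) is parsed as negative two rather than a section, so to partially apply subtraction you'd use the subtract function, e.g. (subtract 2).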

Polymorphism

Polymorphism was also mentioned last time, but this time we dive a bit deeper into what it is. Our definitions of parametric and constrained polymorphism return, and we're introduced to something called ad-hoc polymorphism, so let's recap and explore a bit.

Parametric polymorphism is the broadest form of polymorphism, and is when the type variables are fully polymorphic, completely free of any and all constraints. This means that their concrete type could be anything and the function would still do what it should. The function will work on any type. An example of a parametrically polymorphic function is id:

    id :: a -> a

Can you tell what it does? When a function is this polymorphic, there is only one thing it can do, and that is return whatever it receives as an argument. Hmm? Useless, you say? Well, it has its uses, but this is not the time or place to explore them.

Constrained polymorphism is when there are certain constraints placed upon the types by specifying that they must belong to a typeclass. This lets the function use the functions defined for that typeclass, increasing the number of things the function can do while simultaneously decreasing the number of values it can accept. Any numeric function is a good example of this:

    (+) :: (Num a) => a -> a -> a

Constraining your types will give you more power within the function, but a smaller domain of input values. A good rule of thumb would be to always try and make your functions as general and unconstrained as possible to give the caller maximum freedom, but not to be afraid of putting constraints on the inputs when it is required by the task at hand.


Sometimes you'll come across something known as a polymorphic constant. In fact, we've seen a lot of these already: any time you write a numeric literal without specifying the type, it will be polymorphic until it's given a more specific type, for example through function application.
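You can see this in GHCi by asking for the type of a bare literal; the exact type-variable name and prompt depend on your GHC version, but it should look roughly like this:

    Prelude> :t 13
    13 :: Num p => p

    Prelude> :t 13 + (1.5 :: Double)
    13 + (1.5 :: Double) :: Double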


Haskell's type inference is built on an extended version of the Damas-Hindley-Milner type system. It will infer the most generally applicable (polymorphic) type that is still correct for your code. But while declaring types might not be necessary, it's still considered good practice and is recommended.

Definitions

Ad-hoc polymorphism
Polymorphism that applies one or more typeclass constraints to a polymorphic type variable.
Higher-order functions
These are mentioned in the chapter, but not expanded upon. A higher-order function is a function that accepts a function as an argument or returns one when evaluated. map is probably the poster child, taking a function from a to b as its first parameter.
Parametricity
Parametricity states that the behavior of a function will be uniform across all concrete applications of the function.
Polymorphism
In Haskell: type variables that may refer to more than one concrete type. Usually manifested as ad-hoc or parametric polymorphism.
Type inference
The ability of some programming languages to infer types from terms (your code) without explicit type annotations.
Type variable
A way to refer to an unspecified type or set of types in type signatures.
Typeclass
A collection of functions that must be implemented for any type wishing to create an instance of said typeclass. Allows for ad-hoc polymorphism and abstractions.

Next time

Next up is the chapter on typeclasses, where we'll finally get an explanation for how all these numeric types work together and look at some basic, easily derivable typeclasses. Stay tuned!


Getting sick of this cover image yet?

Welcome back to yet another foray into the wonderful world of Haskell, where this time, we're turning our gaze towards basic data types. The book presents this quote by Robin Milner at the start of the chapter:

There are many ways of trying to understand programs. People often rely too much on one way, which is called “debugging” and consists of running a partly-understood program to see if it does what you expected. Another way, which ML advocates, is to install some means of understanding in the very programs themselves.

Note that in this context, ML is not machine learning, but the language known as ML. If you don't know who Robin Milner was (I didn't), Wikipedia has this to say about him:

He developed LCF, one of the first tools for automated theorem proving. The language he developed for LCF, ML, was the first language with polymorphic type inference and type-safe exception handling. In a very different area, Milner also developed a theoretical framework for analyzing concurrent systems, the calculus of communicating systems (CCS), and its successor, the pi-calculus.

So I'd say he's reasonably influential.

But that's enough of a history lesson for now. Let's move on to some more theory, shall we?


Let's start right at the beginning: what are types? In short, types are sets of values, where a set is a collection with no duplicate entries. Every expression---when evaluated---yields a value, and that value has a type. Each type is defined by the members (the values) that inhabit (are part of) the type. Boolean values, for instance, have two inhabitants: True and False. Integers have an infinite number of inhabitants, every integral value from negative infinity to positive infinity.

Types help us group multiple values that have something in common. It could be a specific domain or model or something more abstract. Whatever it is, they let us talk about a set of values under a shared name.

Data declarations

Data types in Haskell are defined by data declarations. A data declaration consists of a type constructor along with its data constructors.

A type constructor is the name of the type, such as Bool or Int, and is used in type signatures, or the type level of your code.

Data constructors are the values that inhabit the type they are defined in, such as the aforementioned True and False for Bool. These are the values that appear in the logic of your code, at the so-called term level.

Let's look at the data declaration of the Bool data type to make it clearer:

  data Bool = False | True
  --   [1]    [2]  [3] [4]
  1. The name of the type, aka the type constructor. This is what you see in function signatures.
  2. Data constructor for the value False.
  3. A pipe operator, indicating that this is a sum type (or discriminated union). This tells us that a Bool value is either False or True, but never both.
  4. Data constructor for the value True.

This is a very simple data declaration. Others may take arguments or use product types instead of sum types, but what they all have in common is the data keyword followed by the name of the type. Most are followed by an equals operator and a set of data constructors.
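To make that a little more concrete, here are a couple of hypothetical declarations (not from the book): one whose constructor takes arguments, and one that combines a sum with a product:

  -- a product type: the Point constructor takes two arguments
  data Point = Point Double Double

  -- a sum of two constructors, one of which is itself a product
  data Shape = Circle Double | Rectangle Double Double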

Numeric types

We've worked a bit with numeric types previously, but never really looked too deeply at them. They work quite differently in Haskell than they do in a number of other languages, so let's see what we can find.

The book lists two categories of numbers, which each have multiple types:

Integral numbers
Int
Fixed-precision integer. Has a minimum and a maximum value.
Integer
Integer that supports arbitrarily large and small numbers.
Fractional numbers
Float
Single-precision floating point number. Like in most languages, limited in precision.
Double
Double-precision floating point number. Twice as many bits as a Float. Still limited in precision.
Rational
Fractional number that represents a ratio of two integers. A value such as x / y :: Rational will consist of two Integer values and represents the ratio of $x$ to $y$. Arbitrarily precise.
Scientific
Space efficient and almost arbitrarily precise. Represented using scientific notation, storing the coefficient as an Integer and the exponent as an Int. Because Int is bounded, there is technically a limit to the size of the number, but hitting it is very unlikely. More efficient, but less precise than Rational.
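A quick GHCi session illustrates the difference in precision between Double and Rational (prompt and output formatting may vary slightly between GHC versions):

  Prelude> 1 / 3 :: Double
  0.3333333333333333
  Prelude> 1 / 3 :: Rational
  1 % 3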

All these data types have instances of a typeclass called Num (more about typeclasses in the coming chapters), which lets us use any one of them for most functions that only require standard operations.

Choosing an integral data type

As noted above, Integer can represent arbitrarily large integers, while Int (and the related Int8, Int16, and so forth) can only express as many as their predefined size allows them, falling back to overflow wrapping if they exceed their limits.
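On a 64-bit GHC, for example, you can see the wrap-around happen right at the boundary (my own GHCi sketch):

  Prelude> maxBound :: Int
  9223372036854775807
  Prelude> (maxBound :: Int) + 1
  -9223372036854775808
  Prelude> 2 ^ 63 :: Integer
  9223372036854775808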

Because of this, the book advises us to always choose Integer over Int unless we really understand the limitations and the additional performance has an impact.

Fractional numbers

Of the common fractional types mentioned above, three of them come bundled with GHC, while the fourth, Scientific, comes from an external library.

As was the case with the integral numbers, we are advised to stick to the most precise or efficient type we can use. This would usually be Rational or Scientific, but for certain scenarios, such as graphics programming, you might want to use Float.

Certain mathematical operations require the inputs to be fractional rather than just numeric; division is a great example. The type of the division operator is (/) :: Fractional a => a -> a -> a. The Fractional a => part tells us that this is a typeclass constraint, and that the type a must have an instance of this typeclass.

The Num typeclass is a superclass of the Fractional typeclass, meaning that to create an instance of Fractional, the type must also have an instance of Num. In other words: all Fractional instances are also instances of Num, while the opposite is not true.
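That superclass relationship is visible right in the class declaration, which (slightly simplified) looks like this in the Prelude:

  class Num a => Fractional a where
    (/)          :: a -> a -> a
    recip        :: a -> a
    fromRational :: Rational -> a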

Comparison

Like most languages, Haskell defines a number of operators to perform comparisons. The majority will look familiar, though the inequality operator may be a bit different. Anyway, for your viewing pleasure, here they are:

(<)
Less than
(>)
Greater than
(<=)
Less than or equal to
(>=)
Greater than or equal to
(==)
Equal to
(/=)
Not equal to

The equality operators take two values that have instances of the Eq typeclass, and the comparison operators take instances of the Ord typeclass, which is a subclass of Eq. As such, an Ord instance can be used with any of the above operators, while an Eq instance can only be used with the equality operators.
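A few GHCi examples, just to show the operators working on types other than numbers:

  Prelude> 'a' < 'b'
  True
  Prelude> "hello" == "world"
  False
  Prelude> [1, 2, 3] >= [1, 2]
  True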

if ... then ... else ...

Haskell operates with if expressions, rather than if statements. What this means is that the expression itself evaluates to a value, so you can, for example, bind the result of an if expression to a variable.

The syntax is:

  if CONDITION
  then EXPR_A
  else EXPR_B

An assignment might look like this:

  -- the value of a will evaluate to 2
  let a = if True then 2 else 0

This is similar to how the ternary operator (?:) works in most C-like languages, and much in the same way, you must have an else clause. This is because it is an expression, and an expression must evaluate to a value.
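A hypothetical little function (my own example) that uses an if expression as its entire body:

  absoluteValue :: (Num a, Ord a) => a -> a
  absoluteValue n = if n < 0 then negate n else n

  -- absoluteValue (-3) == 3
  -- absoluteValue 3    == 3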

Tuples

The Tuple type is a way to pack multiple values together and pass them around. Tuple syntax is the same at both type and term levels and is easily distinguished from other types by its distinct look: (x, y), a set of parentheses surrounding two or more values or types separated by commas.

At the type level, the tuple would contain types---e.g. (Int, Bool)---while at the term level, it would contain expressions: (1, True), (x, y), etc.

A tuple has a fixed number of constituents (parts, members). A tuple with two elements is called a pair or a two-tuple ((a, b)), while a three-element tuple is known as a three-tuple or a triple ((a, b, c)). The number of constituents is known as the tuple's arity.

The data declaration for a pair can be found by looking up (,) in GHCi:

  data (,) a b = (,) a b

This is quite different from what we saw earlier with the Bool declaration: It takes two parameters, represented by type variables a and b, and it's a so-called product type, rather than a sum type, meaning that the number of possible combinations is the product of the number of possible values for each of the types it's applied to.
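For example, since Bool has two inhabitants, a (Bool, Bool) pair has 2 * 2 = 4 possible values: (False, False), (False, True), (True, False), and (True, True).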

The two-element tuple comes with some default convenience functions in Haskell, fst and snd, which get the first and second element of the tuple, respectively. Their type signatures are:

  fst :: (a, b) -> a
  snd :: (a, b) -> b

This way of writing tuples also extends to function implementations, where we can destructure the tuple, naming and using its elements directly. The two functions above can be implemented like this:

  fst :: (a, b) -> a
  fst (a, b) = a

  snd :: (a, b) -> b
  snd (a, b) = b

This destructuring and matching on the shape of the data is known as pattern matching, and is something we'll see a lot more of further down the road.

Lists

We had a little run-in with lists last time, and they're only barely mentioned in this chapter. The reason is that there's a full chapter devoted to lists coming up, so we'll save the deep-dive for then.

The one thing they do mention this time, though, is that, much like tuples, lists have special syntax too, where the type or the values are enclosed in square brackets, e.g. [Integer] for describing the type of a list of integral values, and [1, 2, 3, 4] to construct one of them.

As opposed to tuples, they can only take values of a single type, and there is no limit to how many or how few elements a list can have.

Names and variables

To close off the chapter, there is a little section on naming conventions in Haskell.

Functions, when used as parameters, are typically labeled with variables starting with f, continuing alphabetically (g, h, ...). They may also have numbers (f1, f2) or a trailing apostrophe (f')---pronounced f-prime---if they're closely related to a function labeled f.

They may also be given variable names that describe what they do, such as a function fetching text from some source being called txt.

Type variables, the ones in type signatures, are commonly given names starting from a and going along alphabetically.

Arguments to functions usually go from x, though they might also have some other letter assigned to serve a mnemonic function (such as n for a number or r for the radius of a circle).

Of course, variable names don't have to be single letters, but in smaller programs they usually are. In more domain-specific code, it might make more sense to use longer names.

If you have a list of things, it's common to use a name such as xs, especially if one element from the list might be called x (as if xs was the plural form of x).

This is demonstrated by the commonly seen construct (x:xs) which is a way to pattern match on a list using the cons operator we saw last time. This can be used to destructure the list if it has at least one element, assigning the variable x to the first element, and xs to the rest of the list:

      sum :: Num a => [a] -> a
      sum [] = 0
      sum (x:xs) = x + sum xs

The above function will recursively sum a list by first checking whether it's empty. If it is, it returns $0$. Otherwise, it returns the sum of the current head and the value of the sum of the rest of the list.

Definitions

To tide us over until next time, this chapter provides a bunch of juicy definitions!

Tuple
An ordered grouping of values. You cannot have a tuple with only one element, but the type unit or () can be thought of as a zero-element tuple.
Typeclass
A set of operations defined for a polymorphic type. If a type has an instance of a typeclass, it can be used in any function requiring a value of that typeclass. In Haskell, you can only define one instance of any one typeclass for a given data type. In other words, type a can have at most one instance of Eq.
Data constructors
How we create values that inhabit a type in Haskell. Can be constant values (True), or take one or more arguments ((,) a b)
Type constructors
The name of a type. Can only be used in type signatures
Data declarations
Definition of a data type in Haskell. Consists of a type constructor and zero or more data constructors.
Type alias
A way to refer to a type constructor or type constant by a different name, usually to communicate something more specific. type Name = String would let you use Name in place of String in your code, allowing you to say something more about what kind of string you want on the type level.
Arity
The number of arguments a function accepts. In Haskell (lambda calculus), all functions are actually 1-arity (unary) due to currying, but we tend to let Haskell abstract that away and think of the total number of arguments needed to fully evaluate the function.
Polymorphism
The ability to write code using values that may be one of several, or any, type. There are two types of polymorphism: parametric and constrained.

Parametric polymorphism is when a function can take any type of value because it doesn't use any information about it. Parametric polymorphism is easily recognised by there being no constraints placed on the type variable.

Constrained (or bounded) polymorphism is when the valid set of types is constrained by a typeclass, such as for the equality operators that require their arguments to have an instance of Eq.

Summary

Whew; well done for making it to the end! There was a lot to get through this time around, but it's been very insightful. Next time we'll be looking more at types in Haskell, exploring constraints and polymorphism.


Let's read: Haskell Programming from First Principles, pt III

What if a string was just a list of characters?

You still following?

This will be a rather short and focused post. Chapter 3 of the book doesn't spring any exciting new concepts on us or cover large topics such as general syntax. Instead it chooses to zoom in on the String datatype. We look at how they are represented in Haskell and how we can treat them like lists because, spoiler alert: that's what they are.

Like in all compiled languages that I've used, there's a difference between the data types Char and String and how you represent the literals. As is the convention, a Char is represented using single quotes ('a'), while a string uses double quotes: "hey, tiger".

However, unlike in most languages I'm aware of, a String is actually just syntactic sugar for a linked list of characters ([Char]). So ['h', 'e', 'l', 'l', 'o'] and "hello" are just two ways of writing the exact same thing. This has some interesting effects on how we handle strings, in that it means anything we can do with a list, we can do with a string, but it also means that certain operations will be very costly. There are alternatives for when you need performance, but let's not concern ourselves with that for now.
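You can verify this equivalence straight from GHCi:

  Prelude> ['h', 'e', 'l', 'l', 'o'] == "hello"
  True
  Prelude> :t "hello"
  "hello" :: [Char]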

String/list operations

In the introduction to handling strings (secretly lists), we are introduced to the functions head, tail, take, drop, and the operator !!, so let's have a quick look at what they do.

The head and tail functions return the first element of the list and the list without the first element respectively; so that if you take the head of a list, the tail of a list, and then put them back together, you'd have the original list back:

-- reconstructs a list from its head and tail
splitAndPutBackTogether :: [a] -> [a]
splitAndPutBackTogether list =
  let h = head list
      t = tail list
   in h : t

Side note: this is the first time we're seeing that : operator. It's known as cons and it adds a value to the front of a list.

Be aware that these functions---head and tail---are not total, and will throw exceptions at you if you give them an empty list.

A total function is one that is well defined and gives you a result for every input value---i.e. maps every value from the domain into a value in the image (the subset of the codomain containing possible output values), to use the terminology from chapter one.

Functions that are defined only for a subset of their possible arguments are known as partial functions.

More on this later on in the book.

The take and drop functions, though, are total, and will not throw anything in your face. They operate on lists, take an Int as the first argument and then split the list after as many elements as specified by the Int, giving you either the first portion or the second. They're pretty much like head and tail, except you can choose how many elements go in each pile, and you'll always get a list back. If there aren't enough elements in the list to reach the specified number, take will give you back the whole list, and drop will give you an empty list.
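A few GHCi examples of take and drop doing their thing on a string:

  Prelude> take 3 "hello"
  "hel"
  Prelude> drop 3 "hello"
  "lo"
  Prelude> take 10 "hey"
  "hey"
  Prelude> drop 10 "hey"
  ""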

!! is the index operator and returns the element at the specified index in the list. The equivalent of [n] in a lot of other languages. Zero-indexed and unsafe: "hey" !! 2 would return 'y', while "hey" !! 5 would crash.

Definitions

There are only a few interesting definitions in this chapter:

String
A sequence of characters. In Haskell, the String type is just an alias for a linked list of characters, [Char].
Type (or datatype)
"A classification of values or data", such as Int, Bool, String, etc. Notably does not have any behavior associated with it.
Concatenation
Joining sequences of values together. Often used with lists, it's the act of putting one before the other and creating a new list. The operator (++) takes care of it in Haskell.

That's it for this chapter. Next time, we'll be looking at basic datatypes. This is when things start to get interesting, so strap yourself in and get ready.