Maybe from first principles
2024.07.31
Monads have become something often discussed by functional programmers and lauded as an elegant way to model errors and side effects in languages where such things are formally nonexistent. This is true, but the answer to the question of what a monad actually is remains famously opaque to newcomers. One can usually end up with a pretty good intuitive feeling for what a monad is just by using any of their various implementations, but it’s significantly more difficult to understand it from the true mathematical definition.
Let’s try it anyway. Here I’ll first provide the necessary background to understand monads in the mathematical sense, then map those concepts to Haskell with a few illustrative examples. Finally, I’ll attempt to give a more intuitive explanation of monads in terms of what they can actually do for us.
What follows assumes a passing familiarity with the basic elements of Haskell and functional programming patterns (i.e. ML-style syntax, first-class functions, currying, and recursion). Familiarity with mathematical phrasing and logical quantifiers (e.g. “for all $x$”) is also assumed.
See also the supplementary material.
Some category theory
Despite its famous opaqueness, let’s start off with the mathematical definition which will, if nothing else, give us some milestones to aim for in concept space.
A monad is a monoid in the category of endofunctors of a fixed category.
The aforementioned opaqueness is apparent. But right off the bat, we have our milestone questions:
- What’s a category?
- What’s an endofunctor?
- What’s a category of endofunctors?
- What’s a monoid?
In answering these, we’ll undoubtedly accrue additional sub-questions, but these are useful high-level guides to what must follow.
Categories
Let’s look at the first. What is a category? To actually answer this in a satisfying way, it’s necessary to start with some motivation. Specifically, we want to think about the concept of an abstraction. What do I mean by this? There are a few ways to look at it, but I think maybe the best here is with a couple examples.
What’s in an abstraction?
Suppose we have some pebbles. A natural thing that any child will do is to try grouping them together in various ways. Given a collection of pebbles, it’s possible to separate them into piles, and likewise take some collection of piles and combine them in various pairings. In this process, one seeks a way to distinguish piles–what makes one pile different from another? Maybe there are a few pebbles in one pile that are shinier than those in another, but for the moment let’s assume all the pebbles are just boring, small rocks. Absent any sort of inherent specialness, one notices that the piles of pebbles can be characterized by counting the pebbles in each pile. When two piles are combined into one, counting the pebbles in the final pile produces a result that has a specific relationship to what one obtains by counting the pebbles in the original piles.
And of course, as one notices, this phenomenon doesn’t only happen with pebbles. It could be seeds, candies, potatoes, whatever–as long as we’re only counting, these properties hold for any kind of discrete thing! And just like that, we’ve discovered the natural numbers and addition. The natural numbers are an abstraction over discrete physical quantities: It doesn’t matter what the objects being counted are; as long as they’re discrete and countable, they behave like the natural numbers with addition.
Besides being more convenient to work with, numbers provide an interface to the pure concepts at play here. Using numbers, one can see the various additive properties of physical objects and even extend them. Subtraction and multiplication can be seen as direct logical extensions of addition, and division comes about by asking what would happen if the objects could be reduced in some way into smaller pieces. From there, it’s natural to then define other types of numbers like the negative integers, the rational numbers, the real numbers, complex numbers, quaternions, …
And further, abstraction can also be built on other abstractions. Algebra1 is an abstraction over numbers. Instead of dealing directly with the arithmetic operations on numerical values, we can notice patterns in how these operations affect their operands and use those patterns to write generalized statements that all children learn as equations: An equation uses symbols like $x$ to denote spots that could be filled by any number, and the usual rules for manipulating these symbols to “solve for $x$” are derived by analyzing observable patterns in how addition, subtraction, multiplication, and division operate on the numbers those symbols stand for. Then one can define other operations on numbers based on these generalized, symbolic operations, like integration and differentiation, and hopefully you (dear reader) can see that there’s a real power to be found in abstraction.
Group theory is another, more interesting example of an abstraction built on a physical phenomenon. Without getting into it too much, group theory is based not on quantities of objects (necessarily), but instead on transformations that leave some kind of observable thing unchanged. Suppose we have a square lying in the plane. There are several things we can do to the square, like stretch it, move it around, rotate it, flip it around, etc. But there are certain transformations we can apply that leave the square looking the same. These are the symmetries of the square:
- Rotate by +90°
- Rotate by –90°
- Rotate by 180°
- Reflect across the $x$-axis
- Reflect across the $y$-axis
- Reflect across the line $y = x$
- Reflect across the line $y = -x$
And the same sort of thing applies to other shapes, like hexagons and circles, and in other dimensions, like cubes and pyramids. But it also applies to other things for which “symmetry” takes on a more general definition, like how stepping forward or backward in time by a multiple of 12 hours will leave a clock’s hands in the same positions, or how the molecular structure of a crystal is the same under certain translations in space.
Group theory abstracts over these transformations and treats them as base objects: It doesn’t matter what they are or what the thing is that they’re being applied to–given a collection of things that satisfy certain conditions (that are also satisfied by symmetrical transformations), group theory reasons about and extends the properties of the collection to find non-trivial deductions in the same way that algebra and calculus reason about and extend the various properties of numbers, arithmetic, and functions. This abstraction in particular has been quite important in various fields of physics in the past several decades.
Defining a category
So now that I’ve thoroughly belabored the idea of an abstraction, we finally come to categories. Broadly, categories are abstractions over the concept of things having directed, composable relationships. What does this mean? Well, let’s briefly come back to numbers. Instead of arithmetic, though, we’ll talk about $\leq$. In mathematical terms, $\leq$ is a total ordering on the integers2, meaning that under $\leq$, any two integers $a$ and $b$ can always be compared, giving $a \leq b$ or $b \leq a$, and both if and only if $a = b$. This ordering can be seen as a directed relationship between $a$ and $b$. $\leq$ is also transitive: If $a \leq b$ and $b \leq c$, then we must also have that $a \leq c$. Another way to say this is that the relationships imposed by $\leq$ are composable: The relationships $a \leq b$ and $b \leq c$ can be “put together” to get another relationship $a \leq c$.
These are examples of the core ideas of a category. Here’s the mathematical definition.
A category $\mathcal{C}$ is a collection of objects (things) $X, Y, Z, \ldots$, a collection of morphisms (relationships) $f, g, h, \ldots$, and a binary operation $\circ$ (composition) taking two morphisms as input and returning another morphism, such that the following hold for any objects $X, Y, Z, W$ and morphisms $f : X \to Y$, $g : Y \to Z$, $h : Z \to W$:
The composition of two morphisms in a category is another morphism in the category: $g \circ f : X \to Z$,
Composition is associative: $h \circ (g \circ f) = (h \circ g) \circ f$,
Each object has an “identity” morphism, $\mathrm{id}_X : X \to X$, that does nothing with respect to composition: $f \circ \mathrm{id}_X = f = \mathrm{id}_Y \circ f$.
(It should also be said that morphisms are usually thought of as “arrows” pointing from one object to another. As a matter of notation, the morphism $f$ pointing from $X$ to $Y$ is written $f : X \to Y$, and another morphism $g : Z \to W$ can be composed with $f$, i.e. $g \circ f$, if and only if $Y$ and $Z$ are the same.)
The example I gave above with $\leq$ is an example of a category, where objects are integers, morphisms are the ordering relationships imposed by $\leq$, and $\circ$ represents the transitivity of $\leq$. Categories can also be drawn as directed graphs, with objects as nodes and morphisms as arrows.
But what does it all mean intuitively? Well, most of the point of category theory is to provide a very general framework for reasoning about the relationships encoded by all the morphisms. The list of conditions that need to be satisfied in order for something to be a category is relatively short and lax, and as a result the language of categories can be applied in many contexts. This allows category theorists to, for example, prove something in one domain, and very easily map it to another. That sort of thing doesn’t really apply to this post, though, since we mostly want to focus on category theory only as it pertains to programming.
So the first real thing to note is that although $\leq$ on the integers is a valid category, we more often think of objects as having some internal structure. That structure is then probed by the existence of the various morphisms that connect the objects, and the behavior that can emerge when different morphisms are composed together. A common example of a category is $\mathbf{Set}$, the category of sets and functions (some more well-studied categories have names, typically set in bold). In $\mathbf{Set}$, objects are sets, morphisms are functions in the usual sense, and $\circ$ is ordinary function composition where we take the output of one function and feed it as input to another.
As an example, consider a (sub)category of $\mathbf{Set}$3 with three objects (sets) $A$, $B$, and $C$, and morphisms $f : A \to B$, $g : B \to C$, and $g \circ f : A \to C$, along with identity morphisms. From there, we can reason about these sets based on the morphisms between them. For example, we can think about a category of sets in which there exists one object $\varnothing$ corresponding to the empty set, and another object $\{*\}$ corresponding to the singleton set, which has only one element. These two sets are indeed characterized by the properties by which we’ve defined them. We know by construction what these sets are, but we can also see these characteristic properties based on what morphisms exist between them and other sets in the category. Namely, for any other object (set) $X$, there exists exactly one morphism $\varnothing \to X$, and exactly one morphism $X \to \{*\}$. This identifies $\varnothing$ as an “initial” object and $\{*\}$ as a “terminal” object in the category. By translating set-theoretic ideas to a more general, category-based framework, the ideas themselves can be reasoned about more generally and applied to other areas. It can be shown that any other sets (or completely different kinds of objects) obeying either of these properties must have the respective number of elements (or an equivalent property); hence the morphisms in the category characterize them.
This is a general pattern in category theory: We don’t talk explicitly about any structures within objects; instead, we rely on the morphisms to elucidate information. This keeps the reasoning as general as possible so that if, for instance, we can prove something non-trivial about initial or terminal objects, and we have some category $\mathcal{D}$ for which there also exists an initial or terminal object, then we could immediately use what we’ve proved to say something about the objects in $\mathcal{D}$. Moreover, because we’d be working on so general a level of reasoning, the fact that such a relationship exists between $\mathcal{D}$ and $\mathbf{Set}$ at all could then suggest something interesting about $\mathcal{D}$ or something analogous to empty or singleton sets.
To continue with other categories, we can also have $\mathbf{Vect}$, where objects are vector spaces, morphisms are linear transformations, and $\circ$ is the composition of these transformations, which in some cases corresponds to matrix multiplication. Or we could have $\mathbf{Top}$, where objects are topological spaces, morphisms are continuous maps, and $\circ$ is ordinary function composition. Or we could have a category of logical statements, where objects are propositions, morphisms are implications, and $\circ$ is the transitivity of implication.
But the kind of category we want to consider is the category of a programming language’s type system. This category is basically $\mathbf{Set}$, since types can be seen as sets of values, and the language’s functions and composition act basically the same as ordinary functions and composition. But if the language is pure (i.e. no mutations, side effects, or exceptions), then this category is also related to the category of logical statements and implications through the Curry-Howard correspondence. And if the language allows for user-defined types4, then the category of the type system takes on some additional properties that we’ll get into down below.
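To make this concrete before moving on, here’s a minimal sketch of that category structure in Haskell itself, using the built-in id and . as the identity morphisms and composition, and (anticipating the discussion above) the built-in Void and () in the roles of the empty and singleton sets; the helper names are mine:
import Data.Void (Void, absurd)
-- `id` is the identity morphism at every type, and `.` is composition;
-- the category laws hold for all functions:
--   f . id == f == id . f
--   h . (g . f) == (h . g) . f
intToLen :: Int -> Int
intToLen = length . show -- compose `show :: Int -> String` with `length`
-- `Void` (the empty type) behaves like an initial object: there is exactly
-- one function out of it into any given type
fromInitial :: Void -> a
fromInitial = absurd
-- `()` (the unit type) behaves like a terminal object: there is exactly one
-- total function into it from any given type
toTerminal :: a -> ()
toTerminal _ = ()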
Functors
So then what are endofunctors and categories of endofunctors? As you can probably guess from the section header, this really translates to a question of what functors are, since–from the name–endofunctors are actually a special case of functors.
So what are functors? Simply put, a functor is a map from one category to another; it maps objects to objects and morphisms to morphisms, with the constraint that it preserves identity morphisms and composition. More precisely: For all objects $X$ and morphisms $f : X \to Y$, $g : Y \to Z$ in a category $\mathcal{C}$, a functor $F : \mathcal{C} \to \mathcal{D}$ must satisfy $F(\mathrm{id}_X) = \mathrm{id}_{F(X)}$ and $F(g \circ f) = F(g) \circ F(f)$.
Functors are conceptually very important in the study of categories. To give just a single example, we can look at what happens when we interpret a group as a category. First, we’ll quickly define a group:
A group is a set $G$ and a binary operation $\cdot$ on the elements of $G$, subject to the following constraints for all $a, b, c \in G$:
$G$ is closed under $\cdot$: $a \cdot b \in G$,
$\cdot$ is associative: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$,
$G$ has an identity element under $\cdot$, i.e. there exists some $e \in G$ such that, for any $a$, $e \cdot a = a \cdot e = a$,
Every element in $G$ has an inverse under $\cdot$, i.e. for every $a \in G$, there exists some $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$.
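To make this concrete, here’s a sketch of how the definition might be transcribed into Haskell (a hypothetical Group class of my own; the laws can only be stated as comments, since the language can’t enforce them):
-- a hypothetical `Group` class mirroring the definition above
class Group g where
  gop  :: g -> g -> g -- the binary operation (closure holds by construction)
  gid  :: g           -- the identity element
  ginv :: g -> g      -- the inverse of an element
  -- laws (not enforceable in code):
  --   gop (gop a b) c == gop a (gop b c)
  --   gop gid a == a == gop a gid
  --   gop (ginv a) a == gid == gop a (ginv a)

-- the integers under addition form a group
instance Group Integer where
  gop  = (+)
  gid  = 0
  ginv = negate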
So right away, we can make some comparisons between categories and groups. The first three constraints look an awful lot like those in the definition of a category. The elements of a group under $\cdot$ are like the morphisms of the Hom-set of a category under $\circ$. We require that both operations
- combine two elements of their domain to return another element of that domain (closure),
- produce the same output no matter the grouping of applications (associativity), and
- have certain elements on which their action is trivial (identities).
On the other hand, groups only have one kind of thing (the group elements), while categories have two (objects and morphisms), and groups have an additional constraint that each element be invertible. The similarity is there, though, and I’ll just cut to the chase. Any group can be interpreted as a category with a single object (call it $\bullet$), where each morphism is an element of the group and composition of morphisms is the group operation.
An important thing to note is that every morphism in this one-object category is an isomorphism, meaning that it’s invertible, thanks to the fourth constraint we placed on groups above. If we relaxed that constraint and allowed non-invertible morphisms/group elements, then we’d have a monoid (referring in a category-theoretic sense to the single object and its morphisms, and not necessarily the whole category), which is another important concept on the way to monads.
And here with both groups and monoids, we have the concept of a (group/monoid) homomorphism, which is a “structure-preserving” map between groups. Specifically, a group homomorphism $\phi : G \to H$ maps the elements of one group to another in a way that preserves products, identities, and inverses–that is, all the key properties of a group:
- For all $a, b \in G$, $\phi(a \cdot b) = \phi(a) \cdot \phi(b)$.
- For identities $e_G \in G$ and $e_H \in H$, $\phi(e_G) = e_H$.
- For all $a \in G$, $\phi(a^{-1}) = \phi(a)^{-1}$.
(And a monoid homomorphism does the same except for the last condition.)
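For a concrete (monoid) example, stated in Haskell terms: length is a monoid homomorphism from lists under ++ to the integers under +, since it preserves products and identities. A quick sketch:
-- a sketch: `length` preserves the monoid structure of lists under `++`,
-- mapping it onto the integers under `+`
--   products:   length (xs ++ ys) == length xs + length ys
--   identities: length [] == 0
homomorphic :: String -> String -> Bool
homomorphic xs ys = length (xs ++ ys) == length xs + length ys
-- homomorphic "ab" "cde" == True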
Notice how, like when we interpreted a group as a category, the first two conditions are essentially those of functors, while the last is specific to groups (i.e. coming from the fact that group elements are required to be invertible). This means we can think of groups as special categories with some internal structure modeled by their morphisms, and group homomorphisms as special functors that preserve that structure while transforming between groups. But notice also that this is essentially how we think of morphisms between objects. This leads us to another special category called $\mathbf{Grp}$, the category of groups and group homomorphisms. Here, $\mathbf{Grp}$ is interesting because it’s a category that can be thought of as a kind of “category of categories”–each group/object is itself also a category, and each morphism can be thought of as a functor between the object categories.
This is a general pattern that can be repeated for other kinds of categories, too. Instead of only thinking about group-like categories, we could also imagine other categories modeled after different mathematical objects like sets or topological spaces. In this case, functors would be very powerful indeed since they would describe transformations between different areas of math–intuitively, functors can be seen as a formalization of analogy.
Natural transformations
Consider that, in the process of constructing a functor, one must effectively define a mapping on morphisms, which are themselves usually some kind of mapping. And, given a suitable set of categories and a suitable set of functors between them, one can then construct a category of categories5. A reasonable question to then ask is, “what happens when we try to construct a mapping between functors?”
This kind of mapping is called a natural transformation. Specifically, natural transformations are maps from functors to other functors with the same input and output categories that preserve the structure of those functors (which themselves preserve the structure of the underlying categories). More precisely, any natural transformation $\alpha$ between functors $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{C} \to \mathcal{D}$ must assign to every object $X$ in $\mathcal{C}$ a morphism $\alpha_X : F(X) \to G(X)$ in $\mathcal{D}$. The condition that a natural transformation preserves structure translates to the condition that, given a morphism $f : X \to Y$ in $\mathcal{C}$, applying $F(f)$ and then $\alpha_Y$ is the same as applying $\alpha_X$ and then $G(f)$: $\alpha_Y \circ F(f) = G(f) \circ \alpha_X$.
This can be drawn as a “commutative diagram,” where any two paths formed by composition of morphisms with the same starting and ending objects are equivalent.
If, for every object $X$, the morphism $\alpha_X$ is an isomorphism–that is, if it is invertible–then $\alpha$ is called a “natural isomorphism.”
Just as with categories and functors, it is also sometimes possible to construct categories of functors and natural transformations, where we then think of functors as objects, and their natural transformations as morphisms. The last piece to the monad puzzle is the concept of an endofunctor, which is simply a functor that points from a given category back to itself.
Monads
So finally we come to monads. Now that we know what categories, (endo)functors, and monoids are, it’s time to put it all together. Here’s the definition again:
A monad is a monoid in the category of endofunctors of a fixed category.
We’ll start with this fixed category, calling it $\mathcal{C}$, which is left entirely unconstrained. Here $\mathcal{C}$ is treated as a kind of backdrop, and we work in the category of its endofunctors, which we’ll call $\mathrm{End}(\mathcal{C})$. In $\mathrm{End}(\mathcal{C})$, the objects are endofunctors on $\mathcal{C}$, morphisms are natural transformations between them, and composition is ordinary composition of the natural transformations.
Specifically, a monad is a monoid in this category. Recalling the brief discussion of groups and monoids above, we have to make a slight adjustment to switch from that picture, where we modeled a monoid using a whole category, to one where the monoid is a localized structure within a larger category. Leaving some technical details for the supplementary material, the essential ideas are all the same, but now the properties of a monoid object $M$ are enforced with special morphisms
- $\mu : M \otimes M \to M$ (monoid multiplication)
- $\eta : I \to M$ (identity)
where $I$ is a designated “unit” object for which $I \otimes M \cong M \otimes I \cong M$.
When we talk about monads, we talk about a monoid object in the category of endofunctors of $\mathcal{C}$–that is, a particular endofunctor $T$ with associated natural transformations $\mu : T \otimes T \to T$ and $\eta : \mathrm{Id} \to T$, with $\mathrm{Id}$ being the trivial identity functor on $\mathcal{C}$ (playing the role of the unit object $I$, with composition of functors playing the role of $\otimes$). That’s the high-level picture,
but of course it’s not very helpful on its own. Something that may be a bit more helpful is to consider the actions of $\mu$ and $\eta$ on the object level. As discussed above, every natural transformation must have corresponding morphisms for every object in its base category. For $\mu$ and $\eta$, then, we have $\mu_X : T(T(X)) \to T(X)$ and $\eta_X : X \to T(X)$, where $X$ is any object of $\mathcal{C}$. Leaving aside the question of what exactly the action of $T$ could look like, we can see from these signatures that $\mu$ acts to “undo” an application of $T$, while $\eta$ will simply apply $T$ to $X$. Moreover, we require that these operations be defined for all $X$. $\mu$ is usually referred to as a “joining” or “flat-map” operation while $\eta$ is a “lift” or “return” operation for reasons that should become clear below6.
But there’s only so much we can do from the purely mathematical context to gain intuition for what it all means–we now have the what of monads, but need some concrete applications to get the why.
Monads in programming
Programming is a particularly nice context for exploring monads. Generically, we can think of the type system of a programming language as a category where objects are types, morphisms are functions between those types, and composition is ordinary function composition. These categories are pretty much $\mathbf{Set}$ (since each type can be thought of as a set of values), but you can play around with them quite easily without any pen-and-paper work just by typing a few commands on a computer.
There are a couple restrictions to this, however. The first is that there are important constraints on the choice of language. Namely, we need a language that has strong types, can assign types to functions and allow functions to be passed as values to other functions, and allows programmers to create their own types that can be parameterized by another type. The second restriction is that, in order to properly constrain the space of possible functions, all functions must be completely pure–there can be no side effects, e.g. writing to files, printing to the screen, or changing the state of a random number generator, that do not have a corresponding encoding in a type.
An obvious choice of language here is Haskell. It’s already essentially just notation for category theory, but the fact that it still has to work as a programming language (i.e. describe a computation that can be run on a real computer) has introduced some interesting design choices that can provide some insight into the underlying math. For convenience, people generally refer to the category-of-types-and-pure-functions of Haskell as Hask7.
Parametric types
First, we need to talk about parametric types. As the name suggests, a parametric type is a type that is parameterized by another type, which in Haskell looks like this:
data Foo a = -- constructors here...
Here, Foo is the parametric type and a is the type parameter. The constructors for this type are left undefined, but there can be any number (including zero!) that take any number of arguments8. Crucially, however, Foo is not actually a type. Instead, Foo on its own really names a particular family of types; only when a is specified (e.g. Foo String or Foo Int) is it a “real,” concrete type. That concrete type can then be instantiated by calling one of its constructors on some input. It’s important to define what we mean by a “constructor,” though. Just based on the name, it’s easy to imagine a constructor as any function that takes some data and “constructs” the concrete type, e.g.
constructFoo :: {- some input data... -} -> Foo a
but for our purposes, we want to say that a constructor is a specific function whose only action is to “wrap” the input data in the parametric type. In other words, if we define a particular constructor called Make,
data Foo a = Make a
| -- could be more...
then it can only construct a Foo a by trivially wrapping up the input a: We want to keep the action of turning an a into a Foo a separate from whatever could be done to the particular value passed to Make on the value level.
Anyway, the parametricity of Foo has a couple important implications. The first is that, as a construct that requires some type to produce a type, Foo in itself actually defines an object-to-object mapping. In other terms, Foo by itself has a type in Haskell that is expressible as
import Data.Kind (Type)
Foo :: Type -> Type
The second implication is that any function that takes a Foo a (as a generic object) as argument and returns a Foo b (for any type b), i.e.
bar :: Foo a -> Foo b
must define an action not just for any value the argument could be, like a regular function, but for any types that a and b could be as well.
A key insight here is that any parametric type together with any functions on it actually defines a functor in the proper category-theoretic sense9, since they define a complete mapping over both objects and morphisms, and we only consider “wrapper” constructors.
To shed more light on the idea of a “wrapper” constructor, let’s take a brief detour into some actual Haskell to see how it all works in practice. In principle, Foo A (now for concrete type A) is a proper type, like Int or String. But if Foo actually acted as a regular function between types, we’d start to lose some of the specialness of what functors do. Such a thing would also demand some kind of semantics be developed for performing non-trivial operations and logic on types themselves, which is a huge task. Instead, Haskell leaves the action of Foo on A as a kind of unevaluated atom: A Foo Int is simply a Foo Int, which is distinct from a regular Int or a Foo String. This translates to a near-identical treatment on the value level as well, when we call Make on a particular value–the result of calling Make 5 is simply Make 5:
-- `Foo` is a functor
Foo :: Type -> Type
-- `Foo` called on a type is a type
Foo Int :: Type
-- `Make` is a function that takes a value of type `a` and returns a value of
-- type `Foo a`
Make :: a -> Foo a
-- which means...
Make (5 :: Int) :: Foo Int
Make 5 is treated as a concrete value on its own since the constructor Make is assumed to be a trivial action on the concrete value 5; it’s effectively a simple “wrapper” that crucially can be undone (or inverted) at will. This is distinct from a regular function that could do something non-trivial like perform some addition. If we wanted to do something non-trivial to 5 and return it as a Foo Int, we’d have to implement a separate function for it, e.g.
-- `makePlusOne` has the same type as `Make` for `a == Int`, but it isn't a pure
-- constructor.
makePlusOne :: Int -> Foo Int
makePlusOne val = Make (val + 1)
And just as we can “wrap” a value using a constructor, we can also “unwrap” it by pattern-matching, which acts as the inversion of the constructor:
-- bind the value `Make 5` to the variable `someFoo`
someFoo = Make (5 :: Int)
-- bind the value `5` to the variable `x` by unwrapping `someFoo`
Make x = someFoo
which also only works because the action of Make is trivial.
It’d be nice to be able to describe this concept in greater generality. While a function of type Foo a -> Foo b will in general entail an operation that will act in the proper functorial way10, every such function will have an implementation looking something like this:
bar :: Foo a -> Foo b
-- we pattern-match on the input to "unwrap" the inner value
bar (Make val) = Make val'
where val' = -- some operation on `val`...
This is kind of annoying to write all the time, and we’d have to define a different function for every use of the inner value that follows this pattern. This leads us to the concept of a type class, which defines a class of types (duh) by their behavior under a certain set of functions. In Haskell, we have a type class called Functor, which is defined roughly as follows:
class Functor (f :: Type -> Type) where
fmap :: (a -> b) -> f a -> f b
This defines a Functor as any parametric type f :: Type -> Type that’s associated with a certain function called fmap to transform the inner type wrapped by f. To make it useful, fmap “factors out” the actions that can be performed on the inner value as a function a -> b that’s passed to fmap as an additional argument. fmap as a whole therefore has the type (a -> b) -> f a -> f b, meaning that it takes the function and a value wrapped by the functor, and returns the functor wrapped around the output of the action. We can then declare that our toy functor Foo meets this definition (is an “instance” of Functor) by implementing fmap:
instance Functor Foo where
fmap func (Make val) = Make val'
where val' = func val
But factoring func out of the definition comes with a consequence: Now we have to note the condition that any Functor instance must preserve composition and the identity. This condition is not feasibly expressible in code (the operations required to actually check that it holds for all uses of Functor are prohibitively expensive), so we can conventionally note as a “law” that, with Haskell’s composition operator . (the same thing as $\circ$) and the identity function id :: a -> a:
- fmap id x == id x, and
- (fmap f . fmap g) x == fmap (f . g) x.
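That said, the laws can at least be spot-checked on randomized inputs. Here’s a sketch using the QuickCheck library against the built-in Maybe functor (discussed below); this samples cases rather than proving anything:
import Test.QuickCheck (quickCheck)
-- check the identity law on random `Maybe Int` values
prop_fmapId :: Maybe Int -> Bool
prop_fmapId x = fmap id x == id x
-- check the composition law for a couple of fixed functions
prop_fmapCompose :: Maybe Int -> Bool
prop_fmapCompose x = (fmap f . fmap g) x == fmap (f . g) x
  where
    f = (* 2)
    g = (+ 1)
main :: IO ()
main = do
  quickCheck prop_fmapId
  quickCheck prop_fmapCompose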
Now coming back to the discussion of functors on Hask, another key insight is that any resulting concrete type or function instantiated whenever a concrete type is provided for a must be representable in the language (how would the language function otherwise!), which means it must also be a member of Hask. Therefore, every functor defined by a parametric type and its functions (Functor and fmap) is an endofunctor.
So then if parametric types are (endo)functors, what’s the equivalent of a natural transformation? The answer is in a way quite simple, but requires a bit of imagination. The basic answer is that any ordinary function between generic parametric types, polymorphic in the same type parameter, can act as a natural transformation. This can be seen by recalling the commutative diagram that defines naturality: Given Foo a and another parametric type Baz a, the function
foobaz :: Foo a -> Baz a
is effectively a mapping between (endo)functors. To be clear, not every function with parametric input and output types is a natural transformation–it must actually satisfy the commutative diagram–but a function will need to have a function signature like
h :: F a -> G a
where F and G are concrete Functors, in order to be a natural transformation.
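A classic concrete example is taking the (possible) first element of a list, which defines one component [a] -> Maybe a for every type a (Maybe is covered properly below; the built-in version suffices here):
-- a sketch of a natural transformation from the list functor to `Maybe`
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x : _) = Just x
-- naturality: transforming and then mapping a function is the same as
-- mapping and then transforming, e.g. with `show`:
--   fmap show (safeHead [1, 2]) == safeHead (fmap show [1, 2]) -- Just "1"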
Monads (again)
Now we have (endo)functors and natural transformations; the only thing left to discuss is how to actually define monads with Haskell constructs. In principle, the thing to do would be to define a Monoid type class, but it turns out we don’t really need one–after all, a monad is a specific kind of endofunctor monoid, so we can just skip to defining a Monad type class instead.
Recall that a monad is essentially an endofunctor along with two special natural transformations $\mu$ and $\eta$. There are two things here that need to be translated to Haskell. The first is the identity endofunctor, which simply takes every object or morphism in Hask and returns it unmodified. In Haskell, we’d therefore want a parametric type (call it Id) with only a single constructor and a simple implementation of Functor. This ends up being exactly what we’ve already done for Foo above:
data Id a = Id a -- `Id` is the name for both the type and the constructor
instance Functor Id where
fmap func (Id val) = Id (func val)
But of course this is unnecessary. Since the point of the identity endofunctor is to do nothing, we’ll do exactly that and just say that any type a is exactly the same as Id a, any function f :: a -> b is the same thing as fmap f over an Id a, and save a bit of CPU time.
The second is the double application $T \circ T$. We already know that parametric types are in themselves functors that act on types by wrapping them and on morphisms with fmap. So a double application of some endofunctor M should look like a double-wrapping of some concrete type, and require two calls to fmap for an ordinary function. Thus we have, for any object/type a and morphism/function f :: a -> b:
- $T(T(a))$ translates to M (M a), and
- $T(T(f))$ translates to fmap (fmap f) :: M (M a) -> M (M b).
So in Haskell, $\eta$ and $\mu$ translate to two functions (conventionally called return and join) that look like this:
return :: a -> M a
join :: M (M a) -> M a
And in a manner similar to fmap, join has its internal action “factored out” into an extra function argument, with the result called bind. The more proper definition goes like this:
class Functor m => Monad (m :: Type -> Type) where
return :: a -> m a
bind :: (a -> m b) -> m a -> m b
The Functor m => bit is called a “context” in Haskell and means that whatever follows for m is only valid in the context that m is also a Functor11. This definition also comes with the laws that return be essentially the same as a constructor for m (i.e. not perform any non-trivial work on the argument), and that bind behave in a way that agrees with the monad coherence conditions. In Haskell, the requirements can be expressed loosely as:
- return is an identity: bind f (return x) == f x and bind return m == m
- bind respects composition: bind (\x -> bind g (f x)) m == bind g (bind f m)12
the satisfaction of which is again infeasible to actually check.
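One thing worth noting before the examples: bind and join determine each other, which ties this definition back to $\mu$. A minimal sketch, assuming the Monad class defined just above (not the standard library’s):
-- `join` can be recovered from `bind`, and vice versa
join' :: Monad m => m (m a) -> m a
join' = bind id
bind' :: Monad m => (a -> m b) -> m a -> m b
bind' f = join' . fmap f -- `fmap` comes from the `Functor` context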
Examples
So here we’ll go through a few examples of monads in Haskell, and relate their implementations to the category theory above.
Maybe/Either
The simplest non-trivial example of a monad that people usually start with (and the title of this post) is Maybe. Here we’re grouping it together with another monad called Either for reasons that will shortly become clear.
The Maybe a type refers to values that are nullifiable–that is, a Maybe a can either be an ordinary value of type a or some kind of null value in the case that things don’t go as planned. This is its definition:
data Maybe a = Just a
| Nothing
Maybe has two constructors: Just, which is used for the case where there is a value, and Nothing, which models the null case. Its Functor and Monad instances are:
instance Functor Maybe where
fmap f (Just x) = Just (f x) -- `Just` behaves like the identity functor
fmap _ Nothing = Nothing -- can't do anything with a null object
instance Monad Maybe where
return = Just -- `return`ing a value is "just" wrapping it
bind f (Just x) = f x -- `f` takes the role of a nullifiable operation
bind _ Nothing = Nothing -- can't do anything with null
Either behaves similarly. Like the name suggests, an Either is one of two things, which are called Left and Right.
data Either l r = Right r
| Left l
Notice that Either has two type parameters, while we’re used to thinking of functors as taking only one. This may seem odd, but recall that Haskell has currying, which means that we can treat Either as a functor of one argument by just supplying a type for the first, e.g.
Either :: Type -> Type -> Type
Either Int :: Type -> Type
Either String :: Type -> Type
-- and so on...
So then like Maybe, we can declare Either l to be a Monad–as long as we make sure to supply the first type:
-- these instances are generic over `l`:
instance Functor (Either l) where
fmap f (Right r) = Right (f r)
fmap _ (Left l) = Left l
instance Monad (Either l) where
return = Right
bind f (Right r) = f r
bind _ (Left l) = Left l
Look how similar the instance declarations are! We can see that Nothing and Left are just treated as default null values–the difference is only that Either allows some extra data to be attached to the null case, since Left takes an argument. This makes Either a common choice for describing the return value of a computation that can fail (the common mnemonic is that Right indicates the correct or “right” value).
So what does this all mean? From the instances, we can see that Just/Right are essentially the identity functor on Hask: They’re only used to wrap values, and all functions are simply passed through for those variants. On the other hand, Nothing/Left act to essentially map all of Hask down to a single object, and turn all morphisms into the identity.
But the key point is that this behavior turns Maybe and Either into what are probably the most well-known and generally useful monads, which most new programming languages seem to be adopting. This is because bind ends up making for a particularly elegant way to handle chains of nullifiable or fallible computations. If we work with the operator version of bind, defined as
(>>=) :: Monad m => m a -> (a -> m b) -> m b
x >>= f = bind f x
then we can write code that “automatically” handles nulls and errors without any explicit logic:
-- `get` returns `Just` the element of the list at an index, or `Nothing` if the
-- index is out of bounds
get :: Int -> [a] -> Maybe a
get _ [] = Nothing
get 0 (h : _) = Just h
get idx (_ : rest) = get (idx - 1) rest
-- `divideSafe` divides one `Float` by another, returning `Just` if the divisor
-- is non-zero, otherwise `Nothing`
divideSafe :: Float -> Float -> Maybe Float
divideSafe num denom = if denom == 0.0
then Nothing
else Just (num / denom)
-- `Show` is a type class that converts values to `String`s and provides a
-- method `show`
toString :: Show a => a -> String
toString = show
-- assume we have a list of numbers
someData :: [Float]
someData = {- ... -}
-- example computation: divide 100 by the fifth element of `someData`, and
-- convert the result to a `String`
foo :: Maybe String
foo = get 4 someData -- start with a value that could be null
>>= (divideSafe 100.0) -- `divideSafe 100.0` is a curried function
>>= (return . toString) -- compose `Maybe`'s `return` and `toString`
-- `return . toString :: a -> Maybe String`
In foo, >>=/bind is used to take a value that could be null, and apply two more operations to the wrapped value. At each step, we apply a function to the wrapped value (if it exists) that then returns another wrapped value to be fed into the next application of >>=. The definition of >>= on Maybe allows for each operation to fail and, more importantly, preserves the null value through the operational pipeline in the case that one is produced. For example, if someData were less than five items long, the initial get would return Nothing; if the fifth item in someData were zero, then divideSafe 100.0 would return Nothing. In either case, the final result of the computation would then also be Nothing, indicating that something went wrong while preserving the pipeline itself. If we wanted to track where the error came from, we could have get and divideSafe return Either String values instead, where the Left constructor would be used to hold a message describing the error.
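Here’s a quick sketch of that Either-based variant (the E-suffixed names are hypothetical, not standard):
-- `Either`-returning variants of `get` and `divideSafe` that say what failed
getE :: Int -> [a] -> Either String a
getE _ [] = Left "index out of bounds"
getE 0 (h : _) = Right h
getE idx (_ : rest) = getE (idx - 1) rest
divideSafeE :: Float -> Float -> Either String Float
divideSafeE num denom = if denom == 0.0
  then Left "division by zero"
  else Right (num / denom)
-- the pipeline reads the same as before, but a failure now carries a message
fooE :: Either String String
fooE = getE 4 someData
  >>= divideSafeE 100.0
  >>= (return . toString)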
List
Maybe and Either have, in some ways, become the quintessential monads. But lists–homogeneous, ordered, linear sequences of items–are another example that usually comes as a small surprise to people first learning about monads.
In Haskell, lists are usually written [a], but to conform to the notation above, we can also write them as List a. It’s helpful to look at how they’re essentially defined (in pseudo-Haskell, since the built-in [] and : are special syntax):
data List a = [] -- the empty list
| a : (List a) -- a head element attached to a (possibly empty) body
This definition makes it a singly linked list, but it’s worth noting that what follows works for basically all homogeneous, ordered, linear sequences (provided definitions of return and bind that amount to identical operations). More specifically, we can note that every List a is either the empty list or an a attached to the front of another List a–wait a second, this looks a lot like Maybe! Let’s explore this further…
We can think of a List as a container of data (that’s what it’s for, after all) in the same way as Just. Thus it makes sense to define fmap in the same way, except here we have the possibility that the non-empty variant of the list contains multiple elements:
instance Functor List where
fmap _ [] = [] -- like `Nothing`
fmap f (head : rest) = f head : fmap f rest -- like `Just`
To get the typing to work out and maintain homogeneity in the list, we define fmap to apply f to all elements in the list, which we do recursively. This is essentially the only definition of fmap that (a) satisfies functoriality requirements, and (b) produces a valid output List.
But then how should we implement Monad? return is straightforward: Given a single a, the only operation that makes sense (given functoriality and monadicity requirements) is to return a single-element List containing that a. But then we have to contend with bind, which should apply a function of type a -> List b to every element of the list and return a single List b. The naive application of this function that behaves like fmap would produce a List (List b), so what is there to do that makes sense under monad laws?
The answer ends up being what is commonly called a “flat map” in some languages or “map + concat” in others. Specifically, the operation is to apply the a -> List b function to all elements of the initial list, and then “flatten” the resulting List (List b) value by appending all nested lists end-to-end. Translated to a Monad instance:
instance Monad List where
return x = x : [] -- can also be written as `[x]`
bind _ [] = []
bind f (x0 : rest) = concat (f x0) (bind f rest)
-- `concat` takes two lists and sticks them end-to-end, preserving order:
-- concat [0, 1, 2] [3, 4, 5] == [0, 1, 2, 3, 4, 5]
concat :: List a -> List a -> List a
concat [] b = b
concat (a0 : rest) b = a0 : concat rest b
From here, it’s easy to see that these definitions satisfy the monad laws, so List is indeed a bona fide monad that actually works quite similarly to Maybe.
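To see the flattening in action, here’s a tiny usage sketch (eliding, as above, the distinction between List a and the usual [a] syntax; the function name is mine):
-- each input element maps to a two-element list, and `bind` splices the
-- results together into one flat output list
halveAndDouble :: [Float] -> [Float]
halveAndDouble = bind (\x -> [x / 2, x * 2])
-- halveAndDouble [1, 10] == [0.5, 2.0, 5.0, 20.0]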
State
Maybe, Either, and List were fairly simple (but quite useful!) examples, so here’s a more complicated one. A preeminent goal of any pure, functional language (or corresponding mathematical model) is to be able to provide an elegant description of stateful computations. For example, suppose we have an algorithm that needs to mutate the elements of an array, or we’re trying to describe how a game evolves from moment to moment. In either case, we generically want to think about two things:
- The “state” being managed by the computation, and
- the actual output produced by it.
From the mathematical (and, by extension, Haskell-ian) perspective, this looks like a function that takes an initial state of type s and produces another state (also of type s) along with some output, which is of type a:
runState :: s -> (s, a)
In this picture, evolutions of the state then look like functions of both the state and the intermediate output values:
stepState :: ((s, a) -> (s, b)) -> (s, a) -> (s, b)
--           ^                      ^         ^
--           |                      |         `- output state/value
--           |                      `- input state/value
--           `- arbitrary stateful computation
We could allow the type of the state to change in the transformation, but in principle we should be able to define a single type that is capable of modeling all possible states.
This already looks somewhat monadic: It looks like we can take some unconstrained value, wrap it together with some additional data, and feed it through a series of (composable) transformations to produce a final state and output. Let’s see what we can come up with for our Functor and Monad instances.
data State s a = State (s, a)
-- like `Either`, `State` is functorial/monadic only in the second type
instance Functor (State s) where
fmap f (State (state, value)) = State (state, f value)
instance Monad (State s) where
return x = State (defaultState, x) -- problem: what is `defaultState`?
bind f (State (state, value)) = f value -- problem: `f` should be able to use
-- `state` to produce output
The problems with this approach were perhaps a bit subtle to foresee from the outset, but they’re made clear when we get down to brass tacks. There are two of them. First, return doesn’t really work as it should; its type signature forces us to choose some default state when initially wrapping a value. This means that the first monad law won’t be satisfied, since we have to discard any state information when we pass return to bind. Second, bind doesn’t work as it should, since we can’t pass any state data to f; this means the resulting computations (when constrained to Monad methods) won’t actually be stateful, since they can’t depend on state data.
Instead, the solution (used in roughly the following form by an official library) shifts perspective a bit. Rather than thinking about merely the end state and the output value of the computation, we actually want to think about the computation itself–which is to say the s -> (s, a) function–as the base object:
data State s a = State (s -> (s, a))
This definition is perhaps a bit more complicated, but it immediately solves the first problem noted above with return: We don’t have to choose a default state because the value stored by State is inherently agnostic over all states (of a specific type). It also suggests that the more natural way to think about stateful computations is in terms of the methods to get from one state to another, rather than a particular sequence of states13. Now let’s see how we can get working Functor and Monad instances.
Here we’ll again think of State s a as functorial or monadic only in the second type parameter, which means the functions we have to implement have the types
fmapState :: (a -> b) -> State s a -> State s b
returnState :: a -> State s a
bindState :: (a -> State s b) -> State s a -> State s b
Removing the indirection created by the State constructor, these are
fmapState' :: (a -> b) -> (s -> (s, a)) -> (s -> (s, b))
returnState' :: a -> (s -> (s, a))
bindState' :: (a -> (s -> (s, b))) -> (s -> (s, a)) -> (s -> (s, b))
Our new Functor instance looks pretty much the same as the old one, except that we have to apply the fmap argument through composition using . instead of directly applying it to a concrete value:
instance Functor (State s) where
-- the action of `fmap` is to compose `f` with the internal state processor
-- function in a particular way
fmap f (State proc) = State proc'
where proc' = applySnd f . proc
-- this is essentially our old definition of `fmap`: apply `f` to only the
-- value, leaving the state `s` unchanged
applySnd :: (a -> b) -> (s, a) -> (s, b)
applySnd f (state, value) = (state, f value)
For Monad, return is again relatively simple–we just construct a function that combines an initial state with a value, leaving the state untouched. bind, however, is made a bit more complicated by the fact that we need to think in terms of composition instead of simple application.
-- remove the trivial indirection created by the `State` constructor to expose
-- the bare processor function
unwrapState :: State s a -> (s -> (s, a))
unwrapState (State proc) = proc
instance Monad (State s) where
return x = State (\s -> (s, x)) -- do nothing to the state, just return `x`
bind f (State proc) = State proc'
where
-- unwrap the output of `f` to get the bare transformation on `proc`
f' = unwrapState . f -- f' :: a -> s -> (s, b)
-- define `proc'` as the correct composition of `f` with `proc`
proc' s0 = (s2, val2) -- proc' :: s -> (s, b)
where
-- define the intermediate state/value computed via `proc`
(s1, val1) = proc s0 -- (s1, val1) :: (s, a)
-- define the new output state/value computed via `f'`
(s2, val2) = f' val1 s1 -- (s2, val2) :: (s, b)
In the end, this extra complication produces an actual monad! To see a quick usage example, check out the supplementary material.
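Still, here’s a minimal sketch of the instance in use, with an integer counter as the state (the names are mine):
-- a stateful "tick": increment the counter and report the value it had
tick :: State Int String
tick = State (\n -> (n + 1, "saw " ++ show n))
-- sequence two ticks with `bind`; the counter threads through implicitly
twoTicks :: State Int String
twoTicks = bind (\_ -> tick) tick
-- unwrapState twoTicks 0 == (2, "saw 1")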
Categorically, State does a bit more work than the previous two examples. As we can see from its definition, the action of State s as a functor is really to transform types into functions14. And as we’ve seen above, the transformation is rather non-trivial when it comes to how the transformed objects can be manipulated. This is broadly similar to the action of what’s known as a covariant hom-functor, formally a functor from a category $\mathcal{C}$ to $\mathbf{Set}$, where each object is mapped to the set of all morphisms pointing to it, and morphisms are mapped to functions between those sets. Discussion of these functors is well beyond the scope of this post, but the fact that State is still an endofunctor on Hask is a testament to the power of Haskell’s type system.
Intuition
So what insight can we gain from all this? The most basic thing to note is that monads are a way to wrap values in a context that can contain extra data. In the case of Maybe and Either, this data is essentially a boolean flag that encodes whether the value is valid (augmented, in the case of Either, with additional data that does not play an active role in monadic transformations). For List, this data is all the items that are being strung together, and in which order. And for State, it’s how to compute a value and evolve a state, given some initial state.
But there’s more to a monad than just this data–it’s the data in combination with return and bind, which give a framework for how to create and compose these wrapped values. The fact that this framework exists and can be formalized as such allows something to be said about how all this translates to impure computations, which is often thought of as a fundamental tension in computing theory. Further, a monad’s status as an endofunctor over a whole category–here, all of Hask–means that this framework is valid for any type that is describable in the Haskell language.
One interesting way to think about it is to imagine starting from the programmer’s perspective. Depending on the computation, we may want to augment some value with an extra bit of data, we may want this behavior to be generically applicable to essentially any type that currently exists in the type system, and we want it to work as a general system that one can opt into, in case it doesn’t apply to a certain use-case. This is a natural way to arrive at a parametric type: The parametric type itself encodes some kind of logical bounds on the data it encloses, and the type parameter allows it to be applied in different contexts.
Then, we’d want some ways to work with values wrapped in the new type. These will of course vary between exact use-cases, but it makes sense to want two basic things:
- A method to wrap bare values, and
- a method to perform operations on wrapped values within the wrapping context.
Unsurprisingly, these are the functions provided by the Functor and Monad type classes, but from this perspective it only just so happens to align with the ideas of endofunctors and monoids thereof–all the category theory building up to monads can be viewed simply as formalizations of how to work with wrapped values. It must be said, though, that the mathematical understanding of monads and the line of reasoning that thinks of them as burritos are still significantly different, mainly in their generality (as usual).
When we work with monads in Haskell, we generally require some way to get values or base types out of the monad (we used the “wrapper” value and type constructors to this purpose here), at least in the code that defines the base behavior of the monad, if not in the interfaces exposed to an end user. But in the mathematical treatment, this need not be the case–the base functor is free to be non-invertible and perform non-trivial work.
So it is genuinely true that all things claimed to be monads in Haskell are indeed monads in the mathematical sense, but one should keep in mind that there’s more to monads than simply wrapping data, however common and useful such actions are.
Semi-related remarks
I originally wanted to write this post to give code examples in both Haskell and Rust–since Haskell’s syntax is famously quite terse (as mentioned, it’s essentially mathematical notation anyway), I thought Rust’s relative verbosity could make the programming section somewhat easier to follow. But although Rust’s type system does allow for functors and monads in the manner I described for Haskell, it turns out that traits are just barely not flexible enough to describe the Functor and Monad type classes in full generality and in a directly analogous way to how Haskell does it.
Instead, I’ll just say that Option and Result are direct analogues of Maybe and Either, with Option::map/Result::map being the functorial fmap, and Option::Some/Result::Ok and Option::and_then/Result::and_then being the monadic return and bind. I’ll also link to the crate monadic, which provides a trait definition that works but isn’t very idiomatic, and this blog post, which makes a case for some extensions to the trait system that would allow a more idiomatic definition.
Alternatively, OCaml’s functors are kind of flexible enough to define monads. This would be done by declaring a module type like so:
module type Monad = sig
(* define a parametric type `m` with type parameter `'a` *)
type 'a m
val return : 'a -> 'a m
val (>>=) : 'a m -> ('a -> 'b m) -> 'b m
end
and, correspondingly, we’d have an awkwardly named Functor module type as well:
module type Functor = sig
type 'a f
val fmap : ('a -> 'b) -> 'a f -> 'b f
end
These module types are sort of analogous to type classes, but the way to actually use them is awkward for this sort of thing. If we wanted to just define some functions using some monad, we’d write something like this:
(* some operation(s) using an unspecified monad type belonging to a module `M`
that has types and functions matching the `Monad` signature
in order to leave things generic over the monad type, we need to have these
definitions in what OCaml calls a "functor" -- essentially, a module of code
that depends on another module *)
module MonadicOps (M: Monad) = struct
(* this is a function that applies a list of functions to some monadically
wrapped value *)
let rec foldm (ops: ('a -> 'a M.m) list) (init: 'a M.m): 'a M.m =
match ops with (* `list` is a linked list type with cons operator `::` *)
| f :: rest -> foldm rest M.(init >>= f)
| [] -> init
end
(* example monad *)
module Maybe = struct
type 'a m =
| Just of 'a
| Nothing
let return (v: 'a): 'a m = Just v (* `val` is a reserved word in OCaml *)
let (>>=) (maybe: 'a m) (f: 'a -> 'b m): 'b m =
match maybe with
| Just a -> f a
| Nothing -> Nothing
end
(* in order to use `foldm` on a concrete `'a Maybe.m`, we need to supply a
module to `MonadicOps` in order to instantiate a new module that has an
operable definition of `foldm` *)
module MaybeMethods = MonadicOps (Maybe)
(* now we have a concrete instance of `foldm` that only works on `Maybe.m`s *)
let fold_maybe = MaybeMethods.foldm
This is rather awkward (especially for a blog post), and doesn’t allow one to properly define Monad in the context of Functor. To impose this constraint, we’d have to specify it in a separate interface file or at each call site of MonadicOps, since we can’t define module types with module parameters (I think–it actually might be impossible to fully express the proper relationship between Monad and Functor in code!).
Footnotes
- ↵ To be clear, we’re talking about everyday, “solve for $x$” algebra and not abstract algebra, which is the generalization of different sets of objects and operations on those objects.
- ↵ It’s a total ordering on the reals too (interestingly, not on IEEE floating-point numbers), but let’s keep things discrete.
- ↵ Technically, a diagram into $\mathbf{Set}$.
- ↵ In particular, higher-kinded types are important for some more interesting bits from category theory.
- ↵ There are some special requirements for the kinds of categories used to this purpose, but it can be done.
- ↵ Technically, there are also some additional “coherence conditions” we need for monads centered around $T$, $\mu$, and $\eta$, which are: $\mu \circ T\mu = \mu \circ \mu T$ (associativity of the multiplication) and $\mu \circ T\eta = \mu \circ \eta T = \mathrm{id}_T$ (left and right unit laws).
- ↵ It should be said that Haskell’s type system actually isn’t a category due to some technical details, but it’s good enough for all intents and purposes here.
- ↵ Parametric types can also have more than one type parameter, but we’ll keep it to one for now.
- ↵ Although not in the completely general category-theoretic sense since, in the same way that a type’s constructors must be simple wrappers, Foo as a type constructor must be a simple wrapper around its type parameter(s). See the following paragraph.
- ↵ Here it’s pretty much guaranteed due to the simple structure we’ve limited Foo to, but as types get more complicated, this will not be the case.
- ↵ The “actual” definition of Monad in Haskell’s standard library has bind replaced by an operator >>= with the operands flipped in order to allow multiple bind calls to be strung together cleanly, and the Functor context replaced by an Applicative context (which itself is defined in a Functor context).
- ↵ If you’re unfamiliar, the (\_ -> _) syntax defines an anonymous “lambda” function that takes some input argument following the \ and returns the expression following the ->.
- ↵ It also means that State is a bit of a misnomer. It isn’t really a state in itself; it’s actually more of a state processor.
- ↵ This is kind of a weird thing to say in a category-theoretic context since it seems at first like we have a functor that maps objects to morphisms, which is not allowed. But the resolution is that Hask includes function types, which blurs the difference between the two in some ways (e.g. a -> b is a function morphism, but can also be represented as a type/object in its own right).
Changelog
- Originally posted 2024.07.31
- 2024.08.02: Rephrasing in the last paragraph before the categorical monad section.
- 2024.08.04: Corrected a couple dumb typos and added more descriptive signatures in the Either and State discussions.
- Updated in Typst reformat 2026.03.20 with adjustments to certain explanations and figures