Maybe from first principles
2024.07.31
Monads have become something often discussed by functional programmers and lauded as an elegant way to model errors and side effects in languages where such things are formally nonexistent. This is true, but the answer to the question of what a monad actually is remains famously opaque to newcomers. One can usually end up with a pretty good intuitive feeling for what a monad is just by using any of their various implementations, but it’s significantly more difficult to understand it from the true mathematical definition.
Let’s try it anyway. Here I’ll first provide the necessary background to understand monads in the mathematical sense, then map those concepts to Haskell with a few illustrative examples. Finally, I’ll attempt to give a more intuitive explanation of monads in terms of what they can actually do for us.
What follows assumes a passing familiarity with the basic elements of Haskell and functional programming patterns (i.e. ML-style syntax, first-class functions, currying, and recursion). Familiarity with mathematical phrasing and logical quantifiers (e.g. “for all $x$”) is also assumed.
See also the supplementary material.
Some category theory
Despite its famous opaqueness, let’s start off with the mathematical definition which will, if nothing else, give us some milestones to aim for in concept space.
A monad is a monoid in the category of endofunctors of a fixed category.
The aforementioned opaqueness is apparent. But right off the bat, we have our milestone questions:
- What’s a category?
- What’s an endofunctor?
- What’s a category of endofunctors?
- What’s a monoid?
In answering these, we’ll undoubtedly accrue additional sub-questions, but these are useful high-level guides to what must follow.
Categories
Let’s look at the first. What is a category? To actually answer this in a satisfying way, it’s necessary to start with some motivation. Specifically, we want to think about the concept of an abstraction. What do I mean by this? There are a few ways to look at it, but I think maybe the best here is with a couple examples.
What’s in an abstraction?
Suppose we have some pebbles. A natural thing that any child will do is to try grouping them together in various ways. Given a collection of pebbles, it’s possible to separate them into piles, and likewise take some collection of piles and combine them in various pairings. In this process, one seeks a way to distinguish piles–what makes one pile different from another? Maybe there are a few pebbles in one pile that are shinier than those in another, but for the moment let’s assume all the pebbles are just boring, small rocks. Absent any sort of inherent specialness, one notices that the piles of pebbles can be characterized by counting the pebbles in each pile. When two piles are combined into one, counting the pebbles in the final pile produces a result that has a specific relationship to what one obtains by counting the pebbles in the original piles.
And of course, as one notices, this phenomenon doesn’t only happen with pebbles. It could be seeds, candies, potatoes, whatever–as long as we’re only counting, these properties hold for any kind of discrete thing! And just like that, we’ve discovered the natural numbers and addition. The natural numbers are an abstraction over discrete physical quantities: It doesn’t matter what the objects being counted are; as long as they’re discrete and countable, they behave like the natural numbers with addition.
Besides being more convenient to work with, numbers provide an interface to the pure concepts at play here. Using numbers, one can see the various additive properties of physical objects and even extend them. Subtraction and multiplication can be seen as direct logical extensions of addition, and division comes about by asking what would happen if the objects could be reduced in some way into smaller pieces. From there, it’s natural to then define other types of numbers like the negative integers, the rational numbers, the real numbers, complex numbers, quaternions, …
And further, abstraction can also be built on other abstractions. Algebra1 is an abstraction over numbers. Instead of dealing directly with the arithmetic operations on numerical values, we can notice patterns in how these operations affect their operands and use those patterns to write generalized statements that all children learn as equations: An equation uses symbols like $x$ to denote spots that could be filled by any number, and the usual rules for manipulating these symbols to “solve for $x$” are derived by analyzing observable patterns in how addition, subtraction, multiplication, and division operate on the numbers those symbols stand for. Then one can define other operations on numbers based on these generalized, symbolic operations, like integration and differentiation, and hopefully you (dear reader) can see that there’s a real power to be found in abstraction.
Group theory is another, more interesting example of an abstraction built on a physical phenomenon. Without getting into it too much, group theory is based not on quantities of objects (necessarily), but instead on transformations that leave some kind of observable thing unchanged. Suppose we have a square lying in the plane. There are several things we can do to the square, like stretch it, move it around, rotate it, flip it around, etc. But there are certain transformations we can apply that leave the square looking the same. These are the symmetries of the square:
- Rotate by +90°
- Rotate by –90°
- Rotate by 180°
- Reflect across the $x$-axis
- Reflect across the $y$-axis
- Reflect across the line $y = x$
- Reflect across the line $y = -x$
And the same sort of thing applies to other shapes, like hexagons and circles, and in other dimensions, like cubes and pyramids. But it also applies to other things for which “symmetry” takes on a more general definition, like how stepping forward or backward in time by a multiple of 12 hours will leave a clock’s hands in the same positions, or how the molecular structure of a crystal is the same under certain translations in space.
Group theory abstracts over these transformations and treats them as base objects: It doesn’t matter what they are or what the thing is that they’re being applied to–given a collection of things that satisfy certain conditions (that are also satisfied by symmetrical transformations), group theory reasons about and extends the properties of the collection to find non-trivial deductions in the same way that algebra and calculus reason about and extend the various properties of numbers, arithmetic, and functions. This abstraction in particular has been quite important in various fields of physics in the past several decades.
Defining a category
So now that I’ve thoroughly belabored the idea of an abstraction, we finally come to categories. Broadly, categories are abstractions over the concept of things having directed, composable relationships. What does this mean? Well, let’s briefly come back to numbers. Instead of arithmetic, though, we’ll talk about $\leq$. In mathematical terms, $\leq$ is a total ordering on the integers2, meaning that under $\leq$, any two integers $a$ and $b$ can always be compared, giving $a \leq b$ or $b \leq a$, and both if and only if $a = b$. This ordering can be seen as a directed relationship between $a$ and $b$. $\leq$ is also transitive: If $a \leq b$ and $b \leq c$, then we must also have that $a \leq c$. Another way to say this is that the relationships imposed by $\leq$ are composable: The relationships $a \leq b$ and $b \leq c$ can be “put together” to get another relationship $a \leq c$.
These are examples of the core ideas of a category. Here’s the mathematical definition.
A category $\mathcal{C}$ is a collection of objects (things) $X, Y, Z, \ldots$, a collection of morphisms (relationships) $f, g, h, \ldots$, and a binary operation $\circ$ (composition) taking two morphisms as input and returning another morphism, such that the following hold for any objects $X, Y, Z, W$ and morphisms $f : X \to Y$, $g : Y \to Z$, $h : Z \to W$:
The composition of two morphisms in a category is another morphism in the category: $g \circ f : X \to Z$,
Composition is associative: $h \circ (g \circ f) = (h \circ g) \circ f$,
Each object has an “identity” morphism, $\mathrm{id}_X : X \to X$, that does nothing with respect to composition: $f \circ \mathrm{id}_X = f = \mathrm{id}_Y \circ f$.
(It should also be said that morphisms are usually thought of as “arrows” pointing from one object to another. As a matter of notation, the morphism $f$ pointing from $X$ to $Y$ is written $f : X \to Y$, and another morphism $g : Z \to W$ can be composed with $f$, i.e. $g \circ f$, if and only if $Y$ and $Z$ are the same.)
The example I gave above with $\leq$ is an example of a category, where objects are integers, morphisms are the ordering relationships imposed by $\leq$, and $\circ$ represents the transitivity of $\leq$. Categories can also be drawn as directed graphs, with objects as nodes and morphisms as arrows.
But what does it all mean intuitively? Well, most of the point of category theory is to provide a very general framework for reasoning about the relationships encoded by all the morphisms. The list of conditions that need to be satisfied in order for something to be a category is relatively short and lax, and as a result the language of categories can be applied in many contexts. This allows category theorists to, for example, prove something in one domain, and very easily map it to another. That sort of thing doesn’t really apply to this post, though, since we mostly want to focus on category theory only as it pertains to programming.
So the first real thing to note is that although $\leq$ on the integers is a valid category, we more often think of objects as having some internal structure. That structure is then probed by the existence of the various morphisms that connect the objects, and the behavior that can emerge when different morphisms are composed together. A common example of a category is $\mathbf{Set}$, the category of sets and functions (some more well-studied categories have names, typically set in bold). In $\mathbf{Set}$, objects are sets, morphisms are functions in the usual sense, and $\circ$ is ordinary function composition where we take the output of one function and feed it as input to another.
As an example, consider a (sub)category of $\mathbf{Set}$3 with three objects (sets) $A$, $B$, and $C$, and morphisms $f : A \to B$, $g : B \to C$, and $g \circ f : A \to C$, along with identity morphisms. From there, we can reason about these sets based on the morphisms between them. For example, we can think about a category of sets in which there exists one object $\varnothing$ corresponding to the empty set, and another object $\{*\}$ corresponding to the singleton set, which has only one element. These two sets are indeed characterized by the properties by which we’ve defined them. We know by construction what these sets are, but we can also see these characteristic properties based on what morphisms exist between them and other sets in the category. Namely, for any other object (set) $X$, there exists exactly one morphism $\varnothing \to X$, and exactly one morphism $X \to \{*\}$. This identifies $\varnothing$ as an “initial” object and $\{*\}$ as a “terminal” object in the category. By translating set-theoretic ideas to a more general, category-based framework, the ideas themselves can be reasoned about more generally and applied to other areas. It can be shown that any other sets (or completely different kinds of objects) obeying either of these properties must have the respective number of elements (or an equivalent property); hence the morphisms in the category characterize them.
This is a general pattern in category theory: We don’t talk explicitly about any structures within objects; instead, we rely on the morphisms to elucidate information. This keeps the reasoning as general as possible so that if, for instance, we can prove something non-trivial about initial or terminal objects, and we have some category $\mathcal{D}$ for which there also exists an initial or terminal object, then we could immediately use what we’ve proved to say something about the objects in $\mathcal{D}$. Moreover, because we’d be working on so general a level of reasoning, the fact that such a relationship exists between $\mathcal{D}$ and $\mathbf{Set}$ at all could then suggest something interesting about $\mathcal{D}$ or something analogous to empty or singleton sets.
To continue with other categories, we can also have $\mathbf{Vect}$, where objects are vector spaces, morphisms are linear transformations, and $\circ$ is the composition of these transformations, which in some cases corresponds to matrix multiplication. Or we could have $\mathbf{Top}$, where objects are topological spaces, morphisms are continuous maps, and $\circ$ is ordinary function composition. Or we could have a category of logical statements, where objects are propositions, morphisms are implications, and $\circ$ is the transitivity of implication.
But the kind of category we want to consider is the category of a programming language’s type system. This category is basically $\mathbf{Set}$, since types can be seen as sets of values, and the language’s functions and composition act basically the same as ordinary functions and composition. But if the language is pure (i.e. no mutations, side effects, or exceptions), then this category is also related to the category of logical statements and implications through the Curry-Howard correspondence. And if the language allows for user-defined types4, then the category of the type system takes on some additional properties that we’ll get into down below.
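To make this concrete before moving on, here’s a minimal sketch of that category structure in Haskell itself, using the built-in id and . as the identity morphisms and composition, and (anticipating the discussion above) the built-in Void and () in the roles of the empty and singleton sets; the helper names are mine:
import Data.Void (Void, absurd)
-- `id` is the identity morphism at every type, and `.` is composition;
-- the category laws hold for all functions:
--   f . id == f == id . f
--   h . (g . f) == (h . g) . f
intToLen :: Int -> Int
intToLen = length . show -- compose `show :: Int -> String` with `length`
-- `Void` (the empty type) behaves like an initial object: there is exactly
-- one function out of it into any given type
fromInitial :: Void -> a
fromInitial = absurd
-- `()` (the unit type) behaves like a terminal object: there is exactly one
-- total function into it from any given type
toTerminal :: a -> ()
toTerminal _ = ()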
Functors
So then what are endofunctors and categories of endofunctors? As you can probably guess from the section header, this really translates to a question of what functors are, since–from the name–endofunctors are actually a special case of functors.
So what are functors? Simply put, a functor is a map from one category to another; it maps objects to objects and morphisms to morphisms, with the constraint that it preserves identity morphisms and composition. More precisely: For all objects $X$ and morphisms $f : X \to Y$, $g : Y \to Z$ in a category $\mathcal{C}$, a functor $F : \mathcal{C} \to \mathcal{D}$ must satisfy $F(\mathrm{id}_X) = \mathrm{id}_{F(X)}$ and $F(g \circ f) = F(g) \circ F(f)$.
Functors are conceptually very important in the study of categories. To give just a single example, we can look at what happens when we interpret a group as a category. First, we’ll quickly define a group:
A group is a set $G$ and a binary operation $\cdot$ on the elements of $G$, subject to the following constraints for all $a, b, c \in G$:
$G$ is closed under $\cdot$: $a \cdot b \in G$,
$\cdot$ is associative: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$,
$G$ has an identity element under $\cdot$, i.e. there exists some $e \in G$ such that, for any $a$, $e \cdot a = a \cdot e = a$,
Every element in $G$ has an inverse under $\cdot$, i.e. for every $a \in G$, there exists some $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$.
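To make this concrete, here’s a sketch of how the definition might be transcribed into Haskell (a hypothetical Group class of my own; the laws can only be stated as comments, since the language can’t enforce them):
-- a hypothetical `Group` class mirroring the definition above
class Group g where
  gop  :: g -> g -> g -- the binary operation (closure holds by construction)
  gid  :: g           -- the identity element
  ginv :: g -> g      -- the inverse of an element
  -- laws (not enforceable in code):
  --   gop (gop a b) c == gop a (gop b c)
  --   gop gid a == a == gop a gid
  --   gop (ginv a) a == gid == gop a (ginv a)

-- the integers under addition form a group
instance Group Integer where
  gop  = (+)
  gid  = 0
  ginv = negate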
So right away, we can make some comparisons between categories and groups. The first three constraints look an awful lot like those in the definition of a category. The elements of a group under $\cdot$ are like the morphisms of the Hom-set of a category under $\circ$. We require that both operations
- combine two elements of their domain to return another element of that domain (closure),
- produce the same output no matter the grouping of applications (associativity), and
- have certain elements on which their action is trivial (identities).
On the other hand, groups only have one kind of thing (the group elements), while categories have two (objects and morphisms), and groups have an additional constraint that each element be invertible. The similarity is there, though, and I’ll just cut to the chase. Any group can be interpreted as a category with a single object (call it $\bullet$), where each morphism is an element of the group and composition of morphisms is the group operation.
An important thing to note is that every morphism in this one-object category is an isomorphism, meaning that it’s invertible, thanks to the fourth constraint we placed on groups above. If we relaxed that constraint and allowed non-invertible morphisms/group elements, then we’d have a monoid (referring in a category-theoretic sense to the single object and its morphisms, and not necessarily the whole category), which is another important concept on the way to monads.
And here with both groups and monoids, we have the concept of a (group/monoid) homomorphism, which is a “structure-preserving” map between groups. Specifically, a group homomorphism $\phi : G \to H$ maps the elements of one group to another in a way that preserves products, identities, and inverses–that is, all the key properties of a group:
- For all $a, b \in G$, $\phi(a \cdot b) = \phi(a) \cdot \phi(b)$.
- For identities $e_G \in G$ and $e_H \in H$, $\phi(e_G) = e_H$.
- For all $a \in G$, $\phi(a^{-1}) = \phi(a)^{-1}$.
(And a monoid homomorphism does the same except for the last condition.)
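For a concrete (monoid) example, stated in Haskell terms: length is a monoid homomorphism from lists under ++ to the integers under +, since it preserves products and identities. A quick sketch:
-- a sketch: `length` preserves the monoid structure of lists under `++`,
-- mapping it onto the integers under `+`
--   products:   length (xs ++ ys) == length xs + length ys
--   identities: length [] == 0
homomorphic :: String -> String -> Bool
homomorphic xs ys = length (xs ++ ys) == length xs + length ys
-- homomorphic "ab" "cde" == True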
Notice how, like when we interpreted a group as a category, the first two conditions are essentially those of functors, while the last is specific to groups (i.e. coming from the fact that group elements are required to be invertible). This means we can think of groups as special categories with some internal structure modeled by their morphisms, and group homomorphisms as special functors that preserve that structure while transforming between groups. But notice also that this is essentially how we think of morphisms between objects. This leads us to another special category called $\mathbf{Grp}$, the category of groups and group homomorphisms. Here, $\mathbf{Grp}$ is interesting because it’s a category that can be thought of as a kind of “category of categories”–each group/object is itself also a category, and each morphism can be thought of as a functor between the object categories.
This is a general pattern that can be repeated for other kinds of categories, too. Instead of only thinking about group-like categories, we could also imagine other categories modeled after different mathematical objects like sets or topological spaces. In this case, functors would be very powerful indeed since they would describe transformations between different areas of math–intuitively, functors can be seen as a formalization of analogy.
Natural transformations
Consider that, in the process of constructing a functor, one must effectively define a mapping on morphisms, which are themselves usually some kind of mapping. And, given a suitable set of categories and a suitable set of functors between them, one can then construct a category of categories5. A reasonable question to then ask is, “what happens when we try to construct a mapping between functors?”
This kind of mapping is called a natural transformation. Specifically, natural transformations are maps from functors to other functors with the same input and output categories that preserve the structure of those functors (which themselves preserve the structure of the underlying categories). More precisely, any natural transformation $\alpha$ between functors $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{C} \to \mathcal{D}$ must assign to every object $X$ in $\mathcal{C}$ a morphism $\alpha_X : F(X) \to G(X)$ in $\mathcal{D}$. The condition that a natural transformation preserves structure translates to the condition that, given a morphism $f : X \to Y$ in $\mathcal{C}$, applying $F(f)$ and then $\alpha_Y$ is the same as applying $\alpha_X$ and then $G(f)$: $\alpha_Y \circ F(f) = G(f) \circ \alpha_X$.
This can be drawn as a “commutative diagram,” where any two paths formed by composition of morphisms with the same starting and ending objects are equivalent.
If, for every object $X$, the morphism $\alpha_X$ is an isomorphism–that is, if it is invertible–then $\alpha$ is called a “natural isomorphism.”
Just as with categories and functors, it is also sometimes possible to construct categories of functors and natural transformations, where we then think of functors as objects, and their natural transformations as morphisms. The last piece to the monad puzzle is the concept of an endofunctor, which is simply a functor that points from a given category back to itself.
Monads
So finally we come to monads. Now that we know what categories, (endo)functors, and monoids are, it’s time to put it all together. Here’s the definition again:
A monad is a monoid in the category of endofunctors of a fixed category.
We’ll start with this fixed category, calling it $\mathcal{C}$, which is left entirely unconstrained. Here $\mathcal{C}$ is treated as a kind of backdrop, and we work in the category of its endofunctors, which we’ll call $\mathrm{End}(\mathcal{C})$. In $\mathrm{End}(\mathcal{C})$, the objects are endofunctors on $\mathcal{C}$, morphisms are natural transformations between them, and composition is ordinary composition of the natural transformations.
Specifically, a monad is a monoid in this category. Recalling the brief discussion of groups and monoids above, we have to make a slight adjustment to switch from that picture, where we modeled a monoid using a whole category, to one where the monoid is a localized structure within a larger category. Leaving some technical details for the supplementary material, the essential ideas are all the same, but now the properties of a monoid object $M$ are enforced with special morphisms
- $\mu : M \otimes M \to M$ (monoid multiplication)
- $\eta : I \to M$ (identity)
where $I$ is a designated “unit” object for which $I \otimes M \cong M \otimes I \cong M$.
When we talk about monads, we talk about a monoid object in the category of endofunctors of $\mathcal{C}$–that is, a particular endofunctor $T$ with associated natural transformations $\mu : T \otimes T \to T$ and $\eta : \mathrm{Id} \to T$, with $\mathrm{Id}$ being the trivial identity functor on $\mathcal{C}$ (playing the role of the unit object $I$, with composition of functors playing the role of $\otimes$). That’s the high-level picture,
but of course it’s not very helpful on its own. Something that may be a bit more helpful is to consider the actions of $\mu$ and $\eta$ on the object level. As discussed above, every natural transformation must have corresponding morphisms for every object in its base category. For $\mu$ and $\eta$, then, we have $\mu_X : T(T(X)) \to T(X)$ and $\eta_X : X \to T(X)$, where $X$ is any object of $\mathcal{C}$. Leaving aside the question of what exactly the action of $T$ could look like, we can see from these signatures that $\mu$ acts to “undo” an application of $T$, while $\eta$ will simply apply $T$ to $X$. Moreover, we require that these operations be defined for all $X$. $\mu$ is usually referred to as a “joining” or “flat-map” operation while $\eta$ is a “lift” or “return” operation for reasons that should become clear below6.
But there’s only so much we can do from the purely mathematical context to gain intuition for what it all means–we now have the what of monads, but need some concrete applications to get the why.
Monads in programming
Programming is a particularly nice context for exploring monads. Generically, we can think of the type system of a programming language as a category where objects are types, morphisms are functions between those types, and composition is ordinary function composition. These categories are pretty much $\mathbf{Set}$ (since each type can be thought of as a set of values), but you can play around with them quite easily without any pen-and-paper work just by typing a few commands on a computer.
There are a couple restrictions to this, however. The first is that there are important constraints on the choice of language. Namely, we need a language that has strong types, can assign types to functions and allow functions to be passed as values to other functions, and allows programmers to create their own types that can be parameterized by another type. The second restriction is that, in order to properly constrain the space of possible functions, all functions must be completely pure–there can be no side effects, e.g. writing to files, printing to the screen, or changing the state of a random number generator, that do not have a corresponding encoding in a type.
An obvious choice of language here is Haskell. It’s already essentially just notation for category theory, but the fact that it still has to work as a programming language (i.e. describe a computation that can be run on a real computer) has introduced some interesting design choices that can provide some insight into the underlying math. For convenience, people generally refer to the category-of-types-and-pure-functions of Haskell as Hask7.
Parametric types
First, we need to talk about parametric types. As the name suggests, a parametric type is a type that is parameterized by another type, which in Haskell looks like this:
data Foo a = -- constructors here...
Here, Foo is the parametric type and a is the type parameter. The constructors for this type are left undefined, but there can be any number (including zero!) that take any number of arguments8. Crucially, however, Foo is not actually a type. Instead, Foo on its own really names a particular family of types; only when a is specified (e.g. Foo String or Foo Int) is it a “real,” concrete type. That concrete type can then be instantiated by calling one of its constructors on some input. It’s important to define what we mean by a “constructor,” though. Just based on the name, it’s easy to imagine a constructor as any function that takes some data and “constructs” the concrete type, e.g.
constructFoo :: {- some input data... -} -> Foo a
but for our purposes, we want to say that a constructor is a specific function whose only action is to “wrap” the input data in the parametric type. In other words, if we define a particular constructor called Make,
data Foo a = Make a
| -- could be more...
then it can only construct a Foo a by trivially wrapping up the input a: We want to keep the action of turning an a into a Foo a separate from whatever could be done to the particular value passed to Make on the value level.
Anyway, the parametricity of Foo has a couple important implications. The first is that, as a construct that requires some type to produce a type, Foo in itself actually defines an object-to-object mapping. In other terms, Foo by itself has a type in Haskell that is expressible as
import Data.Kind (Type)
Foo :: Type -> Type
The second implication is that any function that takes a Foo a (as a generic object) as argument and returns a Foo b (for any type b), i.e.
bar :: Foo a -> Foo b
must define an action not just for any value the argument could be, like a regular function, but for any types that a and b could be as well.
A key insight here is that any parametric type together with any functions on it actually defines a functor in the proper category-theoretic sense9, since they define a complete mapping over both objects and morphisms, and we only consider “wrapper” constructors.
To shed more light on the idea of a “wrapper” constructor, let’s take a brief detour into some actual Haskell to see how it all works in practice. In principle, Foo A (now for concrete type A) is a proper type, like Int or String. But if Foo actually acted as a regular function between types, we’d start to lose some of the specialness of what functors do. Such a thing would also demand some kind of semantics be developed for performing non-trivial operations and logic on types themselves, which is a huge task. Instead, Haskell leaves the action of Foo on A as a kind of unevaluated atom: A Foo Int is simply a Foo Int, which is distinct from a regular Int or a Foo String. This translates to a near-identical treatment on the value level as well, when we call Make on a particular value–the result of calling Make 5 is simply Make 5:
-- `Foo` is a functor
Foo :: Type -> Type
-- `Foo` called on a type is a type
Foo Int :: Type
-- `Make` is a function that takes a value of type `a` and returns a value of
-- type `Foo a`
Make :: a -> Foo a
-- which means...
Make (5 :: Int) :: Foo Int
Make 5 is treated as a concrete value on its own since the constructor Make is assumed to be a trivial action on the concrete value 5; it’s effectively a simple “wrapper” that crucially can be undone (or inverted) at will. This is distinct from a regular function that could do something non-trivial like perform some addition. If we wanted to do something non-trivial to 5 and return it as a Foo Int, we’d have to implement a separate function for it, e.g.
-- `makePlusOne` has the same type as `Make` for `a == Int`, but it isn't a pure
-- constructor.
makePlusOne :: Int -> Foo Int
makePlusOne val = Make (val + 1)
And just as we can “wrap” a value using a constructor, we can also “unwrap” it by pattern-matching, which acts as the inversion of the constructor:
-- bind the value `Make 5` to the variable `someFoo`
someFoo = Make (5 :: Int)
-- bind the value `5` to the variable `x` by unwrapping `someFoo`
Make x = someFoo
which also only works because the action of Make is trivial.
It’d be nice to be able to describe this concept in greater generality. While a function of type Foo a -> Foo b will in general entail an operation that will act in the proper functorial way10, every such function will have an implementation looking something like this:
bar :: Foo a -> Foo b
-- we pattern-match on the input to "unwrap" the inner value
bar (Make val) = Make val'
where val' = -- some operation on `val`...
This is kind of annoying to write all the time, and we’d have to define a different function for every use of the inner value that follows this pattern. This leads us to the concept of a type class, which defines a class of types (duh) by their behavior under a certain set of functions. In Haskell, we have a type class called Functor, which is defined roughly as follows:
class Functor (f :: Type -> Type) where
fmap :: (a -> b) -> f a -> f b
This defines a Functor as any parametric type f :: Type -> Type that’s associated with a certain function called fmap to transform the inner type wrapped by f. To make it useful, fmap “factors out” the actions that can be performed on the inner value as a function a -> b that’s passed to fmap as an additional argument. fmap as a whole therefore has the type (a -> b) -> f a -> f b, meaning that it takes the function and a value wrapped by the functor, and returns the functor wrapped around the output of the action. We can then declare that our toy functor Foo meets this definition (is an “instance” of Functor) by implementing fmap:
instance Functor Foo where
fmap func (Make val) = Make val'
where val' = func val
But factoring func out of the definition comes with a consequence: Now we have to note the condition that any Functor instance must preserve composition and the identity. This condition is not feasibly expressible in code (the operations required to actually check that it holds for all uses of Functor are prohibitively expensive), so we can conventionally note as a “law” that, with Haskell’s composition operator . (the same thing as $\circ$) and the identity function id :: a -> a:
- fmap id x == id x, and
- (fmap f . fmap g) x == fmap (f . g) x.
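That said, the laws can at least be spot-checked on randomized inputs. Here’s a sketch using the QuickCheck library against the built-in Maybe functor (discussed below); this samples cases rather than proving anything:
import Test.QuickCheck (quickCheck)
-- check the identity law on random `Maybe Int` values
prop_fmapId :: Maybe Int -> Bool
prop_fmapId x = fmap id x == id x
-- check the composition law for a couple of fixed functions
prop_fmapCompose :: Maybe Int -> Bool
prop_fmapCompose x = (fmap f . fmap g) x == fmap (f . g) x
  where
    f = (* 2)
    g = (+ 1)
main :: IO ()
main = do
  quickCheck prop_fmapId
  quickCheck prop_fmapCompose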
Now coming back to the discussion of functors on Hask, another key insight is that any resulting concrete type or function instantiated whenever a concrete type is provided for a must be representable in the language (how would the language function otherwise!), which means it must also be a member of Hask. Therefore, every functor defined by a parametric type and its functions (Functor and fmap) is an endofunctor.
So then if parametric types are (endo)functors, what’s the equivalent of a natural transformation? The answer is in a way quite simple, but requires a bit of imagination. The basic answer is that any ordinary function between generic parametric types, polymorphic in the same type parameter, can act as a natural transformation. This can be seen by recalling the commutative diagram that defines naturality: Given Foo a and another parametric type Baz a, the function
foobaz :: Foo a -> Baz a
is effectively a mapping between (endo)functors. To be clear, not every function with parametric input and output types is a natural transformation–it must actually satisfy the commutative diagram–but a function will need to have a function signature like
h :: F a -> G a
where F and G are concrete Functors, in order to be a natural transformation.
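A classic concrete example is taking the (possible) first element of a list, which defines one component [a] -> Maybe a for every type a (Maybe is covered properly below; the built-in version suffices here):
-- a sketch of a natural transformation from the list functor to `Maybe`
safeHead :: [a] -> Maybe a
safeHead [] = Nothing
safeHead (x : _) = Just x
-- naturality: transforming and then mapping a function is the same as
-- mapping and then transforming, e.g. with `show`:
--   fmap show (safeHead [1, 2]) == safeHead (fmap show [1, 2]) -- Just "1"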
Monads (again)
Now we have (endo)functors and natural transformations; the only thing left to discuss is how to actually define monads with Haskell constructs. In principle, the thing to do would be to define a Monoid type class, but it turns out we don’t really need one–after all, a monad is a specific kind of endofunctor monoid, so we can just skip to defining a Monad type class instead.
Recall that a monad is essentially an endofunctor along with two special natural transformations $\mu$ and $\eta$. There are two things here that need to be translated to Haskell. The first is the identity endofunctor, which simply takes every object or morphism in Hask and returns it unmodified. In Haskell, we’d therefore want a parametric type (call it Id) with only a single constructor and a simple implementation of Functor. This ends up being exactly what we’ve already done for Foo above:
data Id a = Id a -- `Id` is the name for both the type and the constructor
instance Functor Id where
fmap func (Id val) = Id (func val)
But of course this is unnecessary. Since the point of the identity endofunctor is to do nothing, we’ll do exactly that and just say that any type a is exactly the same as Id a, any function f :: a -> b is the same thing as fmap f over an Id a, and save a bit of CPU time.
The second is the double application $T \circ T$. We already know that parametric types are in themselves functors that act on types by wrapping them and on morphisms with fmap. So a double application of some endofunctor M should look like a double-wrapping of some concrete type, and require two calls to fmap for an ordinary function. Thus we have, for any object/type a and morphism/function f :: a -> b:
- $T(T(a))$ translates to M (M a), and
- $T(T(f))$ translates to fmap (fmap f) :: M (M a) -> M (M b).
So in Haskell, $\eta$ and $\mu$ translate to two functions (conventionally called return and join) that look like this:
return :: a -> M a
join :: M (M a) -> M a
And in a manner similar to fmap, join has its internal action “factored out” into an extra function argument, with the result called bind. The more proper definition goes like this:
class Functor m => Monad (m :: Type -> Type) where
return :: a -> m a
bind :: (a -> m b) -> m a -> m b
The Functor m => bit is called a “context” in Haskell and means that whatever follows for m is only valid in the context that m is also a Functor11. This definition also comes with the laws that return be essentially the same as a constructor for m (i.e. not perform any non-trivial work on the argument), and that bind behave in a way that agrees with the monad coherence conditions. In Haskell, the requirements can be expressed loosely as:
- return is an identity: bind f (return x) == f x and bind return m == m
- bind respects composition: bind (\x -> bind g (f x)) m == bind g (bind f m)12
the satisfaction of which is again infeasible to actually check.
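One thing worth noting before the examples: bind and join determine each other, which ties this definition back to $\mu$. A minimal sketch, assuming the Monad class defined just above (not the standard library’s):
-- `join` can be recovered from `bind`, and vice versa
join' :: Monad m => m (m a) -> m a
join' = bind id
bind' :: Monad m => (a -> m b) -> m a -> m b
bind' f = join' . fmap f -- `fmap` comes from the `Functor` context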
Examples
So here we’ll go through a few examples of monads in Haskell, and relate their implementations to the category theory above.
Maybe/Either
The simplest non-trivial example of a monad that people usually start with (and the title of this post) is Maybe. Here we’re grouping it together with another monad called Either for reasons that will shortly become clear.
The Maybe a type refers to values that are nullifiable–that is, a Maybe a can either be an ordinary value of type a or some kind of null value in the case that things don’t go as planned. This is its definition:
data Maybe a = Just a
| Nothing
Maybe has two constructors: Just, which is used for the case where there is a value, and Nothing, which models the null case. Its Functor and Monad instances are:
instance Functor Maybe where
fmap f (Just x) = Just (f x) -- `Just` behaves like the identity functor
fmap _ Nothing = Nothing -- can't do anything with a null object
instance Monad Maybe where
return = Just -- `return`ing a value is "just" wrapping it
bind f (Just x) = f x -- `f` takes the role of a nullifiable operation
bind _ Nothing = Nothing -- can't do anything with null
Either behaves similarly. Like the name suggests, an Either is one of two things, which are called Left and Right.
data Either l r = Right r
| Left l
Notice that Either has two type parameters, while we’re used to thinking of functors as taking only one. This may seem odd, but recall that Haskell has currying, which means that we can treat Either as a functor of one argument by just supplying a type for the first, e.g.
Either :: Type -> Type -> Type
Either Int :: Type -> Type
Either String :: Type -> Type
-- and so on...
So then like Maybe, we can declare Either l to be a Monad–as long as we make sure to supply the first type:
-- these instances are generic over `l`:
instance Functor (Either l) where
fmap f (Right r) = Right (f r)
fmap _ (Left l) = Left l
instance Monad (Either l) where
return = Right
bind f (Right r) = f r
bind _ (Left l) = Left l
Look how similar the instance declarations are! We can see that Nothing and Left are just treated as default null values–the difference is only that Either allows some extra data to be attached to the null case, since Left takes an argument. This makes Either a common choice for describing the return value of a computation that can fail (the common mnemonic is that Right indicates the correct or “right” value).
So what does this all mean? From the instances, we can see that Just/Right are essentially the identity functor on Hask: They’re only used to wrap values, and all functions are simply passed through for those variants. On the other hand, Nothing/Left act to essentially map all of Hask down to a single object, and turn all morphisms into the identity.
But the key point is that this behavior turns Maybe and Either into what are probably the most well-known and generally useful monads, which most new programming languages seem to be adopting. This is because bind ends up making for a particularly elegant way to handle chains of nullifiable or fallible computations. If we work with the operator version of bind, defined as
(>>=) :: Monad m => m a -> (a -> m b) -> m b
x >>= f = bind f x
then we can write code that “automatically” handles nulls and errors without any explicit logic:
-- `get` returns `Just` the element of the list at an index, or `Nothing` if the
-- index is out of bounds
get :: Int -> [a] -> Maybe a
get _ [] = Nothing
get 0 (h : _) = Just h
get idx (_ : rest) = get (idx - 1) rest
-- `divideSafe` divides one `Float` by another, returning `Just` if the divisor
-- is non-zero, otherwise `Nothing`
divideSafe :: Float -> Float -> Maybe Float
divideSafe num denom = if denom == 0.0
then Nothing
else Just (num / denom)
-- `Show` is a type class that converts values to `String`s and provides a
-- method `show`
toString :: Show a => a -> String
toString = show
-- assume we have a list of numbers
someData :: [Float]
someData = {- ... -}
-- example computation: divide 100 by the fifth element of `someData`, and
-- convert the result to a `String`
foo :: Maybe String
foo = get 4 someData -- start with a value that could be null
>>= (divideSafe 100.0) -- `divideSafe 100.0` is a curried function
>>= (return . toString) -- compose `Maybe`'s `return` and `toString`
-- `return . toString :: a -> Maybe String`
In foo, >>=/bind is used to take a value that could be null, and apply two more operations to the wrapped value. At each step, we apply a function to the wrapped value (if it exists) that then returns another wrapped value to be fed into the next application of >>=. The definition of >>= on Maybe allows for each operation to fail and, more importantly, preserves the null value through the operational pipeline in the case that one is produced. For example, if someData were less than five items long, the initial get would return Nothing; if the fifth item in someData were zero, then divideSafe 100.0 would return Nothing. In either case, the final result of the computation would then also be Nothing, indicating that something went wrong while preserving the pipeline itself. If we wanted to track where the error came from, we could have get and divideSafe return Either String values instead, where the Left constructor would be used to hold a message describing the error.
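Here’s a quick sketch of that Either-based variant (the E-suffixed names are hypothetical, not standard):
-- `Either`-returning variants of `get` and `divideSafe` that say what failed
getE :: Int -> [a] -> Either String a
getE _ [] = Left "index out of bounds"
getE 0 (h : _) = Right h
getE idx (_ : rest) = getE (idx - 1) rest
divideSafeE :: Float -> Float -> Either String Float
divideSafeE num denom = if denom == 0.0
  then Left "division by zero"
  else Right (num / denom)
-- the pipeline reads the same as before, but a failure now carries a message
fooE :: Either String String
fooE = getE 4 someData
  >>= divideSafeE 100.0
  >>= (return . toString)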
List
Maybe and Either have, in some ways, become the quintessential monads. But lists–homogeneous, ordered, linear sequences of items–are another example that usually comes as a small surprise to people first learning about monads.
In Haskell, lists are usually written [a], but to conform to the notation above, we can also write them as List a. It’s helpful to look at how they’re essentially defined (in pseudo-Haskell, since the built-in [] and : are special syntax):
data List a = [] -- the empty list
| a : (List a) -- a head element attached to a (possibly empty) body
This definition makes it a singly linked list, but it’s worth noting that what follows works for basically all homogeneous, ordered, linear sequences (provided definitions of return and bind that amount to identical operations). More specifically, we can note that every List a is either the empty list or an a attached to the front of another List a–wait a second, this looks a lot like Maybe! Let’s explore this further…
We can think of a List as a container of data (that’s what it’s for, after all) in the same way as Just. Thus it makes sense to define fmap in the same way, except here we have the possibility that the non-empty variant of the list contains multiple elements:
instance Functor List where
fmap _ [] = [] -- like `Nothing`
fmap f (head : rest) = f head : fmap f rest -- like `Just`
To get the typing to work out and maintain homogeneity in the list, we define fmap to apply f to all elements in the list, which we do recursively. This is essentially the only definition of fmap that (a) satisfies functoriality requirements, and (b) produces a valid output List.
But then how should we implement Monad? return is straightforward: Given a single a, the only operation that makes sense (given functoriality and monadicity requirements) is to return a single-element List containing that a. But then we have to contend with bind, which should apply a function of type a -> List b to every element of the list and return a single List b. The naive application of this function that behaves like fmap would produce a List (List b), so what is there to do that makes sense under monad laws?
The answer ends up being what is commonly called a “flat map” in some languages or “map + concat” in others. Specifically, the operation is to apply the a -> List b function to all elements of the initial list, and then “flatten” the resulting List (List b) value by appending all nested lists end-to-end. Translated to a Monad instance:
instance Monad List where
return x = x : [] -- can also be written as `[x]`
bind _ [] = []
bind f (x0 : rest) = concat (f x0) (bind f rest)
-- `concat` takes two lists and sticks them end-to-end, preserving order:
-- concat [0, 1, 2] [3, 4, 5] == [0, 1, 2, 3, 4, 5]
concat :: List a -> List a -> List a
concat [] b = b
concat (a0 : rest) b = a0 : concat rest b
From here, it’s easy to see that these definitions satisfy the monad laws, so List is indeed a bona fide monad that actually works quite similarly to Maybe.
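To see the flattening in action, here’s a tiny usage sketch (eliding, as above, the distinction between List a and the usual [a] syntax; the function name is mine):
-- each input element maps to a two-element list, and `bind` splices the
-- results together into one flat output list
halveAndDouble :: [Float] -> [Float]
halveAndDouble = bind (\x -> [x / 2, x * 2])
-- halveAndDouble [1, 10] == [0.5, 2.0, 5.0, 20.0]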
State
Maybe, Either, and List were fairly simple (but quite useful!) examples, so here’s a more complicated one. A preeminent goal of any pure, functional language (or corresponding mathematical model) is to be able to provide an elegant description of stateful computations. For example, suppose we have an algorithm that needs to mutate the elements of an array, or we’re trying to describe how a game evolves from moment to moment. In either case, we generically want to think about two things:
- The “state” being managed by the computation, and
- the actual output produced by it.
From the mathematical (and, by extension, Haskell-ian) perspective, this looks like a function that takes an initial state of type s and produces another state (also of type s) along with some output, which is of type a:
runState :: s -> (s, a)
In this picture, evolutions of the state then look like functions of both the state and the intermediate output values:
stepState :: ((s, a) -> (s, b)) -> (s, a) -> (s, b)
--           ^                      ^         ^
--           |                      |         `- output state/value
--           |                      `- input state/value
--           `- arbitrary stateful computation
We could allow the type of the state to change in the transformation, but in principle we should be able to define a single type that is capable of modeling all possible states.
This already looks somewhat monadic: It looks like we can take some unconstrained value, wrap it together with some additional data, and feed it through a series of (composable) transformations to produce a final state and output. Let’s see what we can come up with for our Functor and Monad instances.
data State s a = State (s, a)
-- like `Either`, `State` is functorial/monadic only in the second type
instance Functor (State s) where
fmap f (State (state, value)) = State (state, f value)
instance Monad (State s) where
return x = State (defaultState, x) -- problem: what is `defaultState`?
bind f (State (state, value)) = f value -- problem: `f` should be able to use
-- `state` to produce output
The problems with this approach were perhaps a bit subtle to foresee from the outset, but they’re made clear when we get down to brass tacks. There are two of them. First, return doesn’t really work as it should; its type signature forces us to choose some default state when initially wrapping a value. This means that the first monad law won’t be satisfied, since we have to discard any state information when we pass return to bind. Second, bind doesn’t work as it should, since we can’t pass any state data to f; this means the resulting computations (when constrained to Monad methods) won’t actually be stateful, since they can’t depend on state data.
Instead, the solution (used in roughly the following form by an official library) shifts perspective a bit. Rather than thinking about merely the end state and the output value of the computation, we actually want to think about the computation itself–which is to say the s -> (s, a) function–as the base object:
data State s a = State (s -> (s, a))
This definition is perhaps a bit more complicated, but it immediately solves the first problem noted above with return: We don’t have to choose a default state because the value stored by State is inherently agnostic over all states (of a specific type). It also suggests that the more natural way to think about stateful computations is in terms of the methods to get from one state to another, rather than a particular sequence of states13. Now let’s see how we can get working Functor and Monad instances.
Here we’ll again think of State s a as functorial or monadic only in the second type parameter, which means the functions we have to implement have the types
fmapState :: (a -> b) -> State s a -> State s b
returnState :: a -> State s a
bindState :: (a -> State s b) -> State s a -> State s b
Removing the indirection created by the State constructor, these are
fmapState' :: (a -> b) -> (s -> (s, a)) -> (s -> (s, b))
returnState' :: a -> (s -> (s, a))
bindState' :: (a -> (s -> (s, b))) -> (s -> (s, a)) -> (s -> (s, b))
Our new Functor instance looks pretty much the same as the old one, except that we have to apply the fmap argument through composition using . instead of directly applying it to a concrete value:
instance Functor (State s) where
-- the action of `fmap` is to compose `f` with the internal state processor
-- function in a particular way
fmap f (State proc) = State proc'
where proc' = applySnd f . proc
-- this is essentially our old definition of `fmap`: apply `f` to only the
-- value, leaving the state `s` unchanged
applySnd :: (a -> b) -> (s, a) -> (s, b)
applySnd f (state, value) = (state, f value)
For Monad, return is again relatively simple–we just construct a function that combines an initial state with a value, leaving the state untouched. bind, however, is made a bit more complicated by the fact that we need to think in terms of composition instead of simple application.
-- remove the trivial indirection created by the `State` constructor to expose
-- the bare processor function
unwrapState :: State s a -> (s -> (s, a))
unwrapState (State proc) = proc
instance Monad (State s) where
return x = State (\s -> (s, x)) -- do nothing to the state, just return `x`
bind f (State proc) = State proc'
where
-- unwrap the output of `f` to get the bare transformation on `proc`
f' = unwrapState . f -- f' :: a -> s -> (s, b)
-- define `proc'` as the correct composition of `f` with `proc`
proc' s0 = (s2, val2) -- proc' :: s -> (s, b)
where
-- define the intermediate state/value computed via `proc`
(s1, val1) = proc s0 -- (s1, val1) :: (s, a)
-- define the new output state/value computed via `f'`
(s2, val2) = f' val1 s1 -- (s2, val2) :: (s, b)
In the end, this extra complication produces an actual monad! To see a quick usage example, check out the supplementary material.
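Still, here’s a minimal sketch of the instance in use, with an integer counter as the state (the names are mine):
-- a stateful "tick": increment the counter and report the value it had
tick :: State Int String
tick = State (\n -> (n + 1, "saw " ++ show n))
-- sequence two ticks with `bind`; the counter threads through implicitly
twoTicks :: State Int String
twoTicks = bind (\_ -> tick) tick
-- unwrapState twoTicks 0 == (2, "saw 1")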
Categorically, State does a bit more work than the previous two examples. As we can see from its definition, the action of State s as a functor is really to transform types into functions14. And as we’ve seen above, the transformation is rather non-trivial when it comes to how the transformed objects can be manipulated. This is broadly similar to the action of what’s known as a covariant hom-functor, formally a functor from a category $\mathcal{C}$ to $\mathbf{Set}$, where each object is mapped to the set of all morphisms pointing to it, and morphisms are mapped to functions between those sets. Discussion of these functors is well beyond the scope of this post, but the fact that State is still an endofunctor on Hask is a testament to the power of Haskell’s type system.
Intuition
So what insight can we gain from all this? The most basic thing to note is that monads are a way to wrap values in a context that can contain extra data. In the case of Maybe and Either, this data is essentially a boolean flag that encodes whether the value is valid (augmented, in the case of Either, with additional data that does not play an active role in monadic transformations). For List, this data is all the items that are being strung together, and in which order. And for State, it’s how to compute a value and evolve a state, given some initial state.
But there’s more to a monad than just this data–it’s the data in combination with return and bind, which give a framework for how to create and compose these wrapped values. The fact that this framework exists and can be formalized as such allows something to be said about how all this translates to impure computations, which is often thought of as a fundamental tension in computing theory. Further, a monad’s status as an endofunctor over a whole category–here, all of Hask–means that this framework is valid for any type that is describable in the Haskell language.
One interesting way to think about it is to imagine starting from the programmer’s perspective. Depending on the computation, we may want to augment some value with an extra bit of data, we may want this behavior to be generically applicable to essentially any type that currently exists in the type system, and we want it to work as a general system that one can opt into, in case it doesn’t apply to a certain use-case. This is a natural way to arrive at a parametric type: The parametric type itself encodes some kind of logical bounds on the data it encloses, and the type parameter allows it to be applied in different contexts.
Then, we’d want some ways to work with values wrapped in the new type. These will of course vary between exact use-cases, but it makes sense to want two basic things:
- A method to wrap bare values, and
- a method to perform operations on wrapped values within the wrapping context.
Unsurprisingly, these are the functions provided by the Functor and Monad type classes, but from this perspective it only just so happens to align with the ideas of endofunctors and monoids thereof–all the category theory building up to monads can be viewed simply as formalizations of how to work with wrapped values. It must be said, though, that the mathematical understanding of monads and the line of reasoning that thinks of them as burritos are still significantly different, mainly in their generality (as usual).
When we work with monads in Haskell, we generally require some way to get values or base types out of the monad (we used the “wrapper” value and type constructors to this purpose here), at least in the code that defines the base behavior of the monad, if not in the interfaces exposed to an end user. But in the mathematical treatment, this need not be the case–the base functor is free to be non-invertible and perform non-trivial work.
So it is genuinely true that all things claimed to be monads in Haskell are indeed monads in the mathematical sense, but one should keep in mind that there’s more to monads than simply wrapping data, however common and useful such actions are.
Semi-related remarks
I originally wanted to write this post to give code examples in both Haskell and Rust–since Haskell’s syntax is famously quite terse (as mentioned, it’s essentially mathematical notation anyway), I thought Rust’s relative verbosity could make the programming section somewhat easier to follow. But although Rust’s type system does allow for functors and monads in the manner I described for Haskell, it turns out that traits are just barely not flexible enough to describe the Functor and Monad type classes in full generality and in a directly analogous way to how Haskell does it.
Instead, I’ll just say that Option and Result are direct analogues of Maybe and Either, with Option::map/Result::map being the functorial fmap, and Option::Some/Result::Ok and Option::and_then/Result::and_then being the monadic return and bind. I’ll also link to the crate monadic, which provides a trait definition that works but isn’t very idiomatic, and this blog post, which makes a case for some extensions to the trait system that would allow a more idiomatic definition.
Alternatively, OCaml’s functors are kind of flexible enough to define monads. This would be done by declaring a module type like so:
module type Monad = sig
(* define a parametric type `m` with type parameter `'a` *)
type 'a m
val return : 'a -> 'a m
val (>>=) : 'a m -> ('a -> 'b m) -> 'b m
end
and, correspondingly, we’d have an awkwardly named Functor module type as well:
module type Functor = sig
type 'a f
val fmap : ('a -> 'b) -> 'a f -> 'b f
end
These module types are sort of analogous to type classes, but the way to actually use them is awkward for this sort of thing. If we wanted to just define some functions using some monad, we’d write something like this:
(* some operation(s) using an unspecified monad type belonging to a module `M`
that has types and functions matching the `Monad` signature
in order to leave things generic over the monad type, we need to have these
definitions in what OCaml calls a "functor" -- essentially, a module of code
that depends on another module *)
module MonadicOps (M: Monad) = struct
(* this is a function that applies a list of functions to some monadically
wrapped value *)
let rec foldm (ops: ('a -> 'a M.m) list) (init: 'a M.m): 'a M.m =
match ops with (* `list` is a linked list type with cons operator `::` *)
| f :: rest -> foldm rest M.(init >>= f)
| [] -> init
end
(* example monad *)
module Maybe = struct
type 'a m =
| Just of 'a
| Nothing
let return (v: 'a): 'a m = Just v (* `val` is a reserved word in OCaml *)
let (>>=) (maybe: 'a m) (f: 'a -> 'b m): 'b m =
match maybe with
| Just a -> f a
| Nothing -> Nothing
end
(* in order to use `foldm` on a concrete `'a Maybe.m`, we need to supply a
module to `MonadicOps` in order to instantiate a new module that has an
operable definition of `foldm` *)
module MaybeMethods = MonadicOps (Maybe)
(* now we have a concrete instance of `foldm` that only works on `Maybe.m`s *)
let fold_maybe = MaybeMethods.foldm
This is rather awkward (especially for a blog post), and doesn’t allow one to properly define Monad in the context of Functor. To impose this constraint, we’d have to specify it in a separate interface file or at each call site of MonadicOps, since we can’t define module types with module parameters (I think–it actually might be impossible to fully express the proper relationship between Monad and Functor in code!).
Footnotes
- ↵ To be clear, we’re talking about everyday, “solve for $x$” algebra and not abstract algebra, which is the generalization of different sets of objects and operations on those objects.
- ↵ It’s a total ordering on the reals too (interestingly, not on IEEE floating-point numbers), but let’s keep things discrete.
- ↵ Technically, a diagram into $\mathbf{Set}$.
- ↵ In particular, higher-kinded types are important for some more interesting bits from category theory.
- ↵ There are some special requirements for the kinds of categories used to this purpose, but it can be done.
- ↵ Technically, there are also some additional “coherence conditions” we need for monads centered around $T$, $\mu$, and $\eta$, which are: $\mu \circ T\mu = \mu \circ \mu T$ (associativity of the multiplication) and $\mu \circ T\eta = \mu \circ \eta T = \mathrm{id}_T$ (left and right unit laws).
- ↵ It should be said that Haskell’s type system actually isn’t a category due to some technical details, but it’s good enough for all intents and purposes here.
- ↵ Parametric types can also have more than one type parameter, but we’ll keep it to one for now.
- ↵ Although not in the completely general category-theoretic sense since, in the same way that a type’s constructors must be simple wrappers, Foo as a type constructor must be a simple wrapper around its type parameter(s). See the following paragraph.
- ↵ Here it’s pretty much guaranteed due to the simple structure we’ve limited Foo to, but as types get more complicated, this will not be the case.
- ↵ The “actual” definition of Monad in Haskell’s standard library has bind replaced by an operator >>= with the operands flipped in order to allow multiple bind calls to be strung together cleanly, and the Functor context replaced by an Applicative context (which itself is defined in a Functor context).
- ↵ If you’re unfamiliar, the (\_ -> _) syntax defines an anonymous “lambda” function that takes some input argument following the \ and returns the expression following the ->.
- ↵ It also means that State is a bit of a misnomer. It isn’t really a state in itself; it’s actually more of a state processor.
- ↵ This is kind of a weird thing to say in a category-theoretic context since it seems at first like we have a functor that maps objects to morphisms, which is not allowed. But the resolution is that Hask includes function types, which blurs the difference between the two in some ways (e.g. a -> b is a function morphism, but can also be represented as a type/object in its own right).
Changelog
- Originally posted 2024.07.31
- 2024.08.02: Rephrasing in the last paragraph before the categorical monad section.
- 2024.08.04: Corrected a couple dumb typos and added more descriptive signatures in the Either and State discussions.
- Updated in Typst reformat 2026.03.20 with adjustments to certain explanations and figures