home/blog/quantum-linear-types
Linear types for quantum information
2025.06.17
Recently I’ve been on a bit of a math/CS theory kick. The core of it has been category theory, but my fascination has come from two angles.
Initially, I came at it from a quantum physics perspective, where category theory is used to formally reconstruct the foundations of quantum mechanics: In categorical quantum mechanics, Hilbert spaces are treated as base objects and the linear maps between them become morphisms in the category Hilb. From there, additional structure is introduced in order to treat measurement and mixed states, and I think the result can be reasonably described as a formalism of quantum processes that’s focused on the informational content of what’s going on: In typical category-theoretic fashion, the details of the physical systems that can encode information are abstracted away and what’s left is mostly pure logic about how they relate to each other. One notable result to come out of categorical quantum mechanics is the ZX-calculus, which is a joy to use when working abstractly with qubits in quantum computing.
On the other hand, functional programming has offered another inroad to category theory, and nearly all functional languages hold abstraction as a central goal in their design. In the same way that categorical quantum mechanics abstracts away the spin system to deal with the qubit instead, functional programming abstracts away the underlying von Neumann architecture to focus on the informational content of a computation: In Haskell, a String is simply the ordered list of Chars it’s meant to represent–this is in stark contrast to something like a C-style string, which is the byte-by-byte array encoding of the characters that must be terminated by a null character. This can allow the programmer to write extremely elegant programs, and ideally work in a style that’s much closer to the underlying mathematics (though usually at some cost to performance).
The key point here is that we have two major uses of the same underlying principles1 and usually such a thing is an indication that some sort of unification is possible. I found this cool image in a Math3ma post, originally created by Martin Kuppe2, that depicts the various areas of mathematics as the provinces of a place called Mathematistan.
Physics and computer science don’t really fit into this picture per se, but I imagine them as separate lands to the west and east, respectively: When they require use of pure mathematics, usually the physicists trek east over the Tundra of Topology to the Caliphate of Al-Gebra, or through the Woods of Algebraic Topology; and the computer scientists cross the Ocean of Logic to trade on the Coast of Category Theory or with the Grand Duchy of Game Theory. What I want to do in this post is note a particularly interesting meeting between physics and computer science (arguably more like the physicist in Al-Gebra travelling over to the computer scientist on the Ocean of Logic than the other way around), centered on creating a high-level programming language for quantum computers. Specifically, we’ll be dealing with how to properly model quantum information in a high-level programming language, and what it really means, from the perspective of a quantum computer scientist, for information to be called quantum.
Before getting into things, a couple disclaimers: First, I should say that there are already quite a few projects out there that claim to be a “quantum programming language,” but here I’ll argue that only a couple of them should really be considered as such (more on that later). Second, what follows is mostly cobbled together from several sources below, and I have done no original work in this area. I won’t claim to understand any of them fully, and as a result there may be a bit of awkwardness in how they all fit together.
- R. Duncan, “Types for Quantum Computing.” Thesis, Merton College, Oxford, 2006. (link)
- P. Selinger et al., “Quantum Lambda Calculus.” Semantic Techniques in Quantum Computation, Cambridge University Press, 2014. (link)
- T. Altenkirch, “The Quantum IO Monad.” Semantic Techniques in Quantum Computation, Cambridge University Press, 2014. (link)
- P. Wadler, “Propositions as Types.” Presented talk, Strange Loop Conference 2015. (link)
- Without boats, dreams dry up, “Ownership.” (link)
- philh, “A reckless introduction to Hindley-Milner type inference.” (link)
Here’s the #include list for the rest of this post:
- Basic linear algebra: vector spaces, linear combinations, bases, span; outer products; matrix trace
- Essential undergrad quantum: states as vectors, Dirac notation, superposition; complex probability amplitudes; conceptual understanding of measurement
- Programming beyond simple Python: types as different kinds of data; basic familiarity with Haskell and Rust
- Basic logic: implication, propositions, inference
- Basic familiarity with monoidal category theory: monoidal product and unit; objects, morphisms, composition
Background: Propositions as types
We’ll start off with some background on what is probably the most important piece of mathematical theory when it comes to designing a programming language: types. Most people with some programming experience have come into contact with types before, probably in the form of a compiler error or maybe through isinstance in Python. On a practical level, types are like bins for different kinds of values: 5 falls into the int bin, "hello" falls into str, and so on. But as we’ll see, they can be so much more, working to determine the semantics of the programming language they’re implemented in.
Note: This whole section is pretty much paraphrasing Philip Wadler’s 2015 talk. If you’ve already seen it, or the term “Curry-Howard isomorphism” already has non-trivial meaning to you, then you can probably skip to the next section.
Natural deduction
As with a lot of things relevant to CS theory, the story begins in the early- to mid-20th century. In this case, we start with David Hilbert in 1928, who, in order to resolve a number of paradoxes known to mathematicians of his time, posed the Entscheidungsproblem (lit. “decision problem”). The Entscheidungsproblem asks for an “effective method” (known today as an algorithm) that can take any logical statement as input, and return as output its evaluation as true or false.
So for example, a logical statement like

$$(A \wedge B) \Rightarrow A$$

can be evaluated straightforwardly–this one is obviously true and can be proven as such trivially by a computer that knows the logical rule allowing $A$ to be concluded from $A \wedge B$. But consider this one: “This statement is false,” which is quintessentially self-contradictory.
Here we come into contact with two important concepts in logic. Namely:
- A logic is complete if every true statement is provable and every provable statement is true3 and
- A decision problem is decidable if there exists an effective method for deriving the correct answer.
“This statement is false” presents a problem because it is undecidable and yet, as Kurt Gödel proved via the famous incompleteness theorems in 1931, still has a formal encoding. More precisely, Gödel proved that any system of logic powerful enough to encode its axioms and theorems in the natural numbers will always permit undecidable statements to be encoded as well. In other words, any sufficiently powerful logic can be complete or decidable, but not both.
One thing to come out of this examination of formal logic is Gerhard Gentzen’s 1933 system of natural deduction, which formalizes logical inference as a set of operations one can perform on symbols representing statements. Gentzen laid out a set of basic inferences to define axioms for the system as a whole, and they look like this4:
In this (original) notation, all statements above the horizontal line are known (the premises), and the single statement below the line is what can be concluded from them. The symbol to the right of the line is simply a label for the rule. Those with subscript $I$ are “introduction” rules, where a symbol is introduced, and those with subscript $E$ are “elimination” rules, where the symbol is removed.

The first row defines logical implication (denoted by $\supset$ in the original notation, with the $\supset$ read as a backwards C for “Consequence”), whereby an assumption of $A$ (in brackets) leading to a proof of $B$ introduces the implication $A \supset B$, and a proof of $A$ together with this implication eliminates it by allowing one to infer $B$. The second row defines the logical connective $\wedge$ (AND), which requires both its operands to be known in order to be introduced. Once $A \wedge B$ is known, however, the $\wedge$ can be eliminated by concluding either $A$ or $B$.
Of course, premises can themselves be the conclusions of inferences, and these rules can be stacked on each other as trees of inferences that all come together to form a proof, such as this proof that $B \wedge A \supset A \wedge B$:
Pairs of introduction and elimination rules can then be simplified (or “normalized”) by following these rules:
which results in a complicated, initial proof being rewritten in a simpler, but equivalent, normal form, like so:
The λ-calculus
In 1936, Alan Turing and Alonzo Church devised their distinct fundamental models of computation, the Turing machine and the λ-calculus. I’m not a computer scientist, but the general feeling I get from the outside is that more people are familiar with Turing machines than the λ-calculus, so I’ll assume familiarity with the former–only the latter is important to what I want to discuss afterward anyway.
The λ-calculus is a formal system for denoting computations. Instead of doing it as a series of instructions to a computer, the computation as a whole is expressed as a single mathematical expression, and the computation is performed by simplifying the expression down to a basic form using only two rewrite rules. Notably, the only base concepts included in the λ-calculus are functions and symbols; everything else can be derived. As a matter of notation, terms in the λ-calculus are described recursively in Backus-Naur notation using just three forms:

$$t, u \;::=\; x \;\mid\; \lambda x.\, t \;\mid\; t\ u$$

This notation is meant to show that any two terms $t$ and $u$ in the λ-calculus can take the form of any one of the expressions on the right-hand side of the $::=$, separated by $\mid$. These expressions stand for:
- $x$, a basic symbol or “atom” (it doesn’t have to be $x$, just something that is not itself a general term);
- $\lambda x.\, t$, an anonymous λ-abstraction or “function” that takes an input argument denoted by a symbol $x$ and returns any general “body” term $t$;
- $t\ u$, an application of a term $t$ to another term $u$;
and parentheses can be used to disambiguate where necessary. Having mentioned parentheses, it’s worth noting that function definitions and applications are, in a sense, right- and left-associative, respectively, such that

$$\lambda x.\, \lambda y.\, \lambda z.\, t \qquad \text{and} \qquad s\ t\ u$$

are equivalently written as

$$\lambda x.\, (\lambda y.\, (\lambda z.\, t)) \qquad \text{and} \qquad (s\ t)\ u$$
Using these forms, expressions in the λ-calculus can be built up to include any number of basic symbols, functions, and applications to denote a computation, for example:

$$(\lambda x.\, \lambda y.\, \lambda z.\, (x\ z)\ (y\ z))\ (\lambda w.\, w\ w)\ a$$

Starting from the “outside” and working our way in, we have
- $\lambda x.\, \ldots$, a function of argument $x$ returning:
  - $\lambda y.\, \ldots$, a function of argument $y$ returning:
    - $\lambda z.\, \ldots$, a function of argument $z$ returning:
      - the application $(x\ z)$ applied to:
      - the application $(y\ z)$;
- applied to a function of argument $w$ returning:
  - the application $(w\ w)$;
- applied to $a$.
Then to actually perform a computation, one applies “rewrite rules” to convert an initial expression to a simpler but equivalent form. These rules are:
- α-conversion, which renames variables by replacing every instance of a symbol (including in function arguments) with another:
$$\lambda x.\, t \;\to_\alpha\; \lambda y.\, t[x := y]$$
- β-reduction, which performs a function application by substituting an argument term for every instance of a function’s argument within its body, and replacing the application with the result:
$$(\lambda x.\, t)\ u \;\to_\beta\; t[x := u]$$

Applying these rules to the example above results in the following reduction steps:

$$(\lambda x.\, \lambda y.\, \lambda z.\, (x\ z)\ (y\ z))\ (\lambda w.\, w\ w)\ a \;\to_\beta\; (\lambda y.\, \lambda z.\, ((\lambda w.\, w\ w)\ z)\ (y\ z))\ a \;\to_\beta\; \lambda z.\, ((\lambda w.\, w\ w)\ z)\ (a\ z) \;\to_\beta\; \lambda z.\, (z\ z)\ (a\ z)$$
Application of these rules will eventually result in a basic “normal” form containing only atoms, atoms applied to atoms, or functions with no supplied arguments. This leads to a notion of equivalence between expressions, where two expressions are “β-equivalent” if they β-reduce to normal forms that are the same up to α-conversion. For example, $(\lambda y.\, y)\ (\lambda x.\, x)$ and $\lambda z.\, z$ are β-equivalent.
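To make this machinery concrete, here’s a minimal sketch of it in Haskell (my own illustration, not drawn from the sources above), using a naive substitution that assumes all bound-variable names are distinct so that α-conversion is never needed:

-- terms mirror the three Backus-Naur forms above
data Term
  = Atom String        -- a basic symbol
  | Lam String Term    -- a λ-abstraction
  | App Term Term      -- an application
  deriving (Show, Eq)

-- naive substitution: replace every free occurrence of `x` in the term
-- with `arg` (assumes bound names never collide with free ones)
subst :: String -> Term -> Term -> Term
subst x arg (Atom y)     = if x == y then arg else Atom y
subst x arg (Lam y body) = if x == y then Lam y body else Lam y (subst x arg body)
subst x arg (App f a)    = App (subst x arg f) (subst x arg a)

-- a single leftmost β-reduction step, if one exists
beta :: Term -> Maybe Term
beta (App (Lam x body) arg) = Just (subst x arg body)
beta (App f a) = case beta f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> beta a
beta (Lam x body) = Lam x <$> beta body
beta (Atom _) = Nothing

-- β-reduce until a normal form is reached
normalize :: Term -> Term
normalize t = maybe t normalize (beta t)

For instance, normalize (App (Lam "x" (Atom "x")) (Atom "y")) evaluates to Atom "y", as expected.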
Using the λ-calculus, one can define things like Boolean logic:

$$\mathrm{true} = \lambda x.\, \lambda y.\, x \qquad \mathrm{false} = \lambda x.\, \lambda y.\, y \qquad \mathrm{not} = \lambda p.\, p\ \mathrm{false}\ \mathrm{true} \qquad \mathrm{and} = \lambda p.\, \lambda q.\, p\ q\ \mathrm{false}$$

or the natural numbers and addition:

$$0 = \lambda f.\, \lambda x.\, x \qquad 1 = \lambda f.\, \lambda x.\, f\ x \qquad 2 = \lambda f.\, \lambda x.\, f\ (f\ x) \qquad \mathrm{plus} = \lambda m.\, \lambda n.\, \lambda f.\, \lambda x.\, m\ f\ (n\ f\ x)$$
The Curry-Howard isomorphism
The λ-calculus is pretty cool and one can use it to do logic, but one issue with this is that it allows for infinitely recursive expressions. The quintessential example of this is the “Y-combinator,” which takes a function and applies it infinitely:

$$Y = \lambda f.\, (\lambda x.\, f\ (x\ x))\ (\lambda x.\, f\ (x\ x)) \qquad Y\ g \;\to_\beta\; g\ (Y\ g) \;\to_\beta\; g\ (g\ (Y\ g)) \;\to_\beta\; \cdots$$
As Alonzo Church proved when devising the λ-calculus, this means it’s subject to Gödel’s incompleteness when used as a logic. But it turns out that additionally demanding that every term in an expression have a type is enough to resolve this: no (finite) type can be assigned to an infinitely recursive, self-applying expression like the Y-combinator, so such expressions are disallowed in what’s known as the “simply typed” λ-calculus.
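We can see this restriction at work in Haskell, whose type system is an elaboration of the simply typed λ-calculus sketched below–self-application fails to type-check, and general recursion has to be recovered through an explicit primitive:

-- GHC rejects this with an "infinite type" occurs-check error, since no
-- simple type can be assigned to a self-applied argument:
-- selfApply = \x -> x x

-- recursion is instead reintroduced as an explicit fixed-point combinator
fix :: (a -> a) -> a
fix f = f (fix f)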
To be more specific, the simply typed λ-calculus is the ordinary (“untyped”) λ-calculus, but where each term must have a type that can be inferred from the original expression. This introduces the need for “typing judgement” rules, which can be phrased like so:

$$\frac{}{\Gamma,\, x : A \;\vdash\; x : A} \qquad \frac{\Gamma,\, x : A \;\vdash\; t : B}{\Gamma \;\vdash\; \lambda x.\, t : A \to B} \qquad \frac{\Gamma \;\vdash\; t : A \to B \qquad \Gamma \;\vdash\; u : A}{\Gamma \;\vdash\; t\ u : B}$$

Here, we’ve introduced the function type $A \to B$ and the pair type $A \times B$, as well as two functions $\mathrm{fst}$ and $\mathrm{snd}$, whose actions can be typed with:

$$\mathrm{fst} : A \times B \to A \qquad \mathrm{snd} : A \times B \to B$$
With the rules all laid out like this, it’s easy to see that they bear a resemblance to the rules of natural deduction above, with everything to the right of the “typing colon” being a logical statement, and everything to the left being a corresponding λ-calculus representation.
This leads to the famous Curry-Howard isomorphism, whereby every program is a proof, every type is a proposition, every value of a type is evidence for the corresponding proposition, and program evaluation is proof normalization. The Curry-Howard isomorphism forms essentially the complete theoretical basis for the study of type theory and program semantics, theorem provers and proof assistants like Rocq (formerly Coq) and Lean, and strongly informs the design and development of many functional programming languages like OCaml, Haskell, Agda, and Idris.
But these ideas are useful outside of these languages and in practice as well. From a programming perspective, a type at its most basic level (in most languages) is simply a collection of data whose initialization can be limited to certain conditions. The idea is that, if one is careful, very specific relationships between types in a program can be encoded in the structure of those types such that they will provide guarantees against different kinds of errors. Violations of these guarantees will then result in invalid code that can be detected at compile time through a type checker–before any code is ever actually run.
That was all kind of abstract, so let’s look at a couple examples. The first one is in Python. Suppose we have some collection of data that we’ll call List:
from typing import Optional

class List[T]:
    data: list[T]

    def __init__(self, *data: T):
        self.data = list(data)

    def first(self) -> Optional[T]:
        """
        Return the first element of the list, if it exists.
        """
        if len(self.data) == 0:
            return None
        else:
            return self.data[0]
Obviously List is just backed by an ordinary Python list for the sake of a complete example, but of course one could use something else. The key thing to focus on is the implementation of the first method which, in order to be a total function, must account for the case where the collection is empty in some fashion; here it returns None. Even in strongly typed languages this behavior forces all code downstream of the call site to deal with the empty case. Contrast this with functions like random.choice, which will throw an exception if an empty collection is passed.
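For comparison, the same signature is idiomatic in Haskell, where the possibility of absence is reflected in the return type Maybe a:

-- a total version of `head`: the empty case is part of the type
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x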
Sometimes this behavior is fine, but other times we’d like to have a strong guard against this kind of error so that our programs don’t have to check for None values everywhere. Consider this simple change to the internal structure:
class NonEmptyList[T]:
    head: T
    tail: list[T]

    def __init__(self, head: T, *tail: T):
        self.head = head
        self.tail = list(tail)

    # guaranteed not to return `None` (unless T is NoneType)
    def first(self) -> T:
        """
        Return the first element of the list.
        """
        return self.head
In NonEmptyList, the head element is stored in a field separate from all the other tail elements, and __init__ now has a required positional argument before the rest of the elements. From a logical perspective, this corresponds to the requirement that the caller prove the existence of at least one element of the collection before a NonEmptyList can be created. The upshot is that after it is satisfied, NonEmptyList will uphold this condition for as long as it exists5, and hence any code calling its first method need not consider the case where the list is empty. For this reason, we can say that NonEmptyList really has the semantics–the mathematical meaning–of a list that is indeed non-empty.
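Incidentally, this is exactly the shape of Haskell’s Data.List.NonEmpty; a minimal version of it looks like:

-- the head is stored separately from the (possibly empty) tail, so a
-- value can't even be constructed without at least one element
data NonEmpty a = a :| [a]

first :: NonEmpty a -> a
first (x :| _) = x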
Here’s another example, implemented in Rust. In this example, we can use types to construct a list of items that is growable and shrinkable to arbitrary sizes, but whose length is known and verifiable at compile time. For this, we’ll start off with a type-level encoding of the natural numbers:
use std::marker::PhantomData;

struct Zero;
struct Succ<A>(PhantomData<A>);
type One = Succ<Zero>;
type Two = Succ<One>;
type Three = Succ<Two>;
// ...
// translate a type value into a regular number
trait Nat { const VALUE: usize; }
impl Nat for Zero { const VALUE: usize = 0; }
impl<N: Nat> Nat for Succ<N> { const VALUE: usize = N::VALUE + 1; }
These definitions follow the shape of how the natural numbers are defined in the λ-calculus, where we have a base zero element, Zero, and a successor function, Succ, that takes any natural number to the next largest. More specifically, Succ is a parametric type that we’re using like a function on type values to recursively define all numbers greater than zero. One is Succ<Zero>, Two is Succ<One>, and so on. Then we define a trait called Nat, which functions mostly just to assign a concrete usize value to each natural number type we’ve defined in this way. Importantly, though, these values are all const, so they’re all still computed at compile time. Now we can define another (parametric) struct StaticSizeList that uses these types to encode its length in its type.
// don't worry about the `PhantomData` -- this is included to satisfy Rust's
// compiler, but isn't strictly required in other languages
struct StaticSizeList<T, N>(Vec<T>, PhantomData<N>);
impl<T> StaticSizeList<T, Zero> {
fn empty() -> Self { Self(Vec::new(), PhantomData) }
}
impl<T, N: Nat> StaticSizeList<T, N> {
// the length of the list can be exposed as a compile-time constant
const LEN: usize = N::VALUE;
}
impl<T, N> StaticSizeList<T, N> {
// changes in length are tracked at compile time
fn push(self, x: T) -> StaticSizeList<T, Succ<N>> {
let Self(mut items, _) = self;
items.push(x);
StaticSizeList(items, PhantomData)
}
}
impl<T, N> StaticSizeList<T, Succ<N>> {
// pop is only implemented for non-zero lengths
fn pop(self) -> (T, StaticSizeList<T, N>) {
let Self(mut items, _) = self;
let last = items.pop().unwrap();
(last, StaticSizeList(items, PhantomData))
}
}
StaticSizeList takes two type arguments, T and N. T is the element type, and N is the length of the list. We first define a constructor method empty to create a new StaticSizeList with length Zero. From there, we can implement two methods push and pop that can add and remove elements. Most importantly, though, these methods modify the value of the N argument in the type by adding or removing a Succ at each call. Since the only way to create a new StaticSizeList is through empty (for which the size is always Zero), the relationship between the value of the N type argument and the actual length of the inner Vec is strictly maintained.
Because of this, StaticSizeList upholds the necessary semantics: The type as a whole provides a strict guarantee that a StaticSizeList<T, N> holds exactly N elements, and yet can still have new items added or existing items removed while maintaining this property. Note also that tracking the length in this way allows pop to avoid having to return an Option<T> (accounting for the case where self is empty) because pop is only implemented for length Succ<N>–that is, only for non-zero lengths. If one tried to call pop on a StaticSizeList<T, Zero>, the compiler would complain because the implementation wouldn’t exist6!
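As an aside, the same technique is a bit more direct in a language with richer type-level programming. Here’s a sketch of the equivalent structure in Haskell, assuming the DataKinds, GADTs, and KindSignatures extensions:

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- type-level naturals, as before
data Nat = Zero | Succ Nat

-- a list carrying its length in its type
data Vec (n :: Nat) a where
  VNil  :: Vec 'Zero a
  VCons :: a -> Vec n a -> Vec ('Succ n) a

-- the `pop` analogue: defined only for non-zero lengths, so no Option/Maybe
vpop :: Vec ('Succ n) a -> (a, Vec n a)
vpop (VCons x xs) = (x, xs)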
Through the magic of the Curry-Howard isomorphism, the structures of our types and the relationships between them can be used to ensure that the elements of our programs conform to a strict system of logic. By carefully designing the types, we can choose the logical rules, and use a program as a mathematical proof of some property. Later in this post, I’ll show how types can be used for the purposes of manipulating quantum information.
Quantum programming languages
So now we can finally start adding the quantum bits to our picture of programming. As usual, we’ll start with a comparison to the classical case (i.e. essentially every existing language). Any (high-level) language, by definition, abstracts away the details of the machine running its programs. Depending on the language, this can mean, e.g., automatic memory management, first-class functions, or a sophisticated type system; but the central theme is that high-level abstractions allow users to (as in the intro) focus on the informational content of their programs. In other terms, we don’t care about bits and bytes, only the data they encode. So the question to answer now is: What abstractions should be made to allow one to work properly with quantum information when programming a quantum computer?
Unsurprisingly, the answer to this question lies in the differences between quantum and classical information. There are many ways to demonstrate these differences in the abstract, but probably the best one here is by considering how pieces of information behave when we stick them together.
The humble pair
The simplest non-trivial piece of (classical) information is the bit, which for our purposes is well-modeled by the type $\texttt{Bit}$. $\texttt{Bit}$ is inhabited by two values, $0$ and $1$, and we want to think about how to compose $\texttt{Bit}$s together to form larger informational structures. The simplest way to do this for two $\texttt{Bit}$s is by pairing them up to form a value of type $(\texttt{Bit}, \texttt{Bit})$, generally known as a (2-)tuple or “pair,” which is then inhabited by four values,

$$(0, 0) \qquad (0, 1) \qquad (1, 0) \qquad (1, 1)$$

If we wanted to go bigger, we could then pair this up with another $\texttt{Bit}$ to form values of type $((\texttt{Bit}, \texttt{Bit}), \texttt{Bit})$, which for our purposes will be known as a 3-pair and is inhabited by eight values,

$$((0,0),0) \quad ((0,0),1) \quad ((0,1),0) \quad ((0,1),1) \quad ((1,0),0) \quad ((1,0),1) \quad ((1,1),0) \quad ((1,1),1)$$

And of course we could keep going up to a general $n$-pair to model larger types.

Notice that for each $\texttt{Bit}$ we tack on through this pairing, we double the number of values for the type: The $n$-pair of $\texttt{Bit}$s is inhabited by $2^n$ values. This also generalizes to pairs formed by other types: Given two types $A$ and $B$, inhabited by $|A|$ and $|B|$ values respectively, the pair $(A, B)$ is inhabited by $|A| \cdot |B|$ values–for this reason, the pair type is known (usually in a category-theoretic context) as the “product” of $A$ and $B$.
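We can check this counting directly; a quick Haskell sketch:

-- Bit is inhabited by exactly two values
data Bit = O | I deriving (Show, Enum, Bounded)

-- every value of the pair type (Bit, Bit): 2 * 2 = 4 of them
allPairs :: [(Bit, Bit)]
allPairs = [(x, y) | x <- [minBound ..], y <- [minBound ..]]
-- ghci> length allPairs
-- 4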
Morally, the pair acts as a way to concatenate two pieces of information, simply “sticking” them together. Using pairs, we can easily model how information can be copied:
-- yields a value of any type paired with itself
copy :: a -> (a, a)
copy x = (x, x)
as well as discarded:
-- takes only the first element of a pair, discarding the second
fst :: (a, b) -> a
fst (x, _) = x
-- takes only the second element of a pair, discarding the first
snd :: (a, b) -> b
snd (_, y) = y
Category-theoretically, it’s easy to see how pairs can be regarded as a monoidal product in the monoidal category of a programming language’s type system7. The monoidal unit is then the single type (up to isomorphism) that does not change the informational content of an original value of any other type when paired with it, in either the left or right positions. In other words, the monoidal unit is the type Unit for which (Unit, a) and (a, Unit) are essentially the same as a. More technically, we can write:
type Unit = -- some appropriate type...
-- combines two functions to make a third that takes a value from the pair of
-- domains to the pair of their images
fpair :: (a -> b) -> (c -> d) -> ((a, c) -> (b, d))
fpair f g = \(x, y) -> (f x, g y)
-- reorders a 3-pair
assoc :: (a, (b, c)) -> ((a, b), c)
assoc (x, (y, z)) = ((x, y), z)
-- left unitor: must be an isomorphism with `unitorL'` as its inverse
unitorL :: (Unit, a) -> a
unitorL = -- some implementation...
-- inverse of `unitorL`
unitorL' :: a -> (Unit, a)
unitorL' = -- some implementation
-- right unitor: must be an isomorphism with `unitorR'` as its inverse
unitorR :: (a, Unit) -> a
unitorR = -- some implementation...
-- inverse of `unitorR`
unitorR' :: a -> (a, Unit)
unitorR' = -- some implementation
If, for some choice of type Unit, the following are satisfied, then the pair as described above is a monoidal product with Unit as its unit:

unitorL . unitorL' == id == unitorL' . unitorL
unitorR . unitorR' == id == unitorR' . unitorR
fpair id unitorL == (fpair unitorR id) . assoc
Under these laws, the choice for Unit is uniquely determined up to isomorphism as the singleton type–the type that has only one value–which is called () in Haskell and Rust, and unit in OCaml.
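With that choice made, the implementations elided above can be filled in; a minimal sketch:

type Unit = ()

unitorL :: (Unit, a) -> a
unitorL ((), x) = x

unitorL' :: a -> (Unit, a)
unitorL' x = ((), x)

unitorR :: (a, Unit) -> a
unitorR (x, ()) = x

unitorR' :: a -> (a, Unit)
unitorR' x = (x, ())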
Another interesting aspect of this monoidal unit is its relation to the creation and deletion of values: Because () only has one value, there is exactly one unique morphism with the type a -> () for any type a, corresponding only to processes that have no dependence on their inputs, instead simply deleting the input and replacing it with the singleton value. By the same token, the number of unique morphisms of type () -> a is exactly equal to the number of values inhabiting a, corresponding to the process of creating a constant value of type a. Although this doesn’t matter so much in globally lazy languages like Haskell, this idea is used in globally eager languages to achieve some measure of lazy evaluation. For example, OCaml achieves lazily evaluated iterators by defining an iterator (spelled Seq in this language) like so:
(* 'a is how OCaml spells type variables *)
(* ['a Seq.t] is simply a synonym for the function type [unit -> 'a Seq.node] *)
type 'a Seq.t = unit -> 'a Seq.node
(* Thus an iterator is achieved by wrapping these functions in a linked list,
where the iterator is advanced lazily by providing the singleton value [()]
at each step. *)
type 'a Seq.node =
| Nil
| Cons of 'a * 'a Seq.t
(* Example: take only the prefix of elements that satisfy a predicate. Only the
elements up to the first non-satisfying element are actually evaluated. *)
let rec take_while (pred: 'a -> bool) (seq: 'a Seq.t): 'a list =
  match seq () with (* evaluate the next element in [seq] *)
  | Cons (head, rest) when pred head -> head :: (take_while pred rest)
  | _ -> [] (* stop at the first non-satisfying element, or at [Nil] *)
Quantum product spaces
Now let’s talk about quantum information. The simplest non-trivial quantum system and analogue of $\texttt{Bit}$ is the qubit which, when measured, can be in one of two states, $|0\rangle$ or $|1\rangle$. We can call the “type” of these states $\texttt{Qubit}$, but the key difference between it and $\texttt{Bit}$ is that $|0\rangle$ and $|1\rangle$ are not the only values that inhabit $\texttt{Qubit}$. The general state of the qubit is modeled as a superposition of $|0\rangle$ and $|1\rangle$: Whereas $\texttt{Bit}$ is simply a two-element set, $\texttt{Qubit}$ is a complex-valued vector space8 whose elements are the linear combinations $|\psi\rangle = \alpha\, |0\rangle + \beta\, |1\rangle$, with $\alpha$ and $\beta$ being arbitrary complex numbers up to the normalization condition $|\alpha|^2 + |\beta|^2 = 1$9. This comes from the fact that $|\alpha|^2$ and $|\beta|^2$ are the probabilities that one will measure the state as $|0\rangle$ or $|1\rangle$, whereby the qubit collapses to one or the other, and points at an important subtlety in the quantum-classical information comparison.
In the classical world, we think about data as always having a definite state, with transformations on our data preserving this property. For example, a 32-bit register encodes information by selecting a particular integer out of the roughly 4 billion possible ones, and performing an operation on it will simply select another. But in the quantum world, the space of available states is expanded to be inherently probabilistic, meaning that quantum states are not simply values but distributions over values, and this is what makes quantum systems so hard to simulate: When we want to transform a quantum state, we have to transform the entire distribution! This has profound implications for the structure of products–when we stick two probability distributions together and then perform a transformation, the result will in general not be separable like it would be if we were working with definite values.
For the purpose of illustration, let’s first assume that we can add monoidal structure to this hypothetical category of quantum types, and then see what logically follows. For any value of type $\texttt{Qubit}$, we know that the states $|0\rangle$ and $|1\rangle$ must be valid values, so we should expect that the value-products

$$|0\rangle \otimes |0\rangle \qquad |0\rangle \otimes |1\rangle \qquad |1\rangle \otimes |0\rangle \qquad |1\rangle \otimes |1\rangle$$
must be members of the type $\texttt{Qubit} \otimes \texttt{Qubit}$ as well10. Appealing to physical intuition, these four states should describe all possible outcomes of measurements one could perform on this system, so they should be enough to construct a basis for the type $\texttt{Qubit} \otimes \texttt{Qubit}$, which has dimension 4. This means the general form for a value of this type is

$$|\psi\rangle = c_{00}\, |00\rangle + c_{01}\, |01\rangle + c_{10}\, |10\rangle + c_{11}\, |11\rangle$$

(writing $|jk\rangle$ as shorthand for $|j\rangle \otimes |k\rangle$) with $\sum_{jk} |c_{jk}|^2 = 1$.
There are a few things we could say about $\texttt{Qubit} \otimes \texttt{Qubit}$ at this point, but let’s quickly try pushing up to the product of three systems first. Assuming a proper monoidal construction, we can use the fact that $\otimes$ is associative to avoid having to care about any distinctions between $(\texttt{Qubit} \otimes \texttt{Qubit}) \otimes \texttt{Qubit}$ and $\texttt{Qubit} \otimes (\texttt{Qubit} \otimes \texttt{Qubit})$, and simply write $\texttt{Qubit} \otimes \texttt{Qubit} \otimes \texttt{Qubit}$, just to save me some typing. Adding in the new qubit should introduce a new digit into our basis states,

$$|000\rangle \quad |001\rangle \quad |010\rangle \quad |011\rangle \quad |100\rangle \quad |101\rangle \quad |110\rangle \quad |111\rangle$$

which gives $2^3 = 8$ as the dimension of the space and a general form for a three-qubit state as

$$|\psi\rangle = \sum_{j, k, l \in \{0, 1\}} c_{jkl}\, |jkl\rangle$$

with $\sum_{jkl} |c_{jkl}|^2 = 1$.
Here’s where things get interesting. To cut a long(er) story short, these relationships identify the monoidal product with the linear-algebraic tensor product $\otimes$ (which not-so-coincidentally already uses the same symbol). $\otimes$ acts on pairs of state vectors like this:

$$(\alpha_0\, |0\rangle + \alpha_1\, |1\rangle) \otimes (\beta_0\, |0\rangle + \beta_1\, |1\rangle) = \alpha_0 \beta_0\, |00\rangle + \alpha_0 \beta_1\, |01\rangle + \alpha_1 \beta_0\, |10\rangle + \alpha_1 \beta_1\, |11\rangle$$
and as a categorical “product” by resulting in a vector space whose dimension is equal to the product of the dimensions of the two component vector spaces, i.e. $\dim(A \otimes B) = \dim(A) \cdot \dim(B)$. Notice that, for a product of $\texttt{Bit}$s, we can always peek into the structure of the pair and easily extract individual bits and pieces (see the definitions of fst and snd above). This is not the case with the tensor product: While it is always possible to multiply two states through the tensor product, it is not, in general, possible to “divide out” states–for example, the two-qubit state

$$|\Phi^+\rangle = \frac{1}{\sqrt{2}} \left( |00\rangle + |11\rangle \right)$$

cannot be written as a tensor product of two one-qubit states. (The special states that can be factored as a tensor product are called “product states.”) This is one manifestation of quantum entanglement as an informational primitive: It suggests that $|\Phi^+\rangle$ is somehow irreducible on a fundamental, informational level–meaning there must be some amount of information stored in the entanglement itself–despite the fact that the physical system really comprises two ostensibly smaller parts. Through entanglement, the composition of two quantum systems is able to hold more information than an analogous composition of two classical systems.
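To make the tensor product concrete, here’s a small sketch (my own illustration) that represents an n-qubit state as a flat list of 2^n amplitudes; the product is then just every pairwise product of amplitudes, and the point above is that no two arguments to kron can ever produce the Bell state:

import Data.Complex

-- an n-qubit state as a flat list of 2^n complex amplitudes
type State = [Complex Double]

-- the tensor product: all pairwise products of amplitudes
kron :: State -> State -> State
kron xs ys = [x * y | x <- xs, y <- ys]

-- a separable product state: |0⟩ ⊗ (|0⟩ + |1⟩)/√2
zeroPlus :: State
zeroPlus = kron [1, 0] [1 / sqrt 2, 1 / sqrt 2]

-- the entangled Bell state (|00⟩ + |11⟩)/√2, written out by hand:
-- no pair of one-qubit inputs to `kron` yields this output
bell :: State
bell = [1 / sqrt 2, 0, 0, 1 / sqrt 2]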
To complete the monoidal structure, we now have to find a monoidal unit. The natural choice is the one-dimensional complex-valued vector space $\mathbb{C}$, otherwise known as the set of complex numbers. Although tensor products cannot generally be factored, the tensor product of any vector space with $\mathbb{C}$ can (since in this case the product simplifies to ordinary scalar multiplication), which allows the unitorL and unitorR natural isomorphisms from above to be defined in this context. But this leads to an interesting situation: Unlike the singleton type, $\mathbb{C}$ has non-trivial structure! To oversimplify a bit, this is allowed because all of the math surrounding how physical observables can be elucidated from these states is insensitive to overall phase factors. That is, the state $e^{i \varphi}\, |\psi\rangle$ will give the same physical predictions as $|\psi\rangle$. If we then relax the overall normalization condition by moving it into the definition of the state probabilities, i.e.

$$P_j = \frac{|c_j|^2}{\sum_k |c_k|^2} = \frac{|c_j|^2}{\langle \psi | \psi \rangle}$$

then we can extend the insensitivity to any overall complex factor and make everything sound.
Measurement and (no-)cloning
As with the classical unit (), $\mathbb{C}$ plays an important role in modeling the preparation and destruction of quantum states. Morphisms of type $\mathbb{C} \to \mathcal{H}$ correspond to constant procedures to produce states in the Hilbert space $\mathcal{H}$ (up to an overall scalar). In the reverse direction, $\mathbb{C}$’s structure records what happens when we want to apply measurement operations and consume states via morphisms of type $\mathcal{H} \to \mathbb{C}$. These morphisms produce probability amplitudes that hold information about the particular quantum process(es) the state went through.
Whereas the morphism composition $() \to a \to ()$–which creates a value and then deletes it–doesn’t really depend on what $a$ and the created value are, the analogous $\mathbb{C} \to \texttt{Qubit} \to \mathbb{C}$–which prepares the qubit state $|\psi\rangle$ and then measures the state overlap $\langle \phi | \psi \rangle$–certainly does depend on $|\psi\rangle$ and $\langle \phi|$. So although the composite has the overall type $\mathbb{C} \to \mathbb{C}$, this degree of freedom is still expressible as the fact that such a morphism can correspond to any complex scalar (cf. a morphism of type $() \to ()$, which can only be the identity).
And this only gets messier when we want to think about partial destruction of quantum information–that is, to construct a quantum analogue of fst and snd. To properly explain what happens when such a thing is attempted, we need to introduce a different representation of quantum states, namely the density matrix. Density matrices (usually denoted as $\rho$ or similar) are formed from the complex outer product of state vectors. For any state $|\psi\rangle$, the corresponding density matrix is

$$\rho = |\psi\rangle \langle\psi|$$

In the case of a qubit state $|\psi\rangle = \alpha\, |0\rangle + \beta\, |1\rangle$, this results in

$$\rho = |\alpha|^2\, |0\rangle\langle 0| + \alpha \beta^*\, |0\rangle\langle 1| + \alpha^* \beta\, |1\rangle\langle 0| + |\beta|^2\, |1\rangle\langle 1|$$

where you can think of the “ket-bra” $|j\rangle\langle k|$ as a basis element in the space of complex-valued matrices (in the same way that $|j\rangle$ is a basis element in the space of complex-valued vectors). If we wanted, we could write this out as an actual matrix,

$$\rho = \begin{pmatrix} |\alpha|^2 & \alpha \beta^* \\ \alpha^* \beta & |\beta|^2 \end{pmatrix}$$

with the specification that the element $\rho_{jk}$ gives the coefficient on the $|j\rangle\langle k|$ basis element.
Some properties of density matrices:
- A properly formed density matrix is Hermitian: $\rho = \rho^\dagger$.
- The main diagonal of a density matrix gives state measurement probabilities, and the trace of a normalized density matrix is $\operatorname{Tr} \rho = 1$.
- The off-diagonal elements of a density matrix come from cross-terms in the outer product, and can be thought of as a measure of the “coherences” in a quantum state (more on that below).
- The expectation value of an operator $A$ on $\rho$ can be calculated as $\langle A \rangle = \operatorname{Tr}(\rho A)$.
- The density-matrix formulation of the Schrödinger equation is the von Neumann equation,
$$i \hbar\, \frac{d \rho}{d t} = [H, \rho]$$
Density matrices are an important generalization of the usual mathematical representation of quantum states because they allow one to model the interplay between classical and quantum information. The general form of a density matrix is as follows:

$$\rho = \sum_j p_j\, |\psi_j\rangle \langle\psi_j|$$

where each $|\psi_j\rangle$ is any ordinary quantum state, and $p_j$ is a probability associated with that state. This definition can be seen as a classical probability distribution over quantum states, meaning that, in general, the elements of a density matrix will encode both quantum and classical information, leading to the concept of a “mixed” state. Simply put, a mixed state is any quantum state whose density matrix cannot be written as a single outer product (cf. an entangled state cannot be written as a single tensor product), with everything that can being called a “pure” state. More concretely, a convenient property of density matrices is that if a state is pure, its density matrix will be such that $\operatorname{Tr}(\rho^2) = 1$ (assuming normalization), and any state with any amount of mixing will have $\operatorname{Tr}(\rho^2) < 1$. I briefly mentioned above that the off-diagonal elements of a density matrix measure the “coherence” of a state, which can be thought of (at least conceptually) as the inverse to the degree of mixing present in the state: A pure state is perfectly coherent, a mixed state has some degree of incoherence, and a “totally mixed” state has all off-diagonal elements equal to zero.
The bottom line is that partial destruction of quantum information can introduce mixing into the remaining quantum state11, meaning that our hypothetical fst and snd constructions need to be more complicated than their classical counterparts. The mathematical model for this kind of process revolves around something called the “partial trace” over the subsystem encoding the information being destroyed: Given a state in the product space $\mathcal{H}_A \otimes \mathcal{H}_B$ with density matrix

$$\rho = \sum_{j, j', k, k'} c_{j k, j' k'}\, |j k\rangle \langle j' k'|$$

(with $j, j'$ ranging over the quantum numbers of the $\mathcal{H}_A$ space, and $k, k'$ ranging over those of the $\mathcal{H}_B$ space), the partial trace over the $\mathcal{H}_B$ subsystem is

$$\rho_A = \operatorname{Tr}_B \rho = \sum_{j, j', k} c_{j k, j' k}\, |j\rangle \langle j'|$$

which results in some state left in the $\mathcal{H}_A$ subsystem. (The partial trace over $\mathcal{H}_A$, $\operatorname{Tr}_A$, is defined similarly.)
Taking the state $|\Phi^+\rangle$ from above as an element of the space $\mathcal{H}_A \otimes \mathcal{H}_B$, the corresponding density matrix is

$$\rho = |\Phi^+\rangle \langle\Phi^+| = \frac{1}{2} \left( |00\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 00| + |11\rangle\langle 11| \right)$$

and one can verify that $\operatorname{Tr}(\rho^2) = 1$. We can try computing the partial trace over $\mathcal{H}_B$, which results in

$$\rho_A = \operatorname{Tr}_B \rho = \frac{1}{2} \left( |0\rangle\langle 0| + |1\rangle\langle 1| \right)$$

Note that we’re left with a state in $\mathcal{H}_A$ and that, more importantly, it’s entirely mixed despite the fact that we started from a completely pure state. In other words, the act of discarding the $\mathcal{H}_B$ qubit removes the (quantum) information stored in the original state’s entanglement and results in the remainder containing only classical information–the removal of a part of a system (or, physically, measuring that part of the system and throwing away the result) has a non-trivial back-action on the remainder, and the above are the special considerations that need to be made for functions of type $A \otimes B \to A$ and $A \otimes B \to B$12.
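As a quick sanity check on that claim, we can compute the purity of the reduced state directly: since $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$ are orthogonal projectors,

$$\operatorname{Tr}\left(\rho_A^2\right) = \operatorname{Tr}\left[ \frac{1}{4} \left( |0\rangle\langle 0| + |1\rangle\langle 1| \right) \right] = \frac{1}{2} < 1$$

in contrast to the $\operatorname{Tr}(\rho^2) = 1$ of the pure state we started with.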
But with this, we have our constructions of quantum fst and snd:

$$\mathrm{fst} = \operatorname{Tr}_B : \mathcal{H}_A \otimes \mathcal{H}_B \to \mathcal{H}_A \qquad \mathrm{snd} = \operatorname{Tr}_A : \mathcal{H}_A \otimes \mathcal{H}_B \to \mathcal{H}_B$$

Except I’m kind of lying here. Technically, we have to allow for the existence of mixed states as described above, which requires us to (a) change the definitions of $\mathrm{fst}$ and $\mathrm{snd}$ to act on the spaces where density matrices live:

$$\mathrm{fst} : \mathcal{H}_A \otimes \mathcal{H}_A^* \otimes \mathcal{H}_B \otimes \mathcal{H}_B^* \to \mathcal{H}_A \otimes \mathcal{H}_A^* \qquad \mathrm{snd} : \mathcal{H}_A \otimes \mathcal{H}_A^* \otimes \mathcal{H}_B \otimes \mathcal{H}_B^* \to \mathcal{H}_B \otimes \mathcal{H}_B^*$$

(where $\mathcal{H}_A^*$ and $\mathcal{H}_B^*$ are the conjugate spaces of $\mathcal{H}_A$ and $\mathcal{H}_B$), or (b) change the kind of vector space we’re working in. Typically (b) is chosen, and we switch from ordinary Hilbert spaces to other things called C*-algebras, but at the end of the day, the story is morally the same.
One final thing to note before moving on is the idea behind what’s known as the no-cloning theorem. Essentially, the no-cloning theorem states that it’s impossible to construct an operation that creates a copy of an arbitrary quantum state on an independent system without entangling the two systems, destroying at least some of the information stored in the original state, or both. This means we likewise have to make special considerations for functions of the type $A \to A \otimes A$, but we’ll get to those in a bit. For now, we can simply note that writing a general function like copy above for our hypothetical quantum computer is literally impossible.
Linear logic and substructural types
So now we have a working understanding of the two most important differences between how classical and quantum information behave. In general, both the addition and removal of a constituent part of a larger system will have non-trivial effects on everything else, which means the type system of a hypothetical quantum language should make its users aware of such things. These are the main semantics we want to have, and the remaining work lies in formulating a type system to properly model this.
Constructive logic
Per Curry-Howard, the type system of a language reflects a core logical structure, so let’s start there. Here we’ll deal with expressions of this form:

$$A_1, A_2, \ldots, A_m \;\vdash\; B_1, B_2, \ldots, B_n$$

where the $A_j$ and $B_k$ all stand for logical propositions and the $\vdash$ symbol, often called the turnstile, stands for a judgement of entailment–essentially it’s like a generalized, “flattened” version of the horizontal inference bars from above. Specifically, this expression denotes the entailment “if $A_1, \ldots, A_m$ are all true, then at least one $B_k$ is true.” In other terms, it stands for

$$(A_1 \wedge A_2 \wedge \cdots \wedge A_m) \Rightarrow (B_1 \vee B_2 \vee \cdots \vee B_n)$$

Every term to the left of the $\vdash$ is called an antecedent, everything on the right a succedent or consequent, and all terms are generally called cedents, with the whole expression being a sequent.
In what follows, it’ll help to adopt a constructivist view where terms stand for not only propositions, but (again, per Curry-Howard) a value or resource that provides evidence for the proposition as well. These resources could be the usual pieces of data like integers or strings, but could also be more complicated things in programming like a handle to a file on disk, a connection to another computer over the network, or–as we’ll be discussing below–a quantum system. In the former case the logic and corresponding types tend to be simple to work with because they reflect the simplicity of the underlying information, but the latter generally demands extra constraints on the types and their relationships. To make things a bit more concrete, one can usually think of an entailment with one succedent as having a corresponding implementation as a function in a program: The antecedents are the arguments, the succedent is the return value, and the relationships between them will influence which functions can be implemented in the language.
Multiplicative additive linear logic
These extra constraints modify the relevant logic in the semantics of the eventual programming language and create what are called “substructural” logics, with their corresponding type systems being called substructural types. In principle, the added substructure can modify what we usually think of as the ordinary logic of everyday life, but generally the discussion (as it pertains to computer science and programming language design) revolves around two important properties of ordinary logic.
The first of these is called “weakening”: A logical entailment that can be weakened is one that allows arbitrary additional terms to be added to the antecedents of the entailment without invalidating it. Generically, it looks like this:

$$\frac{\Gamma \;\vdash\; B}{\Gamma, A \;\vdash\; B}$$

A given entailment $\Gamma \vdash B$ can be weakened by any added assumption $A$ to give a weaker entailment $\Gamma, A \vdash B$13. All entailments in ordinary logic have this property, so for example we can take a base entailment like “if there’s smoke then there’s fire” and generate a weaker form of the statement, “if I’m happy and there’s smoke, then there’s fire.”
The second is called “contraction”: A logical entailment that can be contracted is one for which a duplicated antecedent can be removed if the entailment contains multiple copies of that antecedent:

$$\frac{\Gamma, A, A \;\vdash\; B}{\Gamma, A \;\vdash\; B}$$

Ordinary entailments also have this property, and we can demonstrate with a statement like “no pain, no gain” (or, keeping with the constructivist theme, its logical inverse),

$$\mathrm{pain}, \mathrm{pain} \;\vdash\; \mathrm{gain} \qquad \longrightarrow \qquad \mathrm{pain} \;\vdash\; \mathrm{gain}$$

Here we’ve taken a version of the statement with duplicated pre-conditions, i.e. two copies of $\mathrm{pain}$, and eliminated one as a redundancy to arrive at $\mathrm{pain} \vdash \mathrm{gain}$.
These are the two properties that people have been working with to formulate alternative logics: The substructure in these logics comes from whether (and to what extent, across the space of all possible entailments) weakening and contraction are present. These lead, of course, to alternative kinds of types that can exist in programming languages, where weakening and contraction impose restrictions on when values can be added or removed from a program. Usually these aspects are phrased in terms of how many times a given value can be “used,” by either discarding it or passing it to another part of a program–we’ll make this more concrete in a bit. For now, take a look at this nice table describing the features of alternative logics (and types) where weakening and/or contraction are relaxed:
| Type | Weakening | Contraction | # of uses |
|---|---|---|---|
| Normal | Yes | Yes | Any number |
| Affine | Yes | No | At most once |
| Relevant | No | Yes | At least once |
| Linear | No | No | Exactly once |
Take as an example the following Rust snippet. In Rust, types are affine by default, meaning that, unless certain extra conditions are satisfied, values can be used at most one time before the compiler complains.
// Hoare's vending machine: `coin` can only be spent once!
struct Coin;
struct Candy;
struct Drink;
fn make_coin() -> Coin { /* ... */ }
fn buy_candy(coin: Coin) -> Candy { /* ... */ }
fn buy_drink(coin: Coin) -> Drink { /* ... */ }
fn main() {
// create a new value of type `Coin`
let coin = make_coin();
// `coin` is "moved" into `buy_candy`:
// it doesn't exist in this scope anymore
let candy = buy_candy(coin);
// compiler error:
// `coin` doesn't exist anymore, so we can't spend it again!
let drink = buy_drink(coin);
}
In this example (known as the logician Tony Hoare’s “vending machine”) we have a unit struct called Coin, which is an affine type, and two functions, buy_candy and buy_drink, that each take a Coin as argument and return the appropriate item. In main, we construct a value coin of this type, and attempt to pass the same value to one call of each function. Each function call counts as a “use” of coin, and as a result the compiler will refuse to compile this program, since the affine-ness (or “affinity”) of Coin means that coin can’t be used again after being used by buy_candy.
If we were to write this in Python (where types are not affine by default), there would be no such error because of a lack of affinity:
class Coin:
    def __init__(self):
        pass

class Candy:
    pass

class Drink:
    pass

def buy_candy(coin: Coin) -> Candy:
    return Candy()

def buy_drink(coin: Coin) -> Drink:
    return Drink()

def main():
    coin = Coin()
    candy = buy_candy(coin)  # no (restrictive) concept of a value being "used"
    drink = buy_drink(coin)  # OK
Getting back to quantum information, we want to consider linear types–that is, types for which the compiler will complain not only if we try to use a (quantum) value more than once, but also if we try to write a program that fails to use that value at all. This is due to (a) the back-action of measurement limiting weakening and (b) the no-cloning theorem limiting contraction–not to say that a value belonging to a quantum type can never be contracted or weakened in our hypothetical quantum language, only that these processes have non-trivial effects that must be tracked explicitly through e.g. special built-in functions of types $\texttt{Qubit} \to \texttt{Qubit} \otimes \texttt{Qubit}$ and $\texttt{Qubit} \to \mathbb{C}$, respectively.
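GHC’s LinearTypes extension gives a concrete taste of what this looks like in practice (a sketch of the general idea only; Haskell’s multiplicity system differs in detail from the full linear logic below):

{-# LANGUAGE LinearTypes #-}

-- `a %1 -> b` is a function that must use its argument exactly once

-- rejected by the compiler: `q` is used twice (no contraction)
-- clone :: q %1 -> (q, q)
-- clone q = (q, q)

-- rejected by the compiler: `q` is never used (no weakening)
-- discard :: q %1 -> ()
-- discard _ = ()

-- accepted: both `f` and `q` are used exactly once
apply :: (q %1 -> r) %1 -> q %1 -> r
apply f q = f q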
Now let’s get a little more precise. Linear types follow linear logic, so we should be able to write down symbolic logical rules with connectives to formalize the relationships between the corresponding types. The best way to explain these is probably to contrast the proper rules with those of ordinary logic. In ordinary logic, we have two values $\top$ (true) and $\bot$ (false), with two basic binary operations (conjunction AND/$\wedge$ and disjunction OR/$\vee$) and one basic unary operation (negation NOT/$\neg$).
| Operation | Meaning | Unit |
|---|---|---|
| $\wedge$ | Conjunction (AND) | $\top$ |
| $\vee$ | Disjunction (OR) | $\bot$ |
| $\neg$ | Negation (NOT) | – |
From these primitives, we can define implication (the equivalent of a function type) and identify the idempotency of negation,

$$A \Rightarrow B \;\equiv\; \neg A \vee B \qquad \neg (\neg A) \;\equiv\; A$$
In linear logic, we split values and operations into “multiplicative” and “additive” forms, with additional values and operations to match.
| Operation | Meaning | Unit |
|---|---|---|
| $\otimes$ | Multiplicative conjunction (times) | $\mathbf{1}$ |
| ⅋ | Multiplicative disjunction (par) | $\bot$ |
| $\oplus$ | Additive disjunction (plus) | $\mathbf{0}$ |
| $\&$ | Additive conjunction (with) | $\top$ |
| $(-)^\perp$ | Dual | – |
| $!$ | Antecedent weakening + contraction | – |
| $?$ | Succedent weakening + contraction | – |
Now, we have to deal with four base values (two multiplicative, $\mathbf{1}$ and $\bot$, and two additive, $\top$ and $\mathbf{0}$; $\top$ is called “top” and $\bot$ is called “bottom”), as well as two versions of both conjunction (“times” and “with”) and disjunction (“par” and “plus”). It may be tempting to understand these by analogy to the ordinary logical connectives, but I’d advise against it–it is precisely the fact that these values and connectives construct an alternative system of logic that prevents the analogy from being properly drawn. Instead, the elements of the table above are generally defined by the following relations.
For the multiplicative elements:

$$(A \otimes B)^\perp = A^\perp ⅋ B^\perp \qquad (A ⅋ B)^\perp = A^\perp \otimes B^\perp \qquad \mathbf{1}^\perp = \bot \qquad \bot^\perp = \mathbf{1}$$

For the additive elements:

$$(A \oplus B)^\perp = A^\perp \,\&\, B^\perp \qquad (A \,\&\, B)^\perp = A^\perp \oplus B^\perp \qquad \mathbf{0}^\perp = \top \qquad \top^\perp = \mathbf{0}$$
The two remaining elements are called “exponentials” and refer to unary operations ($!$, read “of course,” and $?$, read “why not”) that convert types into versions of themselves that can be weakened or contracted14.
As in ordinary logic, one can then define “linear implication” (denoted by the symbol $\multimap$, which is sometimes called the “lollipop” operator) and define the idempotency of the linear dual as well as its action on exponentials:

$$A \multimap B \;\equiv\; A^\perp ⅋ B \qquad (A^\perp)^\perp = A \qquad ({!}A)^\perp = {?}(A^\perp) \qquad ({?}A)^\perp = {!}(A^\perp)$$

At a high level, we can think of linear implication as a linear function: This is the type of a function that processes linear types in a way that upholds its operands’ linearity. Simply put, though, these primitives encode relationships between types that allow appropriate type systems to keep track of when values are created and used, and, most importantly, detect when those relationships are violated in order to disallow invalid programs.
True quantum types
So now we have all we need to actually make a quantum programming language. From a typing perspective, a surprisingly large amount revolves around merely writing down the soundness rules for a language that contains linear types. As a minimal example, we can look at the following construction of a “quantum λ-calculus” by Peter Selinger and Benoît Valiron.
Quantum λ-calculus
Let’s start off with the terms in the calculus:

$$\begin{aligned} M, N, P \;::=\;& \ x \;\mid\; c \;\mid\; * \;\mid\; \lambda x.\, M \;\mid\; M\ N \\ &\mid\; \langle M, N \rangle \;\mid\; \mathrm{let}\ \langle x, y \rangle = M\ \mathrm{in}\ N \\ &\mid\; \mathrm{in}_\ell\ M \;\mid\; \mathrm{in}_r\ M \;\mid\; \mathrm{match}\ P\ \mathrm{with}\ (x \mapsto M \mid y \mapsto N) \\ &\mid\; \mathrm{letrec}\ f\ x = M\ \mathrm{in}\ N \end{aligned}$$
We’ll break this down line-by-line:
- The first line after the $::=$ is the usual λ-calculus, with the addition of “constants” $c$ (which have known types, but are not defined with an expression–these are rather like “built-in” elements) and a “unit” $*$–the member of the singleton type $\top$ below.
- The second line handles construction and destructuring of pairs (i.e. binary products), denoted with angle brackets. The first term on the line creates a pair with $M$ on the left and $N$ on the right, and the second binds the elements of a pair denoted by $M$ to the local variables $x$ and $y$, only within the scope of $N$.
- The third line handles binary sum types–in particular, this line works with the analogue of Haskell’s Either in our hypothetical language. The first two terms on the line “inject” a term into the left or right halves of the sum (Left and Right in Haskell), and the last destructures a sum using a “match” whose arms handle the left and right cases to produce expressions of the same output type.
- Finally, the fourth line allows for recursive functions to be defined using ML’s letrec notation.
Now for the types:

$$\begin{aligned} A, B \;::=\;& \ \mathrm{qbit} \;\mid\; \top \\ &\mid\; {!}A \;\mid\; A \otimes B \;\mid\; A \oplus B \\ &\mid\; A \multimap B \end{aligned}$$

- On the first line, we define the primitive types $\mathrm{qbit}$ (a single qubit) and $\top$ (the type of the unit $*$).
- The second line defines the linear type operators $!$, $\otimes$, and $\oplus$: ${!}A$ is a type that can be weakened and contracted, $A \otimes B$ is the type of a pair of values of type $A$ and $B$, and $A \oplus B$ is the type of a sum of the same.
- Finally, the third line defines the linear function type $A \multimap B$. (Notice that we don’t define the ordinary function type $A \to B$, which can be seen as ${!}(A \multimap B)$.)
With the basic components defined, we can do a bit of fun type math: Using only the primitives defined above, we can define classical bits (which are notably missing from the primitives) as a sum of units,

$$\mathrm{bit} \;\equiv\; \top \oplus \top$$

and we can even make sequences using the usual recursive/linked-list definition:

$$\mathrm{list}\ A \;\equiv\; \top \oplus (A \otimes \mathrm{list}\ A)$$
And, as another feature of our hypothetical type system, typing rules (which I won’t reproduce here–see Selinger and Valiron) can be imposed to forbid ${!}\,\mathrm{qbit}$. That is, the language itself will prevent you from constructing a function that can clone quantum information or somehow treat quantum information directly as classical information.
As a final demo, we can look at an implementation of the Deutsch-Jozsa algorithm:
Morally, we can see that $\otimes$ computes the tensor product of two operators, turning independent single-qubit transformations into a single transformation on the product space of the two qubits; $\mathrm{new}$ converts classical information into quantum information (in other words, initializing a quantum bit in either computational state); $\mathrm{meas}$ measures a qubit, turning quantum information into classical information through wavefunction collapse; and the argument $U_f$ is an operator on a two-qubit space.
All these produce a type judgement (through Selinger and Valiron’s typing system) for the routine as a whole, as:

$$\vdash\ \mathit{deutsch} \;:\; {!}\left( (\mathrm{qbit} \otimes \mathrm{qbit} \multimap \mathrm{qbit} \otimes \mathrm{qbit}) \multimap \mathrm{bit} \right)$$

In other words, $\mathit{deutsch}$ is a higher-order function that takes a two-qubit operator and produces a single classical bit as a result. Note that even though we’re working with quantum information, the routine as a whole can be weakened/contracted (denoted by the outermost $!$), since all linear relationships are upheld.
Wrapping up
In this post, we took a look at the Curry-Howard isomorphism and what it means for how types can be manipulated to encode strict, logical relationships between runtime values. When these relationships are violated by programs written by humans, the Curry-Howard isomorphism allows the compiler to detect these violations and completely disallow their compilation–before any code is actually run. Then we explored quantum information from a roughly type-based perspective. We examined physical systems of qubits using the language of product types and monoidal categories, and arrived at a more mathematical description of how quantum information differs from classical information. Finally, I gave the quickest and loosest description of linear logic, how it applies to quantum information, and described Selinger and Valiron’s construction of a linearly typed quantum λ-calculus.
It’s important to note that there already exist linearly typed quantum programming languages like Quipper and QML. These are very much research languages, however, and not used in any kind of production. But this touches on the ever-important question–should quantum computing prove generally useful enough to endure as a “real” mode of computation with widespread industry adoption–of how best to allow people who are otherwise unfamiliar with quantum mechanics and information to actually work with “real” quantum computers. Short of widespread additions of quantum theory to every computer science curriculum, it seems to me that specialized programming languages with strong (linear) type systems are probably the best way. If such a language that encodes the relevant physical laws in its types were to be created and adopted, I think it would make an incredibly effective environment for both learning about and working with quantum information.
Footnotes
- ↵ Coming from a non-theoretical background, I have to say that this has really helped with learning category theory: I’ve found that, with the more abstract forms of math, it’s easy to find yourself trapped in a particular instantiation of the mathematical theory. I’ve tried a few times to learn group theory in a similar fashion, but all of the resources I’ve used make it too easy to think about everything using matrices, for example, which then makes it harder to understand the more abstract ideas, at least for me. (I’ve mostly only used things with titles like “Group Theory for Physicists” though, so maybe it’s more of a physics problem.)
- ↵ Unfortunately I can’t find a working link to anything that looks to me like an original source for this image, but it seems to be fairly widespread and widely acknowledged that Martin Kuppe is indeed the originator.
- ↵ Note that falsehoods can be dealt with by proving the logical negation of the formal statement.
- ↵ To be clear, there are a few more rules, but only these are particularly relevant to what I want to say.
- ↵ As long as no one does any tampering with its internal structure–this is Python, after all…
- ↵ The set of available operations above can be extended, but requires slightly more complicated traits. For instance, append and split methods can be implemented using an Add trait that performs addition between two length types, like so:
trait Add { type Sum; }
impl Add for (Zero, Zero) { type Sum = Zero; }
impl<N> Add for (Zero, Succ<N>) { type Sum = Succ<N>; }
impl<N> Add for (Succ<N>, Zero) { type Sum = Succ<N>; }
impl<N, M> Add for (Succ<N>, Succ<M>)
where (N, Succ<Succ<M>>): Add
{
type Sum = <(N, Succ<Succ<M>>) as Add>::Sum;
}
impl<T, N> StaticSizeList<T, N> {
fn append<M>(self, other: StaticSizeList<T, M>)
-> StaticSizeList<T, <(N, M) as Add>::Sum>
where (N, M): Add
{
let StaticSizeList(mut self_items, _) = self;
let StaticSizeList(mut other_items, _) = other;
self_items.append(&mut other_items);
StaticSizeList(self_items, PhantomData)
}
fn split<A, B>(self)
-> (StaticSizeList<T, A>, StaticSizeList<T, B>)
where
(A, B): Add<Sum = N>,
A: Nat,
{
let Self(mut self_items, _) = self;
let other_items = self_items.split_off(A::VALUE);
let left = StaticSizeList(self_items, PhantomData);
let right = StaticSizeList(other_items, PhantomData);
(left, right)
}
}
- ↵ There are actually a few extra conditions associated with declaring something like this, but we’ll leave them alone for now.
- ↵ Technically, it’s a Hilbert space since the physics demands the existence of an inner product between states, but we’ll leave that alone here.
- ↵ This also means the choice of $|0\rangle$ and $|1\rangle$ is, in a sense, arbitrary because we could have picked e.g. $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ to span this space–a feature of the system used in many quantum algorithms–but that’s not so important here.
- ↵ It’s worth noting a notational convenience here. Properly, the monoidal product $\otimes$ is only defined for categorical objects (types/Hilbert spaces) and morphisms (functions/linear maps), and so can’t really be used on individual elements like this.
- ↵ Generally, the presence of some non-zero degree of mixing in a state indicates that some amount of information–usually stored in some entanglement with another system–has been somehow thrown away.
- ↵ For a more computationally grounded look at the partial trace, see Working with product-space states.
- ↵ “Weaker” in the sense that the new entailment has more pre-conditions that one must satisfy in order to be able to conclude (or produce) the succedent(s).
- ↵ Note that exponentials cannot generally be applied arbitrarily; a proper type system will define additional conditions that limit when $!$ and $?$ can be applied to a specific type term.
Changelog
- Originally posted 2025.06.17
- 2025.06.16: fix broken link to the mathematistan image
- 2025.06.28: typo in NonEmptyList example: first is guaranteed not to return None
- Updated in Typst reformat 2026.03.22 with minor wording adjustments