A swift introduction to matrix product states
2024.07.31
The previous post discussed some ideas regarding Clifford circuits and the theoretical background for why this specific subset of states and circuits can be simulated with sub-exponential requirements. Of course, this imposes a rather strict limitation on the kinds of things that can be simulated: Clifford circuits with mid-circuit measurements will allow you to see a phase transition in entanglement entropy, but it's generally believed that Clifford circuits aren't even universal for classical computation. If we exit this regime, though, we have to break the assumptions that allow the Gottesman-Knill theorem to work. This in turn means that we really just have to accept exponential scaling requirements in both time and space in order to model states and gates generally. But rather than resorting to naive vectors and matrices, there are a couple of clever things we can still do to make the simulation a bit nicer.
Entanglement entropy and the Schmidt decomposition
The naive target quantity we want to calculate from the simulated states (at least with respect to measurement-induced phase transitions) is something called the entanglement entropy. Like most other things that are called entropies, the entanglement entropy is broadly a way to quantify the information contained in some system. Here, it’s specifically the information that’s contained within the entanglement between two parts of a given system of quantum objects.
For some density matrix $\rho$ of a system under a given bipartition $A \cup B$ with $A \cap B = \varnothing$, and reduced density matrices $\rho_A = \operatorname{Tr}_B \rho$ and $\rho_B = \operatorname{Tr}_A \rho$ for each partition, we have:
- The Von Neumann entropy, $S = -\operatorname{Tr}\left( \rho_A \log \rho_A \right)$
- The generalized Rényi entropy, $S_\alpha = \frac{1}{1 - \alpha} \log \operatorname{Tr}\left( \rho_A^\alpha \right)$
with $\alpha \ge 0$, which equals the Von Neumann entropy for $\alpha \to 1$.
And it’s worth noting that usually we only calculate these for pure states, since the entropy of a density matrix with some amount of mixing will include contributions from the associated classical distributions.
The naive calculation of these entropies incurs a large runtime cost: given a pure state of $N$ qubits, we have to calculate the $2^N \times 2^N$ density matrix, trace out a subsystem, either diagonalize to compute the matrix logarithm or compute a (possibly non-integer) matrix power, and then trace over the resulting matrix. This is obviously non-ideal.
For some state $|\psi\rangle$ in the product space of two Hilbert spaces $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$, the Schmidt decomposition of $|\psi\rangle$ is the combination of two orthonormal bases $\{|u_k\rangle\} \subset \mathcal{H}_A$ and $\{|v_k\rangle\} \subset \mathcal{H}_B$ together with a set of real, non-negative coefficients $\{\lambda_k\}$ such that

$$|\psi\rangle = \sum_k \lambda_k \, |u_k\rangle \otimes |v_k\rangle$$
It can then be shown that the corresponding reduced density matrices for either subsystem, when written in the corresponding basis, are diagonal with eigenvalues $\lambda_k^2$. Thus, the Schmidt decomposition reduces the calculation of the above entropies to the forms

$$S = -\sum_k \lambda_k^2 \log \lambda_k^2, \qquad S_\alpha = \frac{1}{1 - \alpha} \log \sum_k \lambda_k^{2\alpha}$$

with the caveat that, of course, one still eventually has to perform the Schmidt decomposition.
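As a concrete sketch of both routes in NumPy (the function names and the qubits-only assumption are mine), the SVD path yields the same entropies without ever forming the density matrix:

```python
import numpy as np

def naive_entropy(psi, cut):
    """Von Neumann entropy via the full reduced density matrix."""
    n = int(np.log2(psi.size))
    rho = np.outer(psi, psi.conj())
    rho = rho.reshape(2**cut, 2**(n - cut), 2**cut, 2**(n - cut))
    rho_a = np.einsum('ajbj->ab', rho)       # trace out subsystem B
    evals = np.linalg.eigvalsh(rho_a)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log(evals))

def schmidt_entropy(psi, cut, alpha=1):
    """Entropies straight from the singular values of the reshaped state."""
    n = int(np.log2(psi.size))
    lam2 = np.linalg.svd(psi.reshape(2**cut, 2**(n - cut)),
                         compute_uv=False)**2
    lam2 = lam2[lam2 > 1e-12]
    if alpha == 1:                            # Von Neumann limit
        return -np.sum(lam2 * np.log(lam2))
    return np.log(np.sum(lam2**alpha)) / (1 - alpha)

# one Bell pair holds exactly log(2) of entanglement entropy
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
```

Both functions agree on the Bell state, but the SVD route never materializes the $2^N \times 2^N$ object.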
Moreover, the Schmidt decomposition of a state is related to the singular value decomposition (SVD) of a matrix in the following way. Given a matrix $A$ of shape $m \times n$, the SVD of $A$ is the matrices $U$, $\Sigma$, and $V^\dagger$ such that $A = U \Sigma V^\dagger$. Specifically, $U$ and $V^\dagger$ have orthonormal columns and orthonormal rows, respectively, with shapes $m \times r$ and $r \times n$ where $r = \min(m, n)$. $\Sigma$ can be seen as either a diagonal matrix of shape $r \times r$ where the diagonal entries are the "singular values" of $A$, or as a simple vector of length $r$ containing the same (in which case we instead write $A_{ij} = U_{ik} \Sigma_k V^\dagger_{kj}$ in Einstein summation notation).
Here it'll be useful to introduce a bit of notation, since we'll be dealing with a lot of indices below. Instead of writing tensor indices as subscripts, I'll move them up to elements within square brackets so, for example, the above SVD will then be written as $A[i, j] = U[i, k] \, \Sigma[k] \, V^\dagger[k, j]$.
A state $\psi$ for $N$ particles with the $j$-th particle being characterized by some quantum number $s_j$ ranging over $d_j$ values can be represented as a rank-$N$ tensor1 $\psi[s_1, \ldots, s_N]$. In the same way that $\psi$ can be "flattened" into a one-dimensional vector $\psi[s_1 \cdots s_N]$ (where $s_1 \cdots s_N$ denotes the fusion of $s_1, \ldots, s_N$)2, $\psi$ can also be "partially" flattened (or "reshaped") into any tensor of rank $1 \le r \le N$. For any given bipartition of the system that divides particles $1, \ldots, j$ from particles $j + 1, \ldots, N$, we can have a rank-2 tensor (a matrix) $\psi[s_1 \cdots s_j, s_{j+1} \cdots s_N]$. The SVD of this matrix exactly corresponds to the Schmidt decomposition of $\psi$ for the same bipartition, where the columns of $U$ are the $|u_k\rangle$, the columns of $V$ (note the missing $\dagger$) are the $|v_k\rangle$, and the (diagonal) elements of $\Sigma$ are the $\lambda_k$.
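As a small NumPy check of this correspondence (the example state and variable names are illustrative), the singular vectors and values of the reshaped state rebuild it in exactly the Schmidt form $\sum_k \lambda_k \, |u_k\rangle \otimes |v_k\rangle$:

```python
import numpy as np

# 3-qubit GHZ state (|000> + |111>)/sqrt(2), bipartitioned as {1} | {2, 3}
psi = np.zeros(8)
psi[0] = psi[7] = 2**-0.5
mat = psi.reshape(2, 4)          # rows: s1; columns: fused s2 s3

u, s, vh = np.linalg.svd(mat, full_matrices=False)

# rebuild as sum_k lambda_k * u_k (x) v_k, dropping zero Schmidt values
rebuilt = sum(s[k] * np.kron(u[:, k], vh[k])
              for k in range(s.size) if s[k] > 1e-12)
```

The two non-zero Schmidt coefficients here are both $1/\sqrt{2}$, as expected for a GHZ state across this cut.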
Schmidt decomposition as factorization
Matrix product states (MPSs) offer several computational benefits, all of which stem from the central fact that a MPS is a way to factor states (and operators, although we won't do so here) into local components that can be dealt with individually. A generic MPS is usually written minimally as

$$|\psi\rangle = \operatorname{Tr}\left[ A^{(1)} A^{(2)} \cdots A^{(N)} \right]$$

where each $A^{(j)}$ is a matrix or vector whose entries are quantum states, and it's understood that when two of these matrices are multiplied together, their entries are multiplied using the tensor product $\otimes$, which usually looks like this:

$$\begin{pmatrix} |a\rangle & |b\rangle \end{pmatrix} \begin{pmatrix} |c\rangle \\ |d\rangle \end{pmatrix} = |a\rangle \otimes |c\rangle + |b\rangle \otimes |d\rangle$$

The final trace operation is then an ordinary sum over states in the diagonal of the resulting matrix, and can be dropped if $A^{(1)}$ and $A^{(N)}$ are both formally vectors.
This is rather unintuitive and disguises the use of the $A^{(j)}$ as containers of data, though, and usually it's better to think of MPSs as a collection of rank-3 tensors (possibly rank-2 on the ends), $\Gamma^{(j)}[\alpha_{j-1}, s_j, \alpha_j]$, where the entries of each tensor are ordinary complex numbers. As a tensor network3, this looks like a linear chain of tensors, where tensor $j$ is bonded to tensors $j \pm 1$ through the indices $\alpha_{j-1}$ and $\alpha_j$, and has a free physical index $s_j$.
The bond indices $\alpha_j$ can be thought of as characterizing the quantum correlations (read: entanglement) between the different components of the state. In a minimal representation (we'll get to what exactly that means below), the dimension of those bonds (the number of values $\alpha_j$ ranges over) is a rough measure of how much entanglement connects a particle to the rest of the system.
Working with the state in this form means that operators, the vast majority of which usually only deal with one or two particles in quantum computing, can operate on only the data associated to the relevant particles. In a naive representation using a 1D state vector, a single-particle operator would need potentially many tensor products with identity operators on other particles to act on the full $N$-particle state, which is wasteful in both runtime and memory. For an MPS, the operator acting on the $j$-th particle needs to be represented on only the subspace of the $j$-th particle–no need for any applications of the tensor product, because the separation of all physical indices makes it unnecessary at the level of data structures in a classical simulation.
The Schmidt decomposition offers a way to transform an arbitrary quantum state into a MPS. From a concrete perspective suited to a classical computer, the algorithm is as follows.
- Interpret the state as a rank-$N$ tensor, with each axis corresponding to the relevant quantum number of a single particle, $\psi[s_1, s_2, \ldots, s_N]$.
- Reshape the tensor into a matrix with the first particle's quantum numbers as rows and all other quantum numbers fused into the column index, $\psi[s_1, s_2 \cdots s_N]$.
- Perform a SVD on $\psi$, yielding $\psi[s_1, s_2 \cdots s_N] = U[s_1, \alpha_1] \, \Sigma[\alpha_1] \, V^\dagger[\alpha_1, s_2 \cdots s_N]$. Store $\Gamma^{(1)} = U$ and $\Lambda^{(1)} = \Sigma$.
- Multiply the rows of $V^\dagger$ by the elements of $\Sigma$, yielding $\psi'[\alpha_1, s_2 \cdots s_N] = \Sigma[\alpha_1] \, V^\dagger[\alpha_1, s_2 \cdots s_N]$, and reshape to fuse $\alpha_1$ with $s_2$, $\psi'[\alpha_1 s_2, s_3 \cdots s_N]$.
- Perform a SVD on $\psi'$, yielding $\psi'[\alpha_1 s_2, s_3 \cdots s_N] = U[\alpha_1 s_2, \alpha_2] \, \Sigma[\alpha_2] \, V^\dagger[\alpha_2, s_3 \cdots s_N]$. Reshape $U$ to un-fuse $\alpha_1$ from $s_2$, $U[\alpha_1, s_2, \alpha_2]$, and divide the slices $U[\alpha_1, \cdot, \cdot]$ by the non-zero elements of $\Lambda^{(1)}$ to get $\Gamma^{(2)}$. Store $\Gamma^{(2)}$ and $\Lambda^{(2)} = \Sigma$.
- Multiply the rows of $V^\dagger$ by the elements of $\Sigma$, yielding $\psi''[\alpha_2, s_3 \cdots s_N]$, and reshape to fuse $\alpha_2$ with $s_3$, $\psi''[\alpha_2 s_3, s_4 \cdots s_N]$.
- Continue until a total of $N - 1$ SVDs have been performed. $\Gamma^{(N)}$ can be computed from the last $\Sigma V^\dagger$ by dividing its rows by the non-zero elements of $\Lambda^{(N-1)}$.
The total decomposition of the original state is then a MPS with vectors of scalars $\Lambda^{(j)}$ attached to each internal bond:

$$\psi[s_1, \ldots, s_N] = \Gamma^{(1)}[s_1, \alpha_1] \, \Lambda^{(1)}[\alpha_1] \, \Gamma^{(2)}[\alpha_1, s_2, \alpha_2] \, \Lambda^{(2)}[\alpha_2] \cdots \Lambda^{(N-1)}[\alpha_{N-1}] \, \Gamma^{(N)}[\alpha_{N-1}, s_N]$$

with summation over the repeated bond indices implied.
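The steps above can be sketched in NumPy roughly as follows; `to_mps` and its storage conventions (each Γ kept as a rank-3 array, with trivial bonds of size 1 at the two ends) are my own:

```python
import numpy as np

def to_mps(psi, dims, eps=1e-12):
    """Factor a dense state vector into Vidal-form (Gamma, Lambda) lists.

    psi: flat state vector; dims: local dimension d_j of each particle.
    Every Gamma is rank-3, with trivial bonds of size 1 at the two ends.
    """
    n = len(dims)
    gammas, lambdas = [], []
    rest = np.asarray(psi).reshape(1, -1)
    chi, prev_lam = 1, np.ones(1)
    for j in range(n - 1):
        mat = rest.reshape(chi * dims[j], -1)
        u, s, vh = np.linalg.svd(mat, full_matrices=False)
        keep = s > eps                       # discard zero Schmidt values
        u, s, vh = u[:, keep], s[keep], vh[keep]
        # divide out the previous bond's Lambda to get Gamma (Vidal form)
        gammas.append(u.reshape(chi, dims[j], -1) / prev_lam[:, None, None])
        lambdas.append(s)
        rest = s[:, None] * vh               # absorb Sigma into the remainder
        chi, prev_lam = s.size, s
    gammas.append((rest / prev_lam[:, None]).reshape(chi, dims[-1], 1))
    return gammas, lambdas

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
gammas, lambdas = to_mps(bell, [2, 2])
rec = np.einsum('iaj,j,jbk->ab', gammas[0], lambdas[0], gammas[1]).ravel()
```

Contracting the Γ's and Λ's back together reproduces the original state exactly, since nothing has been discarded.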
These $\Lambda^{(j)}$, as singular/Schmidt values of a matrix decomposition, store information about how inter-related the two halves of the state on either side of bond $j$ are. Broadly, they act as weights on the different product states that you get from contracting a bond in the MPS. As different physical quantum numbers (i.e. the $s_j$) are combined through the contraction, the different $\Lambda^{(j)}[\alpha_j]$ determine how much each product state contributes to the overall state on all $N$ particles.
There's a direct analogue here to the compressibility (which is rather like the inverse of entropy) of images. If we treat an image simply as a matrix of numbers, we can similarly put it through a SVD, and the resulting singular values will describe how much certain combinations of row and column features contribute to the overall image. If there are many singular values of roughly the same magnitude, then the image is likely to be densely packed with visual complexity: it would require the equal contribution of many different kinds of features to reproduce, and hence isn't likely to shrink very much when fed through a compression algorithm.
Likewise in MPSs: if, on a particular bond, there are many Schmidt values of roughly equal magnitude, then the quantum state is a sum over many different product states, which means there's lots of entanglement (i.e. information) in the state. Conversely, if some singular values are zero, then when contracting the MPS into a single object, the corresponding product states will contribute nothing to the overall state and hence may be discarded with no impact on the fidelity of the MPS. Since the Schmidt decomposition/SVD is unique (up to degeneracies), this means we have a minimal, lossless representation of an arbitrary quantum state after the discards are performed. If we then start to discard non-zero singular values, we end up with a rigorous way to further reduce the representation at the cost of losing some information in the state, rather like image compression.
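A truncation step in this spirit might look like the following sketch; the cutoff scheme and the choice to re-normalize the kept singular values are mine, not a canonical prescription:

```python
import numpy as np

def truncated_svd(mat, chi_max, eps=1e-12):
    """SVD keeping at most chi_max singular values, dropping any below eps."""
    u, s, vh = np.linalg.svd(mat, full_matrices=False)
    keep = min(chi_max, int(np.sum(s > eps)))
    u, s, vh = u[:, :keep], s[:keep], vh[:keep]
    s = s / np.linalg.norm(s)   # re-normalize so a quantum state stays normalized
    return u, s, vh

rng = np.random.default_rng(0)
u, s, vh = truncated_svd(rng.standard_normal((8, 8)), chi_max=3)
```

Because `np.linalg.svd` returns singular values in descending order, keeping the first `chi_max` of them keeps the largest contributions.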
Once the MPS is in canonical form, though, the entanglement entropy across any bipartition can simply be read off from the relevant singular values. Although the runtime cost of factoring an arbitrary $N$-qubit state is probably something like $O(2^{3N/2})$ (dominated by the SVD at the middle bond), we can note that pure product states have easy, fixed factorizations, with every $\Gamma^{(j)}$ equal to the $j$-th single-particle state and every bond trivially carrying $\Lambda^{(j)} = (1)$, so if we limit initialization to product states, we can avoid that initialization cost.
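A sketch of that product-state initialization, assuming the Γ/Λ storage described above (`product_mps` is my name for it):

```python
import numpy as np

def product_mps(states):
    """MPS for a product state: every bond has dimension 1."""
    gammas = [np.asarray(v, dtype=complex).reshape(1, -1, 1) for v in states]
    lambdas = [np.ones(1) for _ in range(len(states) - 1)]
    return gammas, lambdas

# |000> on three qubits
gammas, lambdas = product_mps([[1, 0]] * 3)
rec = np.einsum('iaj,j,jbk,k,kcl->abc',
                gammas[0], lambdas[0], gammas[1], lambdas[1], gammas[2]).ravel()
```

No SVDs are needed here: the factorization is written down directly, in constant time per site.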
The canonical form
Another advantage to the canonical form is that all tensors, when contracted through the relevant $\Lambda$ vectors, are orthonormal. This means that contractions of the following forms are all equal to the identity (which can be considered as an empty wire in tensor network language):

$$\sum_{\alpha_{j-1}, s_j} \left( \Lambda^{(j-1)}[\alpha_{j-1}] \, \Gamma^{(j)}[\alpha_{j-1}, s_j, \alpha_j] \right)^* \Lambda^{(j-1)}[\alpha_{j-1}] \, \Gamma^{(j)}[\alpha_{j-1}, s_j, \alpha_j'] = \delta[\alpha_j, \alpha_j']$$

$$\sum_{s_j, \alpha_j} \left( \Gamma^{(j)}[\alpha_{j-1}, s_j, \alpha_j] \, \Lambda^{(j)}[\alpha_j] \right)^* \Gamma^{(j)}[\alpha_{j-1}', s_j, \alpha_j] \, \Lambda^{(j)}[\alpha_j] = \delta[\alpha_{j-1}, \alpha_{j-1}']$$

with the analogous relations holding at the ends of the chain, where the corresponding bond index is absent.
(Here, we're treating the $\dagger$ loosely and not flipping the order of indices, so you can think of $\Gamma^\dagger$ as just the element-wise complex conjugation of $\Gamma$.)
Hence, the expectation value of any single-particle operator $O$ acting on particle $j$ simplifies to a contraction with only that particle's tensor (and its conjugate) and either one or two $\Lambda$ vectors, depending on whether particle $j$ is at an end of the MPS:

$$\langle O \rangle = \left( \Lambda^{(j-1)}[\alpha_{j-1}] \, \Gamma^{(j)}[\alpha_{j-1}, s_j, \alpha_j] \, \Lambda^{(j)}[\alpha_j] \right)^* O[s_j, s_j'] \, \Lambda^{(j-1)}[\alpha_{j-1}] \, \Gamma^{(j)}[\alpha_{j-1}, s_j', \alpha_j] \, \Lambda^{(j)}[\alpha_j]$$
and we get similar simplifications for computing partial traces over the state if we ever need them.
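For instance, a single-site expectation value might be computed as in this sketch (assuming the Γ/Λ layout above; `expect_1site` is a hypothetical helper), here checked on a Bell state written directly in canonical form:

```python
import numpy as np

def expect_1site(gammas, lambdas, op, j):
    """<O> for a one-site operator: only site j's tensor and its bonds appear."""
    left = lambdas[j - 1] if j > 0 else np.ones(1)
    right = lambdas[j] if j < len(gammas) - 1 else np.ones(1)
    # dress Gamma_j with its neighboring Lambda vectors
    theta = left[:, None, None] * gammas[j] * right[None, None, :]
    return np.einsum('asb,st,atb->', theta.conj(), op, theta)

# Bell state in Vidal canonical form:
# Gamma1 = identity, Lambda = (1/sqrt2, 1/sqrt2), Gamma2 = identity
eye = np.eye(2)
gammas = [eye.reshape(1, 2, 2), eye.reshape(2, 2, 1)]
lambdas = [np.full(2, 2**-0.5)]
z = np.diag([1.0, -1.0])
```

On a Bell state, a local $Z$ measurement averages to zero while the identity gives 1, confirming the state is still normalized.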
Circuit operations
There are three kinds of operations that we need to fit into this formalism to put it to work for quantum computing, where in this context, every particle modeled in the MPS is now a qubit:
- single-qubit gates;
- two-qubit gates;
- randomized, site-resolved projective measurements.
Single-qubit gates are trivial–as long as they're unitary (which they should be), they can be applied indiscriminately to any $\Gamma^{(j)}$ in the MPS in parallel while maintaining the proper normalization conditions that allow the above identities.
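A sketch of that single-qubit update (tensor layout as above; nothing here is a library API):

```python
import numpy as np

def apply_1q(gammas, gate, j):
    """Contract a single-qubit gate's physical index with site j; bonds untouched."""
    gammas[j] = np.einsum('st,atb->asb', gate, gammas[j])

hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
gammas = [np.array([1.0, 0.0]).reshape(1, 2, 1)]   # single qubit in |0>
apply_1q(gammas, hadamard, 0)
```

Only the one site's tensor changes; every bond and every Λ is left exactly as it was.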
The latter two are a bit trickier. The reason is that both two-qubit gates and projective measurements can affect the entanglement between particles, and hence need to modify the magnitude and/or number of singular values across some number of bonds; this means they have non-trivial action on a given Schmidt decomposition. Although simulators benefit from a MPS factorization when applying single-qubit gates, the inherent locality in the MPS works to our detriment when we want to effect behavior that extends beyond a single particle. These behaviors include things like generating entanglement through a multi-qubit gate or collapsing a multi-body wavefunction with a projective measurement (which also acts on the tensors that the measured particle is entangled to).
Since the exact actions on the singular values and tensors are difficult to predict in general (hence the need for a simulator), we have to resort to a scheme where we locally contract part of the MPS to produce a local multi-particle state, apply the two-qubit gate or projective measurement, and then re-factor those particles. For two-qubit gates, we need to contract and refactor every bond between the qubits that the gate acts on, and for measurement there's no way to generally predict where entanglements lie, so this basically amounts to refactoring every bond in the state.
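For a nearest-neighbor two-qubit gate, the contract/apply/refactor step might look like this sketch, assuming the Γ/Λ storage above and qubits ($d = 2$); the function and its conventions are mine:

```python
import numpy as np

def apply_2q(gammas, lambdas, gate, j, eps=1e-12):
    """Apply a 4x4 gate to neighboring sites j, j+1, then refactor the bond."""
    n = len(gammas)
    lam_l = lambdas[j - 1] if j > 0 else np.ones(1)
    lam_r = lambdas[j + 1] if j + 1 < n - 1 else np.ones(1)
    # contract Lambda Gamma Lambda Gamma Lambda into a two-site wavefunction
    theta = np.einsum('a,asb,b,btc,c->astc',
                      lam_l, gammas[j], lambdas[j], gammas[j + 1], lam_r)
    # act with the gate on the two physical indices
    theta = np.einsum('stuv,auvc->astc', gate.reshape(2, 2, 2, 2), theta)
    chi_l, d1, d2, chi_r = theta.shape
    u, s, vh = np.linalg.svd(theta.reshape(chi_l * d1, d2 * chi_r),
                             full_matrices=False)
    keep = s > eps                            # drop vanishing Schmidt values
    u, s, vh = u[:, keep], s[keep], vh[keep]
    # divide the environment Lambdas back out to stay in Vidal form
    gammas[j] = u.reshape(chi_l, d1, -1) / lam_l[:, None, None]
    lambdas[j] = s
    gammas[j + 1] = vh.reshape(-1, d2, chi_r) / lam_r[None, None, :]

# |+0> as an MPS, then CNOT -> Bell state
gammas = [np.array([1, 1]).reshape(1, 2, 1) / np.sqrt(2),
          np.array([1.0, 0.0]).reshape(1, 2, 1)]
lambdas = [np.ones(1)]
cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
apply_2q(gammas, lambdas, cnot, 0)
rec = np.einsum('iaj,j,jbk->ab', gammas[0], lambdas[0], gammas[1]).ravel()
```

The gate grows the bond from dimension 1 to 2, with both Schmidt values equal to $1/\sqrt{2}$: exactly the Bell state's one bit of entanglement.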
One important consequence of this is that the runtime of a circuit simulation still scales exponentially with the number of qubits, and additionally has explicit dependence on the measurement probability $p$ (in the case of a randomized circuit for measurement-induced phase transitions). When $p$ is small, the bond dimensions in the MPS grow larger than when $p$ is large, which means that more work has to be done for each measurement or gate application. To losslessly represent a completely random state in the $N$-qubit Hilbert space, the maximum bond dimension occurs at the middle of the MPS at $\chi_{\max} = 2^{\lfloor N/2 \rfloor}$. Note that the SVD of a matrix of shape $m \times n$ has runtime $O(mn \min(m, n))$.
Wrapping up
So how well does all this work in practice? Well, everything here still suffers from the same exponential scaling as the naive approach–it's been shown that any classical simulation of a completely general quantum system will share this exponential scaling as well. But the MPS representation allows us to carve out certain niches in the total space of states and operations where we can beat it. For everything else, MPSs pretty much break even with the naive approach.
Footnotes
- ↵ Here, we’ll simply think of a rank-$r$ tensor as an $r$-dimensional array of numbers.
-
↵ The “fusion” relates the single index of a flattened vector to the indices of the original tensor via the tensor’s “strides”: for indices $s_1, \ldots, s_N$ with dimensions $d_1, \ldots, d_N$,
$$s_1 \cdots s_N = \sum_{j=1}^{N} s_j \prod_{k=j+1}^{N} d_k$$
This aligns with how arrays are flattened and indices “unraveled” in e.g. NumPy.
- ↵ In a tensor network, a multi-tensor multiplication operation is encoded into a graph where nodes are tensors and edges are their indices. An edge connects two nodes if that index is summed over, and dangling edges (ones that are only connected to one node) are allowed.
Changelog
- Originally posted 2024.07.31
- Updated in Typst reformat 2026.03.18 with minor wording adjustments