
Expanding einsum expressions


You can grab the code as a Python script here.

einsum is great! You can conveniently express common linear algebra operations like matrix-vector products, as well as more complicated tensor contractions. It helps me write more readable code, especially in the context of machine learning.

An einsum expression consists of two ingredients: an equation that specifies the contraction, and the input tensors.

As an example, let A, x be a matrix and a vector of appropriate size. Then, the matrix-vector product y = A @ x is

y = einsum("ik,k->i", A, x)

What if A were itself composed of two other matrices B, C via A = B @ C? We could write

A = einsum("ij,jk->ik", B, C)

and expand the einsum for y into

y = einsum("ij,jk,k->i", B, C, x)

Doing that takes mental effort. And there are additional challenges not covered by this example, like inserting an expression whose indices are already in use.
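To make the clash concrete, here is a small hypothetical illustration (not part of the running example): splicing an inner equation's input groups into the outer equation verbatim reuses index names, and with square matrices the shapes even match, so the wrong contraction runs silently:

from torch import allclose, einsum, rand

# hypothetical toy matrices, not the B, D, E used later in this post
B, D, E = rand(4, 4), rand(4, 4), rand(4, 4)

correct = einsum("ij,jk->ik", B, einsum("ij,jk->ik", D, E))  # B @ (D @ E)
spliced = einsum("ij,ij,jk->ik", B, D, E)  # verbatim splice: i and j clash

print(allclose(correct, spliced))  # False -- the clash changes the semantics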

To automate such einsum expansions, I'll code an einsum_expand helper function in this post. For the above example, it should work as follows:

A = ["ij,jk->ik", B, C]
y = ["ik,k->i", A, x]

einsum_expand(y) # -> ["ij,jk,k->i", B, C, x] (or equivalent)

Let's get started! (Imports first)

import re
from torch import rand, einsum, allclose, manual_seed, Tensor
from opt_einsum import contract, contract_expression
from typing import List
from timeit import repeat


manual_seed(0)

Motivation

First, some thoughts on how such a function could be useful.

Say we want to evaluate a nested einsum expression without our einsum_expand helper. We could do one of the following:

  1. Do the expansion by hand: For instance, we could draw the expanded tensor network, introduce indices, and read off the contraction. But this does not scale to large expressions.
  2. Naively evaluate all unnested expressions with einsum: Iteratively, this reduces the nesting until we're left with a single unnested expression. But it constrains the execution order, and this order is important for performance! Contraction optimization packages like opt_einsum can find the best contraction schedule and greatly improve performance. Check out this opt_einsum demo.

einsum_expand scales to arbitrarily complicated nested einsum expressions, and, in contrast to naive evaluation, its output can be optimized by opt_einsum and computed faster in many scenarios.
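As a side note (not shown in the original post), opt_einsum's planner can be inspected directly: contract_path returns the chosen pairwise contraction order together with a cost report comparing naive and optimized FLOP counts.

from opt_einsum import contract_path

# plan the expanded matrix-chain-vector contraction from the introduction
path, info = contract_path("ij,jk,k->i", rand(10, 5), rand(5, 20), rand(20))
print(info)  # reports naive vs. optimized cost and the contraction order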

Introduction

Toy example

I will use a slightly more involved example: Let A, B, C, D, E be matrices of matching dimensions, with A = B @ C where C = D @ E:

B = rand(10, 5)
D = rand(5, 20)
E = rand(20, 30)

Let's first compute A naively with the following two einsum expressions:

C = einsum("ij,jk->ik", D, E)
A_naive = einsum("ij,jk->ik", B, C)

Combining the expressions into a single one, A = B @ D @ E,

A_expanded = einsum("ij,jk,kl->il", B, D, E)

yields the same result:

print(allclose(A_naive, A_expanded))
True

Naive evaluation

I claimed that expanding einsum expressions can lead to speed-ups when combined with contraction optimizers. Here is the naive evaluation scheme that we will use for comparison later.

def is_tensor(x):
    return isinstance(x, Tensor)


def is_nested(x):
    return isinstance(x, (tuple, list))


def is_equation(x):
    return isinstance(x, str)


def einsum_nested_naive(expression, optimize=False) -> Tensor:
    """Naively evaluate a nested einsum expression.

    Evaluate unnested expressions until no sub-expression remains.

    Args:
        expression: List describing a (potentially nested) einsum expression.
            The head is an equation, and the tail consists of tensors or more
            einsum expressions.
        optimize: If ``True``, use ``opt_einsum.contract`` to evaluate unnested
            expressions. Otherwise, use ``torch.einsum``. Default: ``False``.

    Returns:
        Result tensor.
    """
    equation, operands = expression[0], expression[1:]

    if not is_equation(equation):
        raise ValueError(f"Invalid equation as first entry: {equation}.")

    operands_flat = []

    for op in operands:
        if is_tensor(op):
            operands_flat.append(op)
        elif is_nested(op):
            operands_flat.append(einsum_nested_naive(op, optimize=optimize))
        else:
            raise ValueError(f"Expect Tensor or einsum expression, got {op}.")

    contraction_func = contract if optimize else einsum

    return contraction_func(equation, *operands_flat)

Let's verify it on our toy example:

A = [
    "ij,jk->ik",
    B,
    ["ij,jk->ik", D, E],
]
A_nested_naive = einsum_nested_naive(A)

print(allclose(A_nested_naive, A_naive))
True

Helper functions

Working with einsum equations

To insert a sub-expression, we have to replace its group in the outer equation with the sub-expression's input groups. For that, we need to be able to split equations into per-operand groups and combine groups back into a single string. This is done by the following helpers:

def equation_to_groups(equation: str) -> List[str]:
    return re.split(",|->", equation)


def groups_to_equation(groups: List[str]) -> str:
    return "->".join([",".join(groups[:-1]), groups[-1]])

They are best understood by examples:

equation = "ij,jk,kl->il"
groups = equation_to_groups(equation)

print(f"Groups: {groups}")
print(f"Equation from groups: '{groups_to_equation(groups)}'")
Groups: ['ij', 'jk', 'kl', 'il']
Equation from groups: 'ij,jk,kl->il'

Renaming indices

The aforementioned replacement operation has to respect additional rules: As we're introducing the inserted expression's input indices, we need to make sure their names are not already taken. We also need to match the output indices of the expression we want to insert to the group it replaces.

Let's walk through our example to understand some issues we might encounter:

A = ["ij,jk->ik", B ["ij,jk->ik", C, D]]

We want to replace jk in A[0] by the sub-expression A[-1].

The latter currently has output indices ik, so we need to rename i into j. But j is already occupied by a summation index of the sub-expression, so we need to rename that one first. This will lead to a rename of

"ij,jk->ik"

into

"ja,ak->jk"

We can now safely expand the renamed

A = ["ij,jk->ik", B ["ja,ak->jk", C, D]]

into

["ij,ja,ak->ik", B, C, D]

In the last step, we could have encountered another problem: if a had already been taken in A[0], we would have had to do yet another rename.

The renaming is done by the following helper rename_out_indices:

def rename_out_indices(equation, new_name, blocked=None):
    """Rename the output indices of ``equation`` into ``new_name``, avoiding ``blocked``."""
    out_indices = equation_to_groups(equation)[-1]

    # characters that must not be introduced while renaming
    blocked = set() if blocked is None else blocked.copy()
    blocked = blocked.union(set(new_name)).union(set(out_indices))

    # move the equation's indices out of the way so the target names are free
    equation = _rename_indices(equation, blocked=blocked)

    out_indices = equation_to_groups(equation)[-1]

    for out_idx, new_idx in zip(out_indices, new_name):
        equation = equation.replace(out_idx, new_idx)

    return equation


def _rename_indices(equation, blocked=None):
    """Rename the equation's indices to characters outside ``blocked``."""
    groups = equation_to_groups(equation)
    indices = set("".join(groups))

    # replacement characters must also avoid the equation's current indices
    blocked = set() if blocked is None else blocked.copy()
    blocked = blocked.union(indices)

    # after the union above, this renames every index of the equation
    rename_indices = indices.intersection(blocked)

    for idx, new_name in zip(rename_indices, _get_free_characters(blocked=blocked)):
        equation = equation.replace(idx, new_name)

    return equation


def _get_free_characters(blocked=None):
    """Yield characters that are not in ``blocked``, starting with ``'a'``."""
    blocked = set() if blocked is None else blocked

    first = ord("a")
    shift = 0

    # walk through the character range, skipping blocked characters
    while True:
        char = chr(first + shift)
        if char not in blocked:
            yield char
        shift += 1

Here is how it acts on our example:

print(rename_out_indices("ij,jk->ik", "jk"))
ja,ak->jk

We can block certain letters from being used, too:

print(rename_out_indices("ij,jk->ik", "jk", blocked={"c"}))
ja,ak->jk
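
For completeness, here is how the free-character generator behaves (a quick check that is not in the original post):

from itertools import islice

# 'a' and 'c' are blocked, so the generator starts at 'b' and skips 'c'
print(list(islice(_get_free_characters(blocked={"a", "c"}), 4)))
['b', 'd', 'e', 'f']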

Inserting at a given position

We now have all the functionality to correctly rename and insert an expression.

def insert_at(expression, pos):
    """Splice the sub-expression at operand position ``pos`` into its parent.

    Returns a new expression; the input is not modified.
    """
    equation, operands = expression[0], list(expression[1:])
    groups = equation_to_groups(equation)
    blocked = set("".join(groups))

    # rename the sub-expression so its output matches the group it replaces
    inner_equation, inner_operands = operands[pos][0], list(operands[pos][1:])
    inner_equation = rename_out_indices(inner_equation, groups[pos], blocked=blocked)
    inner_groups = equation_to_groups(inner_equation)

    # splice the sub-expression's input groups and operands in at ``pos``
    out_operands = operands[:pos] + inner_operands + operands[pos + 1 :]
    out_groups = groups[:pos] + inner_groups[:-1] + groups[pos + 1 :]
    out_equation = groups_to_equation(out_groups)

    return [out_equation] + out_operands

Here are some examples:

  • Inserting an unnested sub-expression:

    # strings instead of tensors for more readable printing
    expression = ["ij,jk->ik", "B", ["ij,jk->ik", "D", "E"]]
    
    print(insert_at(expression, 1))
    
    ['ij,ja,ak->ik', 'B', 'D', 'E']
    
  • Inserting a nested sub-expression:

    # strings instead of tensors for more readable printing
    expression = ["ij,jk->ik", "B", ["ij,jk->ik", ["ab,bk->ak", "D", "E"], "F"]]
    
    print(insert_at(expression, 1))
    print(insert_at(insert_at(expression, 1), 1))
    
    ['ij,ja,ak->ik', 'B', ['ab,bk->ak', 'D', 'E'], 'F']
    ['ij,jd,da,ak->ik', 'B', 'D', 'E', 'F']
    

Putting it together

Here is the einsum_expand helper I promised at the beginning:

def einsum_expand(expression):
    """Expand a nested list of ``einsum`` expressions into a single one.

    An ``einsum`` expression is defined via a tuple/list ``(equation, *operands)``
    where operands are either ``Tensor``s or ``einsum`` expressions.

    Args:
        expression: List describing a (potentially nested) einsum expression.
            The head is an equation, and the tail consists of tensors or more
            einsum expressions.

    Returns:
        An unnested expression consisting of an equation and its operands.
    """
    expression = list(expression)

    def expandable_pos(expression):
        for pos, subexpression in enumerate(expression[1:]):
            if is_nested(subexpression):
                return pos
        raise ValueError("No expandable expression found.")

    while any(is_nested(subexpression) for subexpression in expression[1:]):
        pos = expandable_pos(expression)
        expression = insert_at(expression, pos)

    return expression

Let's try it out on our toy example, comparing the result of naively contracting the nested expression with the result of expanding it first and contracting with a plain einsum:

# einsum_expand handles list- and tuple-valued expressions
A = ("ij,jk->ik", B, ["ij,jk->ik", D, E])

A_naive = einsum_nested_naive(A)
A_expanded = einsum(*einsum_expand(A))

print(allclose(A_naive, A_expanded))
True
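
Deeper nestings expand just as well. Here is a quick extra check (not in the original post) that reuses B, D, E together with a fresh matrix F:

F = rand(30, 7)  # hypothetical extra factor to deepen the nesting
A_deep = ["ij,jk->ik", B, ("ij,jk->ik", D, ["ij,jk->ik", E, F])]

print(allclose(einsum_nested_naive(A_deep), einsum(*einsum_expand(A_deep))))
True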

Use case

einsum_expand can be useful in combination with contraction optimizers like opt_einsum. To demonstrate this, we consider a modified version of this opt_einsum demo, where the matrix C is replaced by the matrix product A @ B to make the expression nested:

dim = 30
inner_dim = dim // 10

I = rand(dim, dim, dim, dim)
A = rand(dim, inner_dim)
B = rand(inner_dim, dim)
C = ["ij,jk->ik", A, B]

expression = ["pi,qj,ijkl,rk,sl->pqrs", C, C, I, C, C]

We will benchmark different combinations of contraction implementations (opt_einsum.contract versus torch.einsum) and evaluation schedules (naive versus expand). Here are their definitions:

def naive_torch():
    """Naively evaluate with `torch.einsum`."""
    return einsum_nested_naive(expression, optimize=False)


def naive_opt_einsum():
    """Naively evaluate with `opt_einsum.contract`."""
    return einsum_nested_naive(expression, optimize=True)


# NOTE Left out for faster execution
# def expand_torch():
#     """Expand expression then evaluate with `torch.einsum`."""
#     return einsum(*einsum_expand(expression))


def expand_opt_einsum():
    """Expand expression then evaluate with `opt_einsum.contract`."""
    return contract(*einsum_expand(expression))

Let's make sure they produce the same result:

print(
    allclose(naive_torch(), naive_opt_einsum())
    # NOTE Left out for faster execution
    # and allclose(naive_torch(), expand_torch())
    and allclose(naive_torch(), expand_opt_einsum())
)
True

Let's compare their performance:

REPEAT, NUMBER = 10, 20

naive_torch_time = min(repeat(naive_torch, repeat=REPEAT, number=NUMBER))
# NOTE Left out for faster execution
# expand_torch_time = min(repeat(expand_torch, repeat=REPEAT, number=NUMBER))
naive_opt_einsum_time = min(repeat(naive_opt_einsum, repeat=REPEAT, number=NUMBER))
expand_opt_einsum_time = min(repeat(expand_opt_einsum, repeat=REPEAT, number=NUMBER))

best_overall = min(
    naive_torch_time,
    # NOTE Left out for faster execution
    # expand_torch_time,
    naive_opt_einsum_time,
    expand_opt_einsum_time,
)

print(
    f"Naive  + torch.einsum        : {naive_torch_time:.5f} s "
    + f"(x {naive_torch_time / best_overall:.2f})"
)
# NOTE Left out for faster execution
# print(
#     f"Expand + torch.einsum        : {expand_torch_time:.5f} s "
#     + f"(x {expand_torch_time / best_overall:.2f})"
# )
print(
    f"Naive  + opt_einsum.contract : {naive_opt_einsum_time:.5f} s "
    + f"(x {naive_opt_einsum_time / best_overall:.2f})"
)
print(
    f"Expand + opt_einsum.contract : {expand_opt_einsum_time:.5f} s "
    + f"(x {expand_opt_einsum_time / best_overall:.2f})"
)
Naive  + torch.einsum        : 0.20648 s (x 4.97)
Naive  + opt_einsum.contract : 0.07125 s (x 1.71)
Expand + opt_einsum.contract : 0.04157 s (x 1.00)

We can see that combining einsum_expand with opt_einsum performs best.
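
If the same nested contraction is evaluated many times, one can go a step further and precompile the expanded expression with opt_einsum's contract_expression (imported at the top). A minimal sketch, assuming the operand shapes stay fixed across calls:

expanded = einsum_expand(expression)
equation, tensors = expanded[0], expanded[1:]

# compile the contraction schedule once, then reuse it for same-shaped inputs
compiled = contract_expression(equation, *(t.shape for t in tensors))
print(allclose(compiled(*tensors), naive_torch()))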

That's all for now. Enjoy!

Author: Felix Dangel

Created: 2022-04-02 Sat 15:28
