Python Einstein Summation - Mihai Nica's Notes

Often when doing linear-algebra-adjacent things, we end up with sums that "sum out" an index. For example, when we update the probability vector by multiplying it by the [[How do I use a Probability Transition Matrix?| Transition Matrix]], we sum over all states, and this is equivalent to a vector-matrix product:

$$ \sum_{a=1}^n p_a M_{ab} = (\vec{p}M)_b $$

The $a$ index is summed over, and the final answer depends only on the $b$ index. There is a convention called the **Einstein summation convention** (originally introduced in physics) which says that you can skip writing the sum symbol $\Sigma$ when this happens. So we could equally well write:

$$ p_a M_{ab} = (\vec{p}M)_b $$

and the fact that $a$ is repeated on the left-hand side means we are summing over the $a$ index.

This is implemented in Python using the ```np.einsum``` command. The syntax is as follows:

- Specify the sequence of indices in quotes. In our example this is ``` 'a,ab -> b' ```: the ```a``` index appears in the inputs but not in the output, so it is summed out.
- Pass in the arrays in the same order as their indices. In our case we wrote ```a,ab```, which means we should pass $p$ first and $M$ second.

The full example looks like:

```python
import numpy as np
np.einsum('a,ab->b', p, M)
```

In this case this is completely equivalent to computing the vector-matrix product. However, there are many situations where writing out the einsum is extra helpful.

## Extra axes you want to ignore

Sometimes you have extra axes you want to ignore/parallelize over, and an einsum lets you state explicitly which axes are summed and which are carried through to the output.
For example, if instead of a single probability vector $\vec{p}$ we had an array ```p_with_t[t,a]``` holding a different probability vector for each value of $t$, we could effectively tell the computer to ignore the $t$ axis by doing:

```python
np.einsum('ta,ab->tb', p_with_t, M)
```

This tells the computer that the output also has a $t$ axis, and that for each $t$ we are doing the sum $\sum_{a=1}^n p_{t,a} M_{a,b}$.

## State space is more than just a single axis

Imagine that the state space is indexed by two axes, so that a single state $s=(x,y)$ is a pair of indices. Einsum lets you write the sum in an easy-to-read way:

```python
np.einsum('xy,xywz->wz', p_2d, M_2d)
```

Note that in this case the transition matrix is indexed by a pair of coordinate pairs $(x,y),(w,z)$:

$$ M_{(x,y);(w,z)} = \mathbb{P}(S_1 = (w,z) \mid S_0 = (x,y)) $$
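
To make the three patterns concrete, here is a minimal sketch that checks each einsum against an equivalent plain NumPy expression. The array sizes and the random, row-normalized arrays are assumptions made up for illustration; only the names ```p```, ```M```, ```p_with_t```, ```p_2d``` and ```M_2d``` and the index strings come from the notes above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # number of states (illustrative choice)

# Hypothetical transition matrix: M[a, b] = P(next state = b | current state = a).
M = rng.random((n, n))
M /= M.sum(axis=1, keepdims=True)  # each row sums to 1

# Hypothetical probability vector over the n states.
p = rng.random(n)
p /= p.sum()

# 1) Basic case: sum out 'a', keep 'b'. Same as the vector-matrix product.
out = np.einsum('a,ab->b', p, M)
assert np.allclose(out, p @ M)

# 2) Extra axis: one probability vector per value of t; 't' is carried through.
T = 3
p_with_t = rng.random((T, n))
p_with_t /= p_with_t.sum(axis=1, keepdims=True)
out_t = np.einsum('ta,ab->tb', p_with_t, M)
assert np.allclose(out_t, p_with_t @ M)

# 3) Two-axis state space: a state is a pair (x, y), so M_2d has four axes.
nx, ny = 2, 3
M_2d = rng.random((nx, ny, nx, ny))
M_2d /= M_2d.sum(axis=(2, 3), keepdims=True)  # sum over (w, z) is 1 for each (x, y)
p_2d = rng.random((nx, ny))
p_2d /= p_2d.sum()
out_2d = np.einsum('xy,xywz->wz', p_2d, M_2d)

# Without einsum we would have to flatten (x, y) and (w, z) by hand:
flat = p_2d.reshape(-1) @ M_2d.reshape(nx * ny, nx * ny)
assert np.allclose(out_2d, flat.reshape(nx, ny))
```

In the last case the einsum string says directly which axes are paired up, whereas the reshape version relies on both arrays flattening $(x,y)$ in the same order.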