BLAS Level 3
BLAS Level 3 are $\mathcal{O}(N^3)$ operators, and finally compute bound creating many opportunities to optimize operand reuse.
In addition to matrix-matrix multiply there are the Rank-k update operators, which are outer-products and matrix additions.
Here is a Hermitian Rank-k update:
$$ C = \alpha A A^T + \beta C, \space where \space C \space is \space Hermitian. $$A Hermitian matrix is defined as a matrix that is equal to its Hermitian conjugate. In other words, the matrix $C$ is Hermitian if and only if $C = C^H$. Obviously a Hermitian matrix must be square. Hermitian matrices can be understood as the complex extension of real symmetric matrices.
Matrix-matrix multiply
We have seen this algorithm in the introduction:
compute ( (i,j,k) | 1 <= i,j,k <= N ) {
a: a[i,j-1,k]
b: b[i-1,j,k]
c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k]
}
Hermitian Rank-k update
Starting from a single matrix, generating the transpose is a very expensive operator. For simplicity, let’s start with the computational phase $\alpha A A^T$.
compute ( (i,j,k) | 1 <= i,j,k <= N ) {
a: a[i,j-1,k]
at: at[i-1,j,k]
if k = 1 {
c: beta[i,j,k-1] * c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k]
} else {
c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k]
}
}
Here we introduce a conditional constraint that impacts the domain of computation for a set of equations.