Free Schedule
We alluded to the fact that inherently-parallel algorithms exhibit some partial order, and not a total order, because the instructions that can execute independently do not have any explicit order among each other. This extra degree of freedom is another benefit domain flow algorithms exhibit over sequential algorithms. It allows the execution engine to organize any resource contention in a more energy, space, or time efficient way, as long the machine does not violate the dependency relationships specified in the algorithm.
Typically, the complete order defined by sequential algorithms over-constrains the execution order, and parallelizing compilers can’t recover the inherent dependency structure of the mathematics behind the algorithm, leading to disappointing speed-ups. This is a fundamental limitation to trying to generate parallel execution models from sequential specifications. Secondly, as we’ll see shortly, the best parallel algorithms may organize their computation differently as compared to a sequential algorithm. There are even cases where the parallel algorithm is better off using a different mathematical basis for its solution to reduce operand movement. These are so-called communication-avoiding parallel algorithms (Hoemmen). Other approaches, such as, Iso Geometric Analysis (IGA) in next generation finite element and finite volume methods use higher-order methods to increase the computation to operand bandwidth ratio of the compute kernels thus reducing the data movement requirements of the execution (G+SMO).
Instead, the domain flow specification only specifies the data dependencies inherent to the algorithm: the partial order will evolve from these constraints. In other words, it is an emergent behavior. If we present a domain flow algorithm with infinite resources and instantaneous access to its inputs, then the computational activity of the specification would evolve in what is called the free schedule.
The free schedule for our matrix multiply is visualized in the following, interactive, simulation:
We see the activity wavefront of the $a$ recurrence (blue), the $b$ recurrence (purple), and the $c$ recurrence (red) evolve through space and time.
The $a$ recurrence is defined by the recurrence equation: $a: a[i,j-1,k]$ is independent of both $b$ and $c$. The computational wavefront represents the computational event set: $a[i,j,k] = a[i,j-1,k]$ and will evolve along the $[0,1,0]$ direction.
Similarly, the $b$ recurrence, defined by the equation: $b: b[i-1,j,k]$ is independent of $a$ and $c$ and the computational wavefront evolves along the $[1,0,0]$ direction.
The $c$ recurrence, however, does depend on both $a$ and $b$, as well as on its own previous values. The free schedule that the $c$ recurrence evolves into is a wavefront that moves along the $[1,1,1]$ direction. The $a$ and $b$ values will arrive ’early’ at a $[i,j,k]$ lattice location, and as the $c$ values arrive, the recurrence equation for $c$, shown below, will trigger:
$c: c[i,j,k-1] + a[i,j-1,k] * b[i-1, j, k]$