Elements of Design
We can summarize the attributes of good parallel algorithm design as
- low operation count, where operation count is defined as the sum of operators and operand accesses
- minimal operand movement
- minimal resource contention
Item #1 is well-known by theoretical computer scientists.
Item #2 is well-known among high-performance algorithm designers.
Item #3 is well-known among hardware designers and computer engineers.
When designing domain flow algorithms, we are looking for an energy efficient embedding of a computational graph in space, and it is thus to be expected that we need to combine all three attributes of minimizing operator count, operand movement, and resource contention. The complexity of minimizing resource contention is what makes hardware design so much more complex. But the complexity of operator contention can be mitigated by clever resource contention management.
The Stored Program Machine (SPM) is an example of a specific resource contention management mechanism designed specifically for reusing one ALU for the complete computational graph. The SPM protocol to avoid contention on that one processing resource is described as follows:
- read an instruction from memory
- decode the instruction
- fetch the input operands from memory
- execute the instruction using the input operands
- write the result back to memory
This protocol makes the Stored Program Machine energy inefficient. Most of the transistors in the pipeline are allocated to reading and decoding meta information to organize contention free use of the ALU. This resource contention management is over-constrained as it forces a total order on the computation graph. This tasks of creating the total order falls on the algorithm designer.
For parallel execution we need a resource contention management mechanism that is more efficient. And this is where our computational spacetime will come in handy.