
Loop Unrolling

  In the LINPACK benchmark report of the 1980s [4], many of the numbers in the table carry the annotation "(rolled BLAS)". What does this cryptic comment mean? The BLAS are the Basic Linear Algebra Subroutines [5], operations such as dot product and vector scaling, for which the LINPACK benchmark supplies Fortran subroutines. The loop operation occupies two pages of Fortran, with the loop unrolled 2-fold, 4-fold, 8-fold, and 16-fold. A portion of this code is given in Figure 1. Unrolling was done to avoid the overhead of loop management. On more recent machines and compilers, however, unrolled code carries a higher performance penalty than rolled code. So people "roll" the unrolled loop back up, and LINPACK goes faster.

  
Figure 1: Unrolled LINPACK Code
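To make the distinction between rolled and unrolled code concrete, here is a minimal sketch in C rather than Fortran. It is not the code of Figure 1. The function names axpy_rolled and axpy_unrolled4 are hypothetical, and the 4-fold unrolling with a short cleanup loop imitates the style of the reference BLAS DAXPY. Both compute y := y + a*x; the unrolled version pays for the loop counter increment and branch once per four updates instead of once per update.

```c
#include <stddef.h>

/* Rolled form: one update of y per loop iteration. */
void axpy_rolled(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Unrolled 4-fold (hypothetical name, reference-BLAS style):
   first a cleanup loop handles the n mod 4 leftover elements,
   then the main loop performs four updates per iteration.
   Because n - m is a multiple of 4, y[i + 3] never runs past n - 1. */
void axpy_unrolled4(size_t n, double a, const double *x, double *y)
{
    size_t m = n % 4;
    for (size_t i = 0; i < m; i++)
        y[i] += a * x[i];
    for (size_t i = m; i < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}
```

Both functions produce identical results. The unrolled one simply hides the same independent updates inside a more complicated control structure, which is what a modern compiler then has to see through.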

What went wrong here? The application programmer was guessing at both compiler and hardware behavior. What the programmer wanted to accomplish was to multiply a row of a matrix by a constant and add the result to another row. Conventional Fortran forces the programmer to write a "DO" loop that performs this sequentially, even though the order of the i subscripts does not matter. The operation offers plenty of parallelism, as well as opportunity for vector arithmetic. Compilers for vector and parallel computers must do extra work to determine whether the serial ordering the programmer specified is safe to ignore. All the unrolling does is make that analysis more difficult.
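The operation the programmer actually wanted can be sketched in C as the plain rolled loop, with one addition. The C99 restrict qualifier is used to promise that the source and destination rows do not overlap, which is one of the facts a vectorizing compiler otherwise has to establish for itself. This is an illustration under stated assumptions: the names axpy_restrict and row_update are hypothetical, the matrix is assumed to be stored row-major, and whether a particular compiler actually vectorizes the loop depends on that compiler and its options.

```c
#include <stddef.h>

/* Rolled loop; each iteration touches a different j, so the
   iterations are independent and may execute in any order.
   restrict asserts that x and y do not overlap. */
static void axpy_restrict(size_t n, double a,
                          const double *restrict x, double *restrict y)
{
    for (size_t j = 0; j < n; j++)
        y[j] += a * x[j];
}

/* Hypothetical helper: row dst += a * row src of a matrix with
   ncols columns stored row-major in m.  The caller must ensure
   dst != src so that the restrict promise above holds. */
void row_update(size_t ncols, double *m, size_t dst, size_t src, double a)
{
    axpy_restrict(ncols, a, &m[src * ncols], &m[dst * ncols]);
}
```

For example, with the 2 x 3 matrix {1, 2, 3; 10, 20, 30}, the call row_update(3, m, 1, 0, -10.0) performs one elimination step and zeroes the second row. The source states the intent directly and leaves the choice of ordering, vectorization, and any unrolling to the compiler.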



Dr. T. L. Marchioro II
Wed Aug 9 16:54:08 CDT 1995