The next participant is the application program. Currently, when an application programmer develops an algorithm, he or she passes to the compiler a great detail of information which is not only irrelevant, but is also damaging to the efforts of obtaining good performance. For example, in conventional languages like C or Fortran, the sequential ordering of instructions usually results in unnecessary and spurious data dependencies from one instruction to the next. Also, application programmers are notorious for introducing extraneous details like loop unrolling (Section 2.2.2) in order to obtain better performance. Later, when a computer is replaced by a newer computer, the relative performance of the same program suffers as a result of these extra details.
We believe that ``an application program'' should actually be a suite of many different, best-known algorithms that cover a wide range situations. For example, some algorithms would be targeted to a SIMD machine with a two-dimensional mesh. In addition, separate algorithms would exist depending on bits of precision, register availability, and the number of floating point adders. We propose that the algorithms be written in an English-like format with dependencies, suggestions, and shortcuts stated explicitly.
As the application program is downloaded to a particular machine, it should be sent to the system software for extraction of those algorithms suited to the particular hardware. At this point, much of the software it omitted due to incompatible topological or timing or precision requirements. However, the subset of algorithms that remain can confidently believed to be the currently ``best known'' approaches to meeting certain goals within the concession/requirements for this particular machine.
After the remaining algorithms have been compiled, they would be individually tagged by the system software with timing and error estimates. When an end user submits a contract, it can be processed at submission time against this tagged information to locate the appropriate algorithm (if one exists at all) or to determine that the contract is not feasible. When a contract is not acceptable, alternative contract stipulations should be offered to the end user for consideration.
So what should an application program legitimately specify in its ``contract''? The answer is: whatever it needs from the compiler to satisfy the goals set by the end user. The application divides the end user's goals into subtasks, which sounds like top-down design but is something very different. The end user's goals involve time and accuracy. The application program specifies an algorithm, with guaranteed performance at every step. That algorithm can be constructed by top-down, bottom-up, or whatever method one likes, so long as the final result can be read by a human as a proof of the achievability of the overall goal. The best way is by example and is given in Section 4.2.
There are two important features in this particular part of the contract that are not found in other approaches. One is to place a bound on the speed, latency, and precision of every data item. The second is to express as much parallelism as possible, understanding that there is generally more parallelism in the data dependency of an algorithm than can be exploited by a compiler. The compiler's task is now discarding parallelism instead of trying to derive it.
These two details are the ones compilers are currently burdened with guessing about, and programmers have avoided specifying these things. So traditional programming languages tell the compiler extraneous information that it must work to ignore (like spurious sequential ordering of operations) and deny it essential information that it must work to guess or derive ( e.g. which variables belong in registers).
Another useful piece of information the application programmer should supply is branch probability. Any time a conditional statement is made, some indication should be given about which way it goes most of the time, or that no such prediction can be made. Similarly, minimum, average, and maximum expected passes through a loop should be specified. Then, in addition to maximum execution time, average execution time could also be predicted. This example illustrates why this paradigm is new. Current paradigms insist that the compiler or other tools somehow derive these probabilities and bounds. It is little burden for the application programmer to specify this information. Often, it is the case that the programmer knows this information far better than the tools will ever be able to derive.
If the application programmer declines to supply this information, then the compiler is not obligated to be ``clever'' or go beyond what is patently explained in the program. The time for a sequence of operations is the sum of their times. The time for parallel operations varies with available parallelism in the hardware. The trade-off between porting to a wide number of computers and asking for the maximum performance is made explicit.