Ames Laboratory, Department of Energy, Ames, Iowa

High Performance Computing Systems Research

KC-07-01-02.

20c. PURPOSE

20e. APPROACH

Key Personnel - FY1996 S. Elbert (PI) (0.7 FTE); J. Gustafson (PI) (0.2 FTE); D. Heller (PI) (1.0 FTE)

20f. TECHNICAL PROGRESS

The HINT performance analysis tool is the only program to have successfully benchmarked computers spanning the entire range of existing speeds and architectures. HINT establishes a performance metric firmly grounded in physical and information-theoretic fundamentals. This work won a prestigious R&D 100 award in 1995. This is the second R&D 100 award this research project has won, the earlier one being in 1991 for SLALOM, the precursor of HINT.

Acquisitions that specify HINT performance requirements have begun to appear -- Fermilab being the first Department of Energy laboratory to use it in a major acquisition. Efforts to commercialize HINT have begun.

Preliminary results indicate that the HINT performance of a computer is predictable from basic parameters such as clock speed, cache and memory size and latency, and precision. It now appears to be possible to automatically determine the optimum amount of installed memory.

The HINT database has been expanded and automated. Interactive access to the database is now available over the World Wide Web. This mechanism allows Web users to call up and compare the HINT performance data and graphs by selecting the systems of interest from a large and growing list of systems.

A GUI version of HINT for Macintosh computers, which will be extended to Windows, was created to increase interactivity in selecting tunable values in the HINT driver such as the number of retries and the step size as the problem size is increased. This facilitates the discovery of ways to obtain performance curves in less time with no sacrifice in smoothness or accuracy.

A paper-and-pencil version of HINT that can measure the computational performance of humans was created to explain the operation of HINT and educate people on the relative performance of humans versus computers. Copies have been distributed to the Adventures in Supercomputing program.

The design principles of HINT were applied to the measurement of network performance. By scaling the amount of data communicated as a function of time, we obtained measurement curves for different network protocols and hardware connection types. The network version of HINT is called NetPIPE. As described in the paper on NetPIPE that has been accepted for presentation and publication, some startling results for ATM communications were found. There is a catastrophic drop in point-to-point transfer speed in ATM for a certain range of message sizes. Previous studies have missed this fact. The effect is profound (more than an order of magnitude performance drop) and highly repeatable. NetPIPE, like HINT, is very portable and broad-spectrum, permitting one to compare different software protocols as well as different types of physical interconnection. CEBAF has expressed interest in this tool to assure adequate storage bandwidth. NASA's Ames Research Center has also expressed interest. They believe it will be more meaningful than existing TTCP tools.
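The measurement idea described above, sweeping the message size and recording the transfer rate at each size, can be sketched roughly as follows. This is not NetPIPE's code. The names `Sample`, `TransferFn`, `sweep` and `find_drops` are hypothetical, and the sketch assumes the caller supplies a function that moves a given number of bytes over the link under test and returns the elapsed seconds.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// One point on the measured curve (hypothetical layout).
struct Sample {
    std::size_t bytes;        // message size
    double seconds;           // best time over all retries
    double bytes_per_second;  // bytes / seconds, or 0 if the time was 0
};

// Assumption: the caller provides a function that transfers n bytes over the
// link under test and returns the elapsed time in seconds.
using TransferFn = std::function<double(std::size_t)>;

// Double the message size from min_bytes up to max_bytes, keeping the best
// (smallest) time of `retries` attempts at each size.
std::vector<Sample> sweep(const TransferFn& transfer, std::size_t min_bytes,
                          std::size_t max_bytes, int retries) {
    assert(min_bytes > 0 && min_bytes <= max_bytes && retries > 0);
    std::vector<Sample> curve;
    for (std::size_t n = min_bytes;; n *= 2) {
        double best = transfer(n);
        for (int r = 1; r < retries; ++r) {
            const double t = transfer(n);
            if (t < best) best = t;
        }
        const double rate = best > 0.0 ? static_cast<double>(n) / best : 0.0;
        curve.push_back({n, best, rate});
        if (n > max_bytes / 2) break;  // next doubling would exceed max_bytes
    }
    return curve;
}

// Report the sizes at which the rate falls by more than `factor` relative to
// the previous, smaller message size.
std::vector<std::size_t> find_drops(const std::vector<Sample>& curve,
                                    double factor) {
    std::vector<std::size_t> sizes;
    for (std::size_t i = 1; i < curve.size(); ++i) {
        if (curve[i].bytes_per_second * factor < curve[i - 1].bytes_per_second) {
            sizes.push_back(curve[i].bytes);
        }
    }
    return sizes;
}
```

Calling `find_drops(curve, 10.0)` on a measured curve would flag drops of more than an order of magnitude, the size of the effect reported for ATM. A doubling step is only one possible choice; a finer step would locate the edges of an affected range more precisely.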

A suite of system diagnostic programs for the nCUBE/2 was developed; these programs are often faster and more useful than the vendor-supplied diagnostic suite. Because the nCUBE/2 offers flexible memory access control, we were able to run a variety of experiments evaluating the effectiveness of different choices, which helped us understand the theoretical performance of HINT.

In addition to the R&D 100 Award, this work has produced 13 publications.

20g. FUTURE ACCOMPLISHMENTS

HINT

The HINT project will proceed in two main directions: First, we will gather data about real application performance and look for correlation with Net QUIPS values to test the hypothesis that HINT is a reliable predictor of relative machine ranking across a wide spectrum of tasks.

Second, the NetPIPE project will proceed in the footsteps of HINT: gathering more data and placing it on the World Wide Web. We do not have a complete understanding of why the curves of the metric show the behavior they do; an analytic model like the one for HINT is needed, but will probably be much more difficult to create. If we can create it, it will have uses similar to those of analytic HINT, such as providing direct information to hardware designers that will enable better communication system engineering.

There will also be testing of HINT in combination with an entirely new direction, explained in the following section.

Application Signatures

Just as HINT characterizes a computer system independent of the application, we conjecture that there is a corresponding way to characterize an application that is independent of the computer system. The latter we refer to as the "application signature" for now. Our current thinking is that it can be found by examining periodicity in memory reference patterns as a spectrum. That is, one could attach logic probes to the memory bus of a representative single-bus computer running an application and perform an FFT on the captured digital data to obtain a power spectrum. We expect to work with Charles Wright (ISU Computer Engineering Department) in obtaining application signatures by this method, and in seeing whether they possess qualities that are independent of the hardware platform.
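As a rough illustration of the spectral idea, the sketch below turns a recorded sequence of memory addresses into a power spectrum. It is not the project's method: it assumes the trace is already available as a list of addresses rather than captured by logic probes, it uses a direct O(N^2) discrete Fourier transform in place of an FFT to stay short, and the names `to_signal`, `power_spectrum` and `dominant_bin` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an address trace to a zero-mean real signal, so that the constant
// (k = 0) component does not dominate the spectrum.
std::vector<double> to_signal(const std::vector<std::uint64_t>& addresses) {
    assert(!addresses.empty());
    double mean = 0.0;
    for (std::uint64_t a : addresses) mean += static_cast<double>(a);
    mean /= static_cast<double>(addresses.size());
    std::vector<double> x;
    x.reserve(addresses.size());
    for (std::uint64_t a : addresses) x.push_back(static_cast<double>(a) - mean);
    return x;
}

// Power |X[k]|^2 / N for k = 0 .. N/2, computed by a direct DFT.
// An FFT would give the same values in O(N log N) time.
std::vector<double> power_spectrum(const std::vector<double>& x) {
    assert(!x.empty());
    const std::size_t n = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> power(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // Reduce k*t modulo n to keep the angle small and accurate.
            const double angle = -two_pi * static_cast<double>((k * t) % n) /
                                 static_cast<double>(n);
            sum += x[t] * std::polar(1.0, angle);
        }
        power[k] = std::norm(sum) / static_cast<double>(n);
    }
    return power;
}

// Index of the strongest non-constant frequency bin.
std::size_t dominant_bin(const std::vector<double>& power) {
    assert(power.size() >= 2);
    std::size_t best = 1;
    for (std::size_t k = 2; k < power.size(); ++k) {
        if (power[k] > power[best]) best = k;
    }
    return best;
}
```

For a loop that sweeps repeatedly over the same small array, the trace is periodic, and the strongest bin sits at the trace length divided by the sweep length. Whether such peaks are in fact independent of the hardware platform is the open question the conjecture poses.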

If we are successful, we will have decoupled the performance prediction problem into a set of curves that describe the computing systems and a set of curves that describe the applications to be run. Predicting performance will be as simple as multiplying the curves together (or looking for overlap using the same time scale on the horizontal axis). This will move the "architecture-application fit" problem from an art to a science, if our conjecture is correct. The conjecture rests heavily on the premise that memory (bandwidth, size, latency) now dominates computer performance and not arithmetic speed.

Assuming the "application signature" idea proves sound, we will begin collecting signatures for such diverse tasks as word processing, computational fluid dynamics, transaction processing, Monte Carlo simulations, and other main consumers of computing power. These signatures can then be used as a reference for computer designers who can use HINT to optimize their designs for particular uses. We will promote a system by which "grand challenge" users can communicate their needs via application signatures, so that high-speed computers do not put undue emphasis on hardware features (such as nominal TFLOPS) that do not translate into actual performance for their problem types.

Performance Tools for Extended Use

The goal of this work is to develop standard tools for measurement of program behavior, integrating self-measurement with application programs, and analysis of prior behavior with current information. This will help to verify predicted behavior and develop models of actual behavior and thus allow for improved algorithm selection. It will be possible to capture the essence of unexpected behavior, and monitor trends via historical and statistical information.

The life cycle of a program can be summarized as:

Phase                       Current Situation
design                      a priori analysis
development                 thorough instrumentation
testing                     partial instrumentation
release                     no instrumentation!
redesign, generalization    incomplete information

The goal is to change the habit of releasing code with no instrumentation.

Specification analysis clarifies the design of components and interfaces. Algorithm analysis points to the main features of performance, good or bad. Automated global instrumentation points to details omitted by the analyses, but it is every programmer's experience that testing and interactive debugging do not lead to perfect programs, consistent data, or properly applied methods. We are essentially proposing permanent installation of monitoring equipment. To paraphrase a classic dictum,

Algorithm design = data structures + methods + self-analysis,

with the intent to accumulate information during the release phase. The desired tool is one which retains debugging and performance analysis experience in the "final" production version of a program.

The current implementation is a set of C++ template classes to represent time, clocks and counters, and to build easily adopted tools from this basis. There is first a portable library of CPU and elapsed time clocks (which will provide input to the Parallel Tools Consortium Portable Timing Routines project as a side effect). System-specific hardware counters can be used when available. User-defined "clocks" for any other "time" are easily defined. The next step is a Timer class, to accumulate time as directed by the programmer and controlled in cooperation with an interactive user. A report subclass is used for consistent information and visibility control. There are several levels of historical data maintained with runtime control. Additional support is provided for user controls, periodic sampling and persistence of globals. The design is compatible with the C++ Standard Template Library for sets of Timers.
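The shape of such a design might look like the sketch below. It is not the project's library: the names `ElapsedClock`, `CpuClock`, `Timer` and `ScopedInterval` are hypothetical, and the sketch relies on `std::chrono`, which postdates this report. The report, historical-data, and user-control layers described above are omitted.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ctime>
#include <string>
#include <utility>

// Hypothetical clock policy: elapsed time in seconds from a monotonic clock.
struct ElapsedClock {
    using value_type = double;
    static value_type now() {
        using namespace std::chrono;
        return duration<double>(steady_clock::now().time_since_epoch()).count();
    }
};

// Hypothetical clock policy: CPU time in seconds used by this process.
struct CpuClock {
    using value_type = double;
    static value_type now() {
        return static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
    }
};

// A Timer accumulates "time" from whichever clock it is parameterized on,
// over intervals that the programmer starts and stops.
template <class Clock>
class Timer {
public:
    using value_type = typename Clock::value_type;

    explicit Timer(std::string name) : name_(std::move(name)) {}

    void start() {
        assert(!running_ && "start() called on a running Timer");
        begin_ = Clock::now();
        running_ = true;
    }

    void stop() {
        assert(running_ && "stop() called on a stopped Timer");
        total_ += Clock::now() - begin_;
        ++intervals_;
        running_ = false;
    }

    value_type total() const { return total_; }
    std::uint64_t intervals() const { return intervals_; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    value_type begin_{};
    value_type total_{};
    std::uint64_t intervals_ = 0;
    bool running_ = false;
};

// Starts the timer on construction and stops it when the scope exits.
template <class Clock>
class ScopedInterval {
public:
    explicit ScopedInterval(Timer<Clock>& t) : timer_(t) { timer_.start(); }
    ~ScopedInterval() { timer_.stop(); }
    ScopedInterval(const ScopedInterval&) = delete;
    ScopedInterval& operator=(const ScopedInterval&) = delete;

private:
    Timer<Clock>& timer_;
};
```

Because `Timer` is an ordinary value type with no special ownership rules, a set of timers can be kept in a standard container such as `std::vector<Timer<ElapsedClock>>`.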

The following claims are the basis of this approach:

  1. We now have enough computer resources for long-term performance measurements and analyses.
  2. The tools for long-term measurements can also be used for short-term, and temporary, measurements.
  3. Low-cost targeted measurements are a supplement to the standard tools that are independent of algorithm, data and program design, and which tend to be much higher-cost than is economically justified for "production" software.
  4. While elapsed real time is the final arbiter of performance, other measures should be easily incorporated, without changing the decisions about where to take measurements, or how to code them.
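Claim 4 can be illustrated with a small sketch in which the measurement site stays the same and only the clock type changes. All names here (`WallClock`, `ComparisonClock`, `measure`, `contains_sorted`) are hypothetical, and the example assumes that a user-defined "time" can be modeled as a counter the program advances itself.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Elapsed real time in seconds.
struct WallClock {
    using value_type = double;
    static value_type now() {
        using namespace std::chrono;
        return duration<double>(steady_clock::now().time_since_epoch()).count();
    }
};

// A user-defined "clock": its time is the number of probes made so far.
struct ComparisonClock {
    using value_type = std::uint64_t;
    static value_type& ticks() {
        static value_type n = 0;
        return n;
    }
    static value_type now() { return ticks(); }
    static void tick() { ++ticks(); }
};

// The measurement site: identical code for every kind of clock.
template <class Clock, class F>
typename Clock::value_type measure(F&& work) {
    const typename Clock::value_type begin = Clock::now();
    work();
    return Clock::now() - begin;
}

// Workload: binary search over a sorted vector. Each probe of the vector
// advances ComparisonClock by one tick.
inline bool contains_sorted(const std::vector<int>& v, int key) {
    assert(std::is_sorted(v.begin(), v.end()));
    std::size_t lo = 0;
    std::size_t hi = v.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        ComparisonClock::tick();
        if (v[mid] == key) return true;
        if (v[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return false;
}
```

Writing `measure<WallClock>(...)` or `measure<ComparisonClock>(...)` around the same call reports seconds in the first case and probe counts in the second, without touching the code being measured.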

The tools employed during the program life cycle are roughly:

Design Stage                      Measurement Tools
algorithm                         pencil, Mathematica
models of data, user, computer    ?
program
  0. first try                    debugger
  1. development                  prof, gprof
  2. test                         hardware monitors
  3. ship                         nothing
  4. redesign                     no tools, no information

What's missing when it's time to redesign? The accumulated experience of