The Spang Robinson Report, 1990
Gustafson's latest target is benchmarking, where the conventional wisdom has been that the measure of a computer's capacity lies in how many operations per second it can perform while doing a fixed task. Wrong, he argues; the best measure is how much work the computer system can do in unit time. To prove his point, he and colleagues at Ames Laboratory, a Department of Energy facility managed by Iowa State University, have devised SLALOM, an acronym for Scalable, Language-independent, Ames Laboratory One-Minute Measurement.
The acronym is a little tortured, but the concept is straightforward. Given a problem whose size can be scaled, see how large a version of it the computer can complete in one minute. Since it is known how many floating point operations a problem of a given size requires, the system's speed in MFLOPS can be derived.
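The fixed-time idea can be sketched in a few lines of Python. This is not the SLALOM driver: `fixed_time_benchmark`, `run` and `flop_count` are hypothetical names standing in for a benchmark kernel and its known operation count, and the search assumes that run time grows steadily with problem size. It doubles the size until a run overshoots the time budget, then binary-searches for the largest size that still fits, and derives MFLOPS as operations divided by (elapsed seconds x 10^6).

```python
import time
from typing import Callable, Tuple


def fixed_time_benchmark(
    run: Callable[[int], None],
    flop_count: Callable[[int], float],
    budget_s: float = 60.0,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Find the largest problem size n whose run fits in budget_s seconds.

    Returns (n, derived MFLOPS). `run` and `flop_count` are hypothetical
    stand-ins for a benchmark kernel and its operation count; run time is
    assumed to grow monotonically with n (otherwise phase 1 may not stop).
    """

    def timed(n: int) -> float:
        start = clock()
        run(n)
        return clock() - start

    def mflops(n: int, elapsed: float) -> float:
        # Guard against a zero reading from a coarse clock.
        return flop_count(n) / (max(elapsed, 1e-12) * 1e6)

    best_n, best_mflops = 0, 0.0
    # Phase 1: double n until a run overshoots the time budget.
    n = 1
    while True:
        elapsed = timed(n)
        if elapsed > budget_s:
            break
        best_n, best_mflops = n, mflops(n, elapsed)
        n *= 2
    # Phase 2: binary search between the last size that fit and the first that did not.
    lo, hi = best_n, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        elapsed = timed(mid)
        if elapsed <= budget_s:
            lo, best_n, best_mflops = mid, mid, mflops(mid, elapsed)
        else:
            hi = mid
    return best_n, best_mflops
```

Passing the clock in as a parameter is a design choice for testing: a simulated clock lets the search be checked without actually waiting a minute per run.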
Scalability is an important factor, Gustafson says, because systems of the future will be so much more powerful than current systems that existing standard benchmarks such as Linpack and the Perfect Club will no longer be relevant. In fact, he says, even the 1000 x 1000 Linpack problem is too small to be a good measure of today's most powerful systems. With SLALOM, however, he says it will be possible to accurately compare a PC and a teraflop supercomputer. SLALOM scales down to any system that can execute at least 148 floating point operations per second, the minimum requirement for one iteration of the problem, which is well within the capability of most of today's personal computers.
The SLALOM benchmark problem, derived from a scientific paper published in 1984, deals with radiosity, a rendering technique in which the task is to find the equilibrium radiation inside a box made of diffuse colored surfaces. The faces of the box are divided into regions called patches, and the equations describing the coupling of the patches are solved for the red, green and blue spectral components. The job includes input-output and setup costs, and the answer must be correct to eight decimal digits. (The original benchmark design called for accuracy of nine to ten digits, but the requirement was relaxed, Gustafson said, to accommodate some systems for which the extra accuracy would have been an overly severe burden.)
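For readers unfamiliar with radiosity, the classical formulation from the graphics literature couples the patches through B_i = E_i + ρ_i Σ_j F_ij B_j, where B_i is the radiosity of patch i, E_i its emission, ρ_i its reflectivity in one color band, and F_ij the form factor from patch i to patch j. Collecting terms gives one dense linear system (I − diag(ρ)F)B = E per color. The Python sketch below solves that system for red, green and blue with plain Gaussian elimination. It is an illustration under those assumptions, not the SLALOM source; the function names and the example inputs are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def radiosity(form: Matrix, rho: Sequence[float], emit: Sequence[float]) -> List[float]:
    """Solve B_i = E_i + rho_i * sum_j F_ij * B_j for one spectral channel."""
    n = len(emit)
    a = [[(1.0 if i == j else 0.0) - rho[i] * form[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(a, list(emit))


def radiosity_rgb(form: Matrix,
                  rho_rgb: Sequence[Sequence[float]],
                  emit_rgb: Sequence[Sequence[float]]) -> List[List[float]]:
    """Solve red, green and blue independently; form factors are shared."""
    return [radiosity(form, rho_rgb[k], emit_rgb[k]) for k in range(3)]
```

For example, two patches that each see half of the other (F_12 = F_21 = 0.5), with reflectivity 0.5 and only the first patch emitting (E = [1, 0]), give B_1 = 1/0.9375 ≈ 1.0667 and B_2 = 0.25/0.9375 ≈ 0.2667. The form factors depend only on geometry, so they are shared across colors, while the reflectivities, and therefore the systems, differ per channel.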
The basic unit of measurement in the benchmark is the "patch," and the figure of merit is the number of patches that can be calculated in one minute. On the list of SLALOM results completed to date, this figure of merit ranges from 24 patches, for a Macintosh IIcx running interpreted QuickBASIC, to 5120 patches for an eight-processor CRAY Y-MP. Because the number of floating point operations grows as the cube of the number of patches, the range of megaflops is even wider: from 0.00239 MFLOPS for the Mac IIcx to 2,130 MFLOPS for the CRAY Y-MP. A curious result was reported for an eight-processor CRAY 2S: 2560 patches, yielding a derived 293 MFLOPS, or only about one-seventh the performance of the full Y-MP.
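The cube law can be checked against the reported figures. Dense Gaussian elimination on n unknowns costs roughly 2n³/3 floating point operations, which is one reason a patch count of n drives the work as n³; SLALOM's exact operation count has additional terms and is not reproduced here. The helper names below are hypothetical.

```python
def ge_leading_flops(n: int) -> float:
    """Leading-order operation count of dense Gaussian elimination (2n^3/3)."""
    return 2.0 * n ** 3 / 3.0


def work_ratio(n_big: int, n_small: int) -> float:
    """Work ratio between two problem sizes under a pure n^3 cost model."""
    return ge_leading_flops(n_big) / ge_leading_flops(n_small)


def size_gain(speedup: float) -> float:
    """How much larger a problem fits in the same fixed time on a machine
    `speedup` times faster, under the same n^3 model."""
    return speedup ** (1.0 / 3.0)


# Reported figures quoted in the article.
YMP_PATCHES, YMP_MFLOPS = 5120, 2130.0
CRAY2S_PATCHES, CRAY2S_MFLOPS = 2560, 293.0

print(work_ratio(YMP_PATCHES, CRAY2S_PATCHES))  # model: 8.0
print(round(YMP_MFLOPS / CRAY2S_MFLOPS, 2))     # reported: 7.27
print(size_gain(8.0))                           # about 2.0
```

Under the pure n³ model, going from 2,560 to 5,120 patches multiplies the work by 8, so a machine eight times faster only doubles the patch count in the same minute. The reported MFLOPS ratio between the two CRAY systems is about 7.3 rather than 8, presumably because the real operation count has lower-order terms and the two runs need not take exactly the same time.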
Other notable results from the as-yet short list of reports include 1438 patches (47.2 MFLOPS) for a 64-processor nCUBE 2, 1407 patches (50.8 MFLOPS) for an 8,192-processor MasPar MP-1 and 853 patches (19.2 MFLOPS) for a four-processor Silicon Graphics 4D/380S.
SLALOM's language independence comes from the fact that the test is to solve the problem to a given standard of precision, not to run a particular piece of code. Gustafson contends that this makes it a true test of a computer system (hardware and software), and that it credits gains in performance due to algorithmic improvements, which have historically been the source of about half of all performance increases. The rules of the SLALOM benchmark allow the use of any language, and of variant versions for different computer architectures.
The Ames Laboratory will be the custodian of the SLALOM benchmark, much as Argonne National Laboratory and now the University of Tennessee are the custodians of the Linpack set, and the Center for Supercomputing Research and Development is the home of the Perfect Club. All received results that can be verified for accuracy and precision will be reported, although Gustafson says that he will flag results submitted by hardware vendors, on the premise that vendors frequently have compiler versions, special tools and other facilities not yet available to users. As evidence of the team's serious intent, Gustafson said, it has applied for a patent for SLALOM.
Ames can make available versions of the benchmark in C, FORTRAN 77 and Pascal, and intends to expand the range of languages. They will also supply variations with compiler directives for shared-memory parallel machines, message-passing versions for distributed memory systems and a MasPar version for SIMD systems. Users may optimize code without restriction, except that they may not specialize code to the input data. The code is available via "anonymous ftp" to tantalus.al.iastate.edu. Any and all results should be reported to slalom@tantalus.al.iastate.edu.
Gustafson said that the goal of the SLALOM effort is to maintain the complete scientific integrity of the results. His model for the effort, he said, is Consumer Reports. Thus he and his team will not accept consulting engagements related to SLALOM, although they will offer free help to anyone tuning code for the benchmark. In the future, he said, anyone reporting a result will also be required to submit the code that was actually executed.
A preliminary report on SLALOM appeared in Supercomputing Review for November, 1990. A more detailed description of the process will be published in a forthcoming issue of the Journal of Parallel and Distributed Computing.
Contact:
John Gustafson
gus@scl.ameslab.gov
The URL for this document is http://www.scl.ameslab.gov
Pages prepared by Maria E. Blanco.