Overview

SCL Overview

Background

The Scalable Computing Laboratory (SCL) was created in 1989 as a joint effort of the Department of Energy (DOE), through Ames Laboratory, and Iowa State University (ISU), through the Center for Physical and Computational Mathematics (CPCM), one of several centers administered by ISU's Institute for Physical Research and Technology (IPRT).

Primary funding comes from the Mathematical, Information, and Computational Sciences (MICS) Division of the DOE Office of Advanced Scientific Computing Research (ASCR). The SCL was a major participant in the Presidential Initiative for High Performance Computing and Communications (HPCC) and is currently involved in two SciDAC projects. Research at the SCL is driven by the application side, so strong collaborations have been maintained with the Chemistry and Physics groups, which are funded largely by the Basic Energy Sciences (BES) division of the DOE. Interaction with computer vendors has always played a major role in the development of new computing systems and often results in leveraged hardware resources, as with our IBM clusters.

Research

The mission of the SCL is to advance the use of scalable computing in scientific and engineering computation within the Laboratory and the University. Much of the research is driven by the needs of key applications in the Chemistry and Condensed Matter Physics groups, which have been active participants in the high-performance computing efforts. For example, the GAMESS quantum chemistry code has played a central role in acquiring the IBM clusters and in a SciDAC project on Advancing Multi-Reference Methods in Electronic Structure Theory. The Array Compression Library is being developed to allow codes such as these to trade CPU cycles for reduced communication bandwidth and storage requirements.

Research into performance analysis has focused on several key areas needed to understand the limiting factors that prevent applications from efficiently taking advantage of the available resources. The HINT benchmark was developed to analyze the capabilities of the processor and memory subsystem, providing a graph of performance across a range of problem characteristics. The NetPIPE utility is a flexible tool for measuring point-to-point network performance for different communication protocols, making it well suited to identifying inefficiencies in the message-passing layers and problems in the network hardware and drivers. NetPIPE is also being expanded to measure global network properties to help understand the effect of the network topology on applications. A cache-aware matrix benchmark is being developed to study the use of mixed programming models on SMP systems. The SCL therefore has a set of performance analysis tools that probe the individual components of a high-performance computing system, as well as a benchmark that exercises all components at the same time.
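As a rough illustration of the kind of point-to-point measurement that NetPIPE automates, the sketch below times a simple MPI ping-pong exchange between two ranks over a range of message sizes. It is an assumed, minimal example written for this overview, not NetPIPE source code, and the message-size range and repetition count are arbitrary.

 /* Minimal MPI ping-pong sketch of a NetPIPE-style point-to-point
  * measurement (illustrative only; not NetPIPE itself). Rank 0 sends
  * a buffer to rank 1, which echoes it back; the round-trip time gives
  * an estimate of one-way bandwidth for each message size. */
 #include <mpi.h>
 #include <stdio.h>
 #include <stdlib.h>

 int main(int argc, char **argv)
 {
     int rank, reps = 100;
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);

     for (int nbytes = 1; nbytes <= (1 << 22); nbytes *= 2) {
         char *buf = malloc(nbytes);
         MPI_Barrier(MPI_COMM_WORLD);
         double t0 = MPI_Wtime();
         for (int i = 0; i < reps; i++) {
             if (rank == 0) {
                 MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                 MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                          MPI_STATUS_IGNORE);
             } else if (rank == 1) {
                 MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                          MPI_STATUS_IGNORE);
                 MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
             }
         }
         double t = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time */
         if (rank == 0)
             printf("%8d bytes  %10.2f MB/s\n", nbytes, nbytes / t / 1.0e6);
         free(buf);
     }
     MPI_Finalize();
     return 0;
 }

Run with two MPI processes (for example, mpirun -np 2 ./pingpong). A full tool such as NetPIPE additionally perturbs message sizes, repeats trials for statistical stability, and reports latency as well as bandwidth.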

Performance is often lost to inefficiencies in the message-passing layer. The goal of the MP_Lite project is to investigate methods to improve message-passing performance and to enable efficient message passing across new hardware. The Generalized Portable SHMEM (GPSHMEM) project provides greater efficiency and brings the one-sided SHMEM interface to a wide variety of multi-processor systems. The NodeMap utility is being developed to determine the topology of the underlying network at run time, which can then be used to automatically provide the best mapping of an application onto the network.
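To give a flavor of the one-sided programming model that GPSHMEM brings to other platforms, the sketch below uses generic OpenSHMEM-style calls to deposit a value directly into a neighboring processing element's symmetric memory without a matching receive. The function names and initialization shown are assumptions based on the standard SHMEM interface; GPSHMEM's actual API may differ in its prefixes and setup calls.

 /* One-sided put, SHMEM-style (illustrative OpenSHMEM-flavored sketch;
  * not the exact GPSHMEM API). Each processing element (PE) writes its
  * PE number directly into the next PE's symmetric array. */
 #include <shmem.h>
 #include <stdio.h>

 static long target[1];   /* symmetric object, remotely accessible */

 int main(void)
 {
     shmem_init();
     int me   = shmem_my_pe();
     int npes = shmem_n_pes();
     long src = (long)me;

     /* One-sided communication: no receive is posted on the target PE */
     shmem_long_put(target, &src, 1, (me + 1) % npes);
     shmem_barrier_all();

     printf("PE %d received %ld\n", me, target[0]);
     shmem_finalize();
     return 0;
 }

The appeal of this interface is that remote reads and writes do not require the remote side to participate, which is what GPSHMEM aims to make available efficiently on a wide variety of multi-processor systems.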

Research into improving the management of multi-processor systems not only makes them easier to use by providing a single-system image, but also allows the resources to be used more efficiently. The SCL is part of the Scalable System Software SciDAC project, where we focus our efforts on parallel resource management using the Maui Scheduler with the PBS batch queuing system. A small cluster has also been set up to test the MOSIX cluster management system.

We have a wide variety of computer resources, from small test beds for measuring CPU and point-to-point communication performance to large clusters for evaluating network topologies and the performance of full applications. The clusters include Pentium, Athlon, IBM Power3II, G4 PPC, and Alpha processors, running Linux, AIX, and Tru64 Unix, connected by Fast Ethernet, Gigabit Ethernet, Myrinet, SCI, and InfiniBand. The hardware research is done in close collaboration with many vendor partners such as IBM, Myrinet, and Mellanox. Examples of current research projects include porting 64-bit Linux to the IBM Power3II architecture, evaluating and improving drivers for various network interface cards, working with vendors and MPI developers to bring InfiniBand technology to the cluster community, and analyzing the performance capabilities of a 2D SCI network.

Outreach

The experience gained from working with a wide variety of hardware and operating systems, together with the measurements made with the performance analysis tools being developed, is being used to help groups within Ames Laboratory purchase and use parallel computing systems to their fullest capabilities. Through the Center for Physical and Computational Mathematics (CPCM), this experience is being disseminated throughout the University. Many of the principal investigators also hold adjunct faculty positions in various departments and teach courses on high-performance computing.
