SCL Cluster Cookbook
Assembling a Cluster

Server versus Regular Nodes

The cluster's interconnect will probably not be connected directly to your institution's production network, so one of the nodes will need an additional network card to provide that connection. Because this node has direct access to the institution's network (so new software can be downloaded straight to it), it is useful to treat it as the server node and perhaps equip it with a larger disk and/or more memory, since it will be used to compile programs and share parallelized applications with the other nodes in the cluster. As you assemble your cluster, be sure to put the appropriate hardware (the additional network card and perhaps the extra memory and/or larger disk drive) in the server node.

Hardware

Add Interconnect Hardware and Test

If the network cards are not already installed in the computers, install them now. Some cards come with a diagnostic program that can be run under DOS; you may wish to run it to verify that each card operates correctly in its computer.

Place PCs on Shelving

For a cluster that contains a number of computers, it is convenient to stack them on sturdy shelves. The photo below shows one of the racks used at Ames Lab to hold 16 PCs.

[ 16 PCs on a Shelf ]

Figure 1. 16 PCs on a 4-Tier Shelf

Connect Cables

Route power cables and interconnect (network) cables between PCs, power sources, and network hubs or switches as appropriate. In our experience with a large (64-node) cluster, a separate electric circuit was necessary to power each group of 16 machines.

A monitor and keyboard can be shared among a group of systems using video/keyboard switch boxes. If you intend to run graphics displays through the switch boxes, beware of cheap video cables and switch boxes, which may only work for text screens. See the Planning the System section of Jan Lindheim's Beowulf tutorial for an example of how keyboards and video displays can be connected using four-port switch boxes. A similar arrangement can be seen above in figure 1 (albeit somewhat obscured by the shelf's leg), where a group of five switch boxes switches the keyboard and display among the four groups of four computers.

Test the Shelf of PCs

Test power, keyboard and video, and the interconnect with all of the PCs in the shelf operating. At a minimum, verify that every PC powers up correctly and has a connection to the network hub or switch. The hub or switch usually has LEDs that show at least whether it detects a live interface card. A more thorough test of the interconnect may be possible using the network interface card vendor's diagnostic programs, if any were supplied with the cards.

Software

Install and Configure the Operating System

The instructions in this section will assume that Linux or another UNIX variant is used as the operating system for the cluster.

Install Linux (or whichever operating system was previously selected) on each of the PCs in the cluster. If the operating system supports cloned installations, it may be easy to install the operating system first on the server node and then use the cloning facility to copy the installation to the client nodes.

When configuring IP for the network interfaces, give the server's card that connects to your institution's network the IP address assigned by your network administrators. For the interconnect interfaces, it is suggested that addresses in the 192.168.0.0 to 192.168.255.255 range be used (see RFC 1918 for details on IP address allocation for private networks). For example, if you are using a single Fast Ethernet network as the interconnect, you might assign IP address 192.168.1.1 and subnet mask 255.255.255.0 to the server's interconnect interface and then number the interfaces on the clients starting with 192.168.1.2.
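
As a rough sketch of what this looks like on Linux (interface names such as eth0 and eth1 are assumptions and will differ between systems), the interconnect interfaces can be brought up by hand with ifconfig while testing; most distributions also provide network configuration files so the same settings persist across reboots:

# On the server, assuming eth0 faces the institution's network and
# eth1 is the cluster interconnect:
ifconfig eth1 192.168.1.1 netmask 255.255.255.0 up

# On the first client, assuming its interconnect card is eth0:
ifconfig eth0 192.168.1.2 netmask 255.255.255.0 up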

Once IP addresses have been assigned to all of the interconnect interfaces, record the addresses and corresponding node names in the /etc/hosts file on each node.
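
For example, with the server named node0 and clients node1 through node3 (the node names used in the examples below), each node's /etc/hosts might contain:

127.0.0.1      localhost
192.168.1.1    node0
192.168.1.2    node1
192.168.1.3    node2
192.168.1.4    node3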

Make sure the necessary compilers (C, Fortran, etc.) are installed on the server.

Sharing Files From the Server

To share compiled programs and data files from the server with the other nodes, it is suggested that a directory named /mirror be created on the server. Then use NFS (usually provided on all UNIX systems) to serve the /mirror directory to the other nodes by editing /etc/exports to contain a line something like:

/mirror node1(rw) node2(rw) node3(rw) ...

where node1, node2, and node3 (and so on) are replaced with the actual names of the client nodes. (This example is for Linux. The format of entries in /etc/exports varies by operating system, so consult the exports(5) manual page for the format necessary on your system.)

Additional configuration may be necessary to enable NFS serving from the server node; consult your operating system's documentation to complete the setup of NFS service on the server.
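
As a minimal sketch on a typical Linux system (command and script names vary by operating system and distribution, so treat this only as an illustration), the new export can be activated with something like:

# Re-read /etc/exports and make the /mirror export active;
# on some systems the NFS daemons must be (re)started instead
exportfs -a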

When the system becomes operational, it may be convenient to make a subdirectory for each user under the server's /mirror directory and change its ownership to that user, so that each user can share programs and data across the cluster through his or her own subdirectory.
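
For example, for a hypothetical user jdoe, the following commands run on the server would create and hand over a shared subdirectory:

mkdir /mirror/jdoe
chown jdoe /mirror/jdoe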

Accessing Shared Files from Clients

On each of the client nodes, make the directory /mirror. Then add a line like this to /etc/fstab:

node0:/mirror /mirror nfs rw,bg,soft 0 0

where node0 is the name of the server. This addition will automatically mount /mirror from the server onto the /mirror directory on each client. (This example is for Linux. The format of entries in /etc/fstab varies by operating system, so consult the fstab(5) manual page for the format necessary on your system.)
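
With the directory created and the fstab entry in place, the share can also be mounted immediately on each client without rebooting, since mount consults /etc/fstab when given only the mount point:

mount /mirror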

Install Software

If the necessary compilers (C, Fortran, etc.) are not already installed on the server, install them now. The server will be used for compilation of programs, and the shared mirror directory will be used to access the programs and data from the clients.

On the server node, install MPICH and/or other libraries needed for parallelization. Documentation on MPICH is available at http://www.mcs.anl.gov/mpi/mpich/docs.html.
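
As a sketch of how the pieces fit together once MPICH is installed (the user name jdoe and file names here are placeholders, and the location of MPICH's machines file depends on the installation), a program kept in a user's shared /mirror subdirectory might be compiled on the server with MPICH's compiler wrapper and started across the cluster with its launcher:

# Compile on the server, placing the executable in the shared directory
mpicc -o /mirror/jdoe/hello /mirror/jdoe/hello.c

# Start four processes on the nodes listed in MPICH's machines file
mpirun -np 4 /mirror/jdoe/hello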

Authorize Users

Each user account must be created on the server, and the UID assigned to the user in the server's /etc/passwd file (which lists the authorized users of a system) must also be used as that user's UID on each of the clients, so that files shared via NFS are accessible on every client.
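
For example, a hypothetical user jdoe created with UID 501 should have the same UID (and preferably the same group ID) in the passwd entry on the server and on every client, along these lines:

jdoe:x:501:100:Jane Doe:/home/jdoe:/bin/sh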

To avoid having to maintain multiple copies of the passwd file, rdist can be used to automatically copy the server's passwd file to the clients. See Judith Ashworth's rdist to the Rescue! article, which describes using rdist to distribute the passwd file (as well as other files).
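
As a minimal sketch (the node names are placeholders, other files can be added to the FILES list, and classic rdist assumes the server can run commands on the clients via rsh), a Distfile on the server might look like the following and be run with rdist -f Distfile:

HOSTS = ( node1 node2 node3 )
FILES = ( /etc/passwd )
${FILES} -> ${HOSTS}
        install;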

If your operating system has it, NIS (Network Information Service, also known as Yellow Pages) could be used to share the server's passwd file with the clients. NIS carries a bit of a security risk and may be non-trivial to configure and operate, so rdist may be a better, simpler solution.


Appearance of any vendor in this document does not constitute endorsement of that vendor by Ames Laboratory.
Questions or comments? Send mail to ghelmer@scl.ameslab.gov
Copyright © 1997, 1998 All Rights Reserved by Scalable Computing Laboratory.
Disclaimer Notice
Maintained by ghelmer / Last updated on 06/10/98