What is a Cluster Computer?

The exact definition of a cluster computer will depend a little on who you ask. However, there are some general characteristics that most will agree upon.

Building a simple cluster

Let's walk through the process of building a simple 4-node cluster out of some existing workstations, assuming that they are nearly identical and located in the same room.

The first step is to couple the machines tightly. A typical workstation is probably already connected to the network through an Ethernet connection to a hub or switch in the room. The minimum configuration that I would recommend is Gigabit Ethernet, which is fairly cheap at around $100/machine and provides enough bandwidth to run many applications.

If the computers are not already connected this way, you will need to install a Gigabit Ethernet card in each (~$40/machine), connect them to a central Gigabit Ethernet switch ($1600 for a 24-port switch), and configure the interfaces properly. Configure this second interface as a private local network, using IP addresses such as 10.0.0.1 through 10.0.0.4 (avoid 10.0.0.0, which is normally the network address rather than a usable host address). This requires root access, and the details differ greatly between operating systems. It is common to edit the /etc/hosts file so that these second interfaces have simple names like node0 through node3.
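
As a rough illustration of the /etc/hosts step, the sketch below prints one entry per node for the private interfaces. The 10.0.0.x subnet, the choice to number hosts from .1 (since .0 is normally the network address), and the helper name gen_hosts are all assumptions made for this example, not a required layout.

```shell
#!/bin/sh
# Print /etc/hosts entries for a small cluster's private interfaces.
# Assumptions: a 10.0.0.x private subnet, hosts numbered from .1,
# and node names node0, node1, ... (gen_hosts is a hypothetical helper).
gen_hosts() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # node0 -> 10.0.0.1, node1 -> 10.0.0.2, and so on
        printf '10.0.0.%d\tnode%d\n' "$((i + 1))" "$i"
        i=$((i + 1))
    done
}

# Review the output, then append it to /etc/hosts as root on every machine:
#   gen_hosts 4 >> /etc/hosts
gen_hosts 4
```

Printing the entries first, rather than writing to /etc/hosts directly, makes it easy to check them before changing a system file.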

Now that the hardware is in place, the system configuration may need some changes. If there is no common home directory across the nodes, you will need to set one up. This again requires root access, and the details may differ between operating systems. Basically, create a directory such as /cluster on each machine. Choose one machine to be the master node, and export this directory to the others using the /etc/exports file. On the other nodes, mount the /cluster directory using the /etc/fstab file.
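
The shared directory setup might look something like the sketch below, which only prints the lines to add to each file. It assumes Linux-style NFS syntax, node0 as the master, and a 10.0.0.0/24 private subnet; other operating systems use different formats, and the export options shown (rw,sync) are just one reasonable choice.

```shell
#!/bin/sh
# Print the NFS configuration lines for sharing /cluster from a master node.
# Assumptions: Linux-style NFS syntax, node0 as the master, and the
# 10.0.0.0/24 private subnet; other operating systems use different formats.
MASTER=node0
SHARE=/cluster

# Line for /etc/exports on the master: allow read-write mounts from the subnet.
exports_line() {
    printf '%s 10.0.0.0/255.255.255.0(rw,sync)\n' "$SHARE"
}

# Line for /etc/fstab on every other node: mount the master's copy via NFS.
fstab_line() {
    printf '%s:%s %s nfs defaults 0 0\n' "$MASTER" "$SHARE" "$SHARE"
}

echo "# add to /etc/exports on $MASTER:"
exports_line
echo "# add to /etc/fstab on the other nodes:"
fstab_line
```

After editing the files on a Linux system, running 'exportfs -a' as root on the master and 'mount /cluster' as root on each of the other nodes should make the directory visible everywhere.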

All machines must trust each other enough to allow users to rsh/ssh between them without requiring a password. To test this, simply try 'rsh node1' for example, and you should log into node1 without being prompted for a password. If not, use man rsh or man ssh to determine the next course of action. You may just need to create a .rhosts file, or the appropriate ssh keys. If your machines are not set up to trust each other, you will need to convince the system administrator to change this.
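
The password-free login test can be automated along these lines. This is a minimal sketch that assumes the OpenSSH client and the node names node0 through node3; check_nodes is a hypothetical helper name, and the options used are standard OpenSSH ones.

```shell
#!/bin/sh
# Report which nodes accept a password-free ssh login.
# Assumptions: OpenSSH client, and node names node0..node3 from /etc/hosts
# (check_nodes is a hypothetical helper name).
check_nodes() {
    failed=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from reading stdin; 'true' is a no-op remote command.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
            echo "$node: ok"
        else
            echo "$node: needs setup"
            failed=1
        fi
    done
    return "$failed"
}

# Example: check_nodes node0 node1 node2 node3
```

If a node reports "needs setup", one common fix with OpenSSH is to create a key pair with ssh-keygen and add the public key to ~/.ssh/authorized_keys on each node.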

Installing a message-passing library

The last step is to install message-passing software such as LAM/MPI or MPICH. LAM/MPI comes as an RPM on the Red Hat Linux CDs, making it easier to install than MPICH. It also provides better performance in most cases for cluster computers. Install the LAM/MPI RPM if necessary, then start the lamd daemons on the cluster with the lamboot command.
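
Starting the daemons could look roughly like the sketch below, which writes a host file and lists a typical LAM/MPI session in comments. The node names, the file name lamhosts, and the program name my_prog are assumptions for illustration; lamboot, lamnodes, mpirun, and lamhalt are the standard LAM/MPI commands.

```shell
#!/bin/sh
# Write a LAM boot schema (host file) listing the cluster nodes.
# Assumptions: node names node0..node3 and the file name lamhosts.
write_lamhosts() {
    for node in node0 node1 node2 node3; do
        echo "$node"
    done
}

write_lamhosts > lamhosts

# With LAM/MPI installed, a typical session (run as an ordinary user) is:
#   lamboot -v lamhosts     # start a lamd daemon on each listed node
#   lamnodes                # list the nodes that joined
#   mpirun -np 4 ./my_prog  # my_prog is a hypothetical MPI executable
#   lamhalt                 # shut the daemons down when finished
```

Note that lamboot relies on the password-free rsh/ssh access between nodes, so it is worth confirming that first.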



