Building a simple cluster
Let's walk through the process of building a simple 4-node cluster
out of some existing workstations, assuming that they are nearly identical
and located in the same room.
The first step is to tightly couple them
together. The typical workstation is probably connected to the network
through an Ethernet connection to a hub or switch in the room.
The minimum configuration that I would recommend is to use Gigabit Ethernet.
This is fairly cheap, at around $100/machine, and provides enough
bandwidth to run many applications.
If the computers are not connected this way, you will need to install
a Gigabit Ethernet card in each (~$40/machine), connect them to a
central Gigabit Ethernet switch ($1600 for a 24-port switch), and configure
the interfaces properly. Configure this second interface on a private
local network, using IP addresses such as 10.0.0.1 through 10.0.0.4
(the address 10.0.0.0 itself is normally reserved as the network address).
This requires root access, and the procedure differs greatly between
operating systems.
It is common to edit the /etc/hosts file so that these second interfaces
have simple names like node0 through node3.
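As a minimal sketch of the naming step, the loop below prints one /etc/hosts
entry per node. The function name print_cluster_hosts is hypothetical, and the
addresses (starting at 10.0.0.1) are an assumption for illustration; substitute
whatever addresses your second interfaces actually use.

```shell
#!/bin/sh
# Print one /etc/hosts line per node: node0 .. node(N-1).
# Assumption: private addresses start at 10.0.0.1; adjust to match your network.
print_cluster_hosts() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '10.0.0.%d\tnode%d\n' "$((i + 1))" "$i"
        i=$((i + 1))
    done
}

# Print the entries for a 4-node cluster so they can be reviewed first.
print_cluster_hosts 4
```

After checking the output, the same lines can be appended (as root) to
/etc/hosts on every machine so that all nodes agree on the names.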
Now that the hardware is in place, the system software may need to be
configured. If there is no common home directory across the nodes, you will
need to set one up. This again will require root access, and may differ
between operating systems. Basically, create a directory such as /cluster on
each machine. Choose one machine to be the master node, and export this
directory to the others over NFS using the /etc/exports file. On the other
nodes, mount the /cluster directory using the /etc/fstab file.
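The export and mount entries described above might look like the two lines
printed by the sketch below. It assumes Linux NFS syntax, node0 as the master
node, and a private network of 10.0.0.0/24; the function names and the export
options (rw,sync) are illustrative choices, not requirements.

```shell
#!/bin/sh
# Sketch: print the NFS configuration lines instead of editing system files,
# so they can be reviewed before being added as root.
# Assumptions: Linux NFS syntax, node0 is the master, network is 10.0.0.0/24.

# Line for /etc/exports on the master node: <directory> <clients>(<options>)
exports_line() {
    printf '%s %s(rw,sync)\n' "$1" "$2"
}

# Line for /etc/fstab on every other node: <server>:<dir> <mountpoint> nfs ...
fstab_line() {
    printf '%s:%s %s nfs defaults 0 0\n' "$1" "$2" "$2"
}

exports_line /cluster 10.0.0.0/24
fstab_line node0 /cluster
```

On a typical Linux system, running 'exportfs -ra' on the master applies the
new export, and 'mount /cluster' on each other node mounts it using the
fstab entry.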
All machines must trust each other enough to allow users to rsh/ssh
between them without requiring a password. To test this, try
'rsh node1' or 'ssh node1', for example; you should be logged into node1
without being prompted for a password. If not, consult man rsh or man ssh
to determine the next course of action. You may just need to create a
.rhosts file, or the appropriate ssh keys. If your machines are
not set up to trust each other, you will need to convince the system
administrator to change this.
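The password-free login test can be repeated for every node with a small
loop. The sketch below assumes ssh rather than rsh; the function name
check_passwordless and the SSH_CMD override are hypothetical helpers added
for illustration. With ssh, the usual fix for a failing node is to run
ssh-keygen once and add the resulting public key to ~/.ssh/authorized_keys
on that node (ssh-copy-id automates this where it is available).

```shell
#!/bin/sh
# Report which nodes accept a login without prompting for a password.
# BatchMode=yes makes ssh fail instead of asking for a password, and
# -n keeps ssh from reading standard input.
# Assumption: SSH_CMD is a hypothetical override hook, handy for dry runs.
check_passwordless() {
    failed=0
    for node in "$@"; do
        if ${SSH_CMD:-ssh} -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
            echo "$node: ok"
        else
            echo "$node: password still required or host unreachable"
            failed=1
        fi
    done
    return "$failed"
}

# Example invocation (node names as defined in /etc/hosts):
#   check_passwordless node0 node1 node2 node3
```

The function returns a non-zero status if any node fails, so it can be used
as a quick check before moving on to the message-passing setup.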
Installing a message-passing library
The last step is to install the message-passing software such
as LAM/MPI or MPICH.
LAM/MPI comes as an RPM on the Red Hat Linux CDs, making it easier
to install than MPICH. It also provides better performance in most
cases for cluster computers. Install the LAM/MPI RPM if necessary,
then start the lamd daemons on the cluster using the lamboot command.
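Starting the daemons might look like the sketch below, which writes a LAM
boot schema (a plain list of hosts) and boots the cluster only if LAM/MPI is
installed. The file name lamhosts, the function name write_lamhosts, and the
node names node0 through node3 are assumptions for illustration.

```shell
#!/bin/sh
# Write a LAM boot schema (host file) listing one node per line.
# Assumption: the file name and node names are illustrative.
write_lamhosts() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

write_lamhosts lamhosts node0 node1 node2 node3

# Start the daemons only if LAM/MPI is actually installed on this machine.
if command -v lamboot >/dev/null 2>&1; then
    lamboot -v lamhosts   # start a lamd daemon on every listed node
    lamnodes              # list the nodes that joined
fi
```

Once the daemons are running, a command such as 'mpirun -np 4 ./my_program'
launches four copies of an MPI executable (my_program is a placeholder), and
'lamhalt' shuts the daemons down again.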
Links to more advanced topics