SCL Cluster Cookbook
Assembling a Cluster
Server versus Regular Nodes
Since the interconnect used by the cluster will likely not be directly connected to your institution's production networks, one node in the cluster will probably need an additional network card to connect to your institution's production network. Because this one node has direct access to your institution's network (so new software can be downloaded directly to it), it is useful to treat it as the server node and perhaps equip it with a bigger disk and/or more memory, since it will be used to compile programs and share parallelized applications with the other nodes in the cluster. As you assemble your cluster, be sure to put the appropriate hardware (the additional network card and perhaps extra memory and/or a bigger disk drive) in the server node.
Figure 1. 16 PCs on a 4-Tier Shelf
A monitor and keyboard can be shared among a group of systems using video/keyboard switch boxes. If you intend to run graphics displays through the switch boxes, beware of cheap video cables and switch boxes, which may only work for text screens. See Planning the System of Jan Lindheim's Beowulf tutorial for an example of how the keyboards and video displays can be connected using four-port switch boxes. A similar connection can be seen above in figure 1 (albeit somewhat obscured by the shelf's leg), where a group of five switch boxes is used to switch the keyboard and display between the four groups of four computers.
Install Linux (or whichever operating system was previously selected) on each of the PCs in the cluster. If the operating system supports cloned installations, it may be easy to install the operating system first on the server node and then use the cloning facility to copy the installation to the client nodes.
When configuring IP for the network interfaces on the systems, use the IP address assigned by your network administrators on the server's network card that is connected to your institution's network. For the other network cards, it is suggested that IP addresses in the 192.168.0.0 to 192.168.255.255 range be used for the cluster's interconnect (see RFC 1918 for details on IP address allocations for private networks). For example, if you are using a single Fast Ethernet as the interconnect, you might want to assign IP address 192.168.1.1 and subnet mask 255.255.255.0 to your server's interconnect interface and then number the interfaces on the clients starting with 192.168.1.2.
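As a concrete sketch of the client numbering described above (the interface name eth1, the node names, and the cluster size of sixteen clients are assumptions; substitute your own), a short shell loop can print the ifconfig command to run on each client:

```shell
#!/bin/sh
# Print the ifconfig command for each client's interconnect interface.
# Assumptions: the interconnect device is eth1, the server already has
# 192.168.1.1, and the sixteen clients are named node1 .. node16.
NETMASK=255.255.255.0
i=2
while [ $i -le 17 ]; do
    echo "node$((i - 1)): ifconfig eth1 192.168.1.$i netmask $NETMASK up"
    i=$((i + 1))
done
```

Running the printed command on the matching node (or placing it in that node's network startup scripts) brings the interconnect interface up with the chosen private address.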
When IP addresses are assigned to all the interconnect interfaces, record the IP addresses and node names in the /etc/hosts file.
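Continuing the example numbering above (the node names are illustrative; use your own), the /etc/hosts file on every node might contain entries like:

```
192.168.1.1    node0
192.168.1.2    node1
192.168.1.3    node2
192.168.1.4    node3
```

and so on, one line per node, so every machine can resolve the others' interconnect names without depending on a DNS server.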
Make sure the necessary compilers (C, Fortran, etc.) are installed on the server.
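A quick sanity check is to confirm each compiler is on the server's PATH. This sketch assumes the compiler names cc, gcc, and g77; adjust the list to whatever compilers your site's users actually need:

```shell
#!/bin/sh
# Report whether each needed compiler is installed on the server.
# The names below are assumptions; edit the list for your site.
for c in cc gcc g77; do
    if command -v "$c" > /dev/null 2>&1; then
        echo "$c: found ($(command -v "$c"))"
    else
        echo "$c: MISSING"
    fi
done
```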
Sharing Files From the Server
To share compiled programs and data files from the server to other nodes, it is suggested that a directory /mirror be made on the server. Then, use NFS (usually provided on all UNIX systems) to serve the /mirror directory to the other nodes by editing /etc/exports to contain something like the line:
/mirror node1(rw) node2(rw) node3(rw) ...
where node1, node2, and node3 (and so on) are replaced with the actual names of the client nodes. (This example is for Linux. The format of entries in /etc/exports varies by operating system, so consult the exports(5) manual page for the format necessary on your system.)
Additional configuration may be necessary to enable NFS serving from the server node. Consult your operating system's documentation if necessary to complete the configuration of NFS service on the server node.
When the system becomes operational, it may be convenient to make subdirectories for each user in the server's /mirror directory and change ownership of each subdirectory to the associated user so each user can use his or her shared mirror directory to share programs and data across the cluster.
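The per-user setup can be scripted. The sketch below uses placeholder user names (alice, bob) and, so it can be tried safely, defaults to a scratch directory rather than the real /mirror; set MIRROR=/mirror when running it as root on the server:

```shell
#!/bin/sh
# Create a per-user subdirectory under the shared mirror tree and give
# ownership to that user.  MIRROR defaults to a scratch path for safe
# experimentation; use MIRROR=/mirror on the real server (as root).
MIRROR=${MIRROR:-/tmp/mirror-demo}
for u in alice bob; do              # substitute your cluster's user names
    mkdir -p "$MIRROR/$u"
    # chown succeeds only for accounts that exist (and when run as root)
    if id "$u" > /dev/null 2>&1; then
        chown "$u" "$MIRROR/$u"
    fi
done
```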
Accessing Shared Files from Clients
On each of the client nodes, make the directory /mirror. Add a line like this to /etc/fstab:
node0:/mirror /mirror nfs rw,bg,soft 0 0
where node0 is the name of the server. This addition will automatically mount /mirror from the server onto the /mirror directory on each client. (This example is for Linux. The format of entries in /etc/fstab varies by operating system, so consult the fstab(5) manual page for the format necessary on your system.)
On the server node, install MPICH and/or other libraries needed for parallelization. Documentation on MPICH is available at http://www.mcs.anl.gov/mpi/mpich/docs.html.
To avoid having to maintain multiple copies of the passwd file, rdist can be used to automatically copy the server's passwd file to the clients. See Judith Ashworth's rdist to the Rescue! article which describes using rdist to distribute the passwd file (as well as other files).
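As a sketch of what such a setup looks like (the node names are placeholders, and Distfile syntax varies somewhat between rdist versions, so consult the rdist(1) manual page), a minimal Distfile on the server pushing the passwd file to the clients might read:

```
HOSTS = ( node1 node2 node3 )
FILES = ( /etc/passwd )

${FILES} -> ${HOSTS}
        install ;
```

Running rdist on the server then copies the file to every listed host.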
If your operating system has it, NIS (Network Information Service, also known as Yellow Pages) could be used to share the server's passwd file with the clients. NIS carries a bit of a security risk and may be non-trivial to configure and operate, so rdist may be a better, simpler solution.