Fountain

0.0.4

Author:
Sam Miller - samm@scl.ameslab.gov

Brett Bode - brett@scl.ameslab.gov

Date:
March 2, 2006

Introduction

Fountain is a node monitor component for the SciDAC Scalable Systems Software project. It aggregates the status information of every node in a cluster in a scalable, reliable, and efficient manner while using a negligible amount of CPU time on each node. It can recover from single and multiple node failures, whether a node goes down unexpectedly or is taken offline for administrative purposes.

Installation

Note: These instructions assume you obtained a Fountain tarball. To compile from CVS, you will need autoconf and related tools to generate a configure script.

Also note: Fountain uses the SmartPtr class from the Loki project, which does some very creative things with C++ templates. While this policy-based design adds a wealth of features, it requires Fountain to be compiled with a recent compiler. Fountain has only been tested on Linux and Mac OS X using g++ 3.4 and later; other compilers or older versions of g++ may not work. A bug in the g++ 4.0 included with the Apple Developer Tools for Mac OS X 10.4 prevents it from understanding uintptr_t in SmartPtr.h when compiling the Loki headers. You will need the Dev Tools 2.2.1 update, which includes g++ 4.0.1 and seems to fix this.

  1. ./configure
  2. make
  3. make install

Usage

Fountain consists of three separate classes of components: node daemons, a server, and client utilities.

There are two kinds of Fountain node daemon: the master Fountain daemon and the slave Fountain daemons. The master Fountain daemon is assumed to run on the head node of a cluster and is responsible for maintaining the n-ary tree structure of all the slave Fountain daemons. The slave Fountain daemons run on all the other nodes of the cluster and connect to each other in an n-ary tree topology as directed by the master daemon. A shared file system is not required for the Fountain daemons to operate, but each slave daemon needs to know the hostname and listener port of the master daemon. These can be set either in a configuration file or with a command-line argument.
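To make the n-ary tree topology concrete, here is a minimal sketch of how parent and child positions could be computed. It is not taken from the Fountain sources: it assumes the daemons are numbered in breadth-first order with the master at index 0, and the helper names parent_of and children_of are hypothetical. How the master daemon actually assigns positions is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Fountain: index of the parent of `node`
// in a complete n-ary tree numbered breadth-first, with the root at index 0.
std::size_t parent_of(std::size_t node, std::size_t arity) {
    assert(arity >= 1 && node >= 1);  // the root (index 0) has no parent
    return (node - 1) / arity;
}

// Hypothetical helper: indices of the children of `node` that actually
// exist in a tree holding `total` nodes (indices 0 .. total - 1).
std::vector<std::size_t> children_of(std::size_t node, std::size_t arity,
                                     std::size_t total) {
    assert(arity >= 1);
    std::vector<std::size_t> children;
    for (std::size_t k = 1; k <= arity; ++k) {
        std::size_t child = node * arity + k;
        if (child >= total) break;  // later children would be out of range too
        children.push_back(child);
    }
    return children;
}
```

Under these assumptions, in a binary tree of seven daemons the master (index 0) talks directly only to daemons 1 and 2, and daemon 1 relays for daemons 3 and 4. Each daemon therefore handles at most n child connections regardless of cluster size.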

The Fountain server is responsible for maintaining node status information. Currently, the Fountain server obtains node information from Fountain daemons. Future releases will add support for obtaining node status information through other mechanisms, such as Ganglia and Supermon, and from parallel supercomputers such as the IBM BlueGene and Cray XT3. The Fountain server can also monitor server-specific data sources such as network information. To enable InfiniBand network support, configure Fountain using the --with-infiniband argument.

The Fountain client utilities all interact directly with the Fountain server; they have no concept of the Fountain node daemons. These utilities can be run locally on the node that is running the Fountain server, or remotely. They need to know the hostname and port of the Fountain server, as well as the correct wire protocol to use when talking to it. All of these options can be set in the config file, which is typically the same one used for the Fountain server. The three client utilities are:

