fountainTreeTest.cpp File Reference

Regression testing utility to test changes in the Fountain tree topology rebuilding and recovery algorithms

#include "define.h"
#include "ClientRequest.h"
#include "NodeID.h"
#include "FountainWireProt.h"
#include "FountainErrors.h"
#include "Enforce.h"
#include "Asserter.h"
#include <bamboo/libbamboo.h>
#include <bamboo/qLog.h>
#include <bamboo/SSSXML.h>
#include <bamboo/ConfigReader.h>
#include <csignal>
#include <cstddef>
#include <memory>
#include <unistd.h>
#include <cerrno>
#include <iostream>
#include <cstdio>
#include <fcntl.h>
#include <vector>
#include <string>
#include <iterator>
#include <algorithm>
#include <sys/wait.h>

Functions

bool initialize ()
 Performs initialization work for the master and slave Fountain node daemons.
void interruptHandler (int signo)
 Called when we catch a signal.
void respawnNodes (vector< NodeID > &bombList)
int main (int argc, char **argv)


Detailed Description

The basic idea here is to test the Fountain tree topology recovery and rebuilding algorithms after the source code changes, to make sure nothing is broken. This utility works both locally and remotely via ssh. Problems in the code are difficult to debug without access to the log files from each Fountain node, so it makes little sense to run this utility without FOUNTAIN_DEBUG enabled.

The tree recovery algorithm handles a single node failure. A node can fail for a variety of reasons: perhaps its kernel panics, the node segfaults, or someone trips over the power cord in the server room. To recover from a node failure, the parent node and all child nodes of the lost node must contact the master Fountain node and report the failure. Once they have all done so within a predetermined time limit, the master Fountain node attempts to contact a replacement node in the tree topology and asks it to replace the lost node. If an unrecoverable failure occurs at any point during the recovery algorithm, the master proceeds to the tree rebuilding algorithm.
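The master's side of the recovery algorithm boils down to "collect a report from the lost node's parent and every child before a deadline, otherwise give up and rebuild". The sketch below models only that bookkeeping. The LossTracker class, the Outcome values and the use of plain int node IDs are hypothetical illustrations, not Fountain's actual types (the real code uses NodeID and the messages in FountainWireProt.h). A failed replacement request would also lead to Rebuild, which is not modeled here.

```cpp
#include <chrono>
#include <set>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical decision states for one lost node.
enum class Outcome { Waiting, ContactReplacement, Rebuild };

// Tracks lostParent/lostChild reports about a single lost node.
class LossTracker {
public:
    // expected: the lost node's parent and all of its children.
    LossTracker(std::set<int> expected, Clock::time_point deadline)
        : expected_(std::move(expected)), deadline_(deadline) {}

    // Record a report from `reporter` that arrived at time `now`.
    Outcome report(int reporter, Clock::time_point now) {
        if (now > deadline_) return Outcome::Rebuild;  // reports came in too late
        if (expected_.count(reporter) != 0) reported_.insert(reporter);
        return reported_.size() == expected_.size() ? Outcome::ContactReplacement
                                                    : Outcome::Waiting;
    }

    // Poll without a new report, e.g. from a timer in the master's main loop.
    Outcome check(Clock::time_point now) const {
        if (reported_.size() == expected_.size()) return Outcome::ContactReplacement;
        return now > deadline_ ? Outcome::Rebuild : Outcome::Waiting;
    }

private:
    std::set<int> expected_;   // who must report
    std::set<int> reported_;   // who has reported so far
    Clock::time_point deadline_;
};
```

Only reports from the expected parent and children count toward completion, so a stray report from an unrelated node cannot trigger a premature replacement.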

The tree rebuilding algorithm is quite simple. When the master Fountain node cannot recover from a node failure, it transitions into the rebuild algorithm, which erases the entire tree topology and starts it over with just a single node (the master Fountain node). The tree topology may stay in the rebuild state only for a predetermined time limit. While in the rebuild state, join requests are accepted, and lostParent and lostChild requests are checked to ensure that no node died during rebuilding. If a node that is already in the tree topology dies during rebuilding, the rebuild algorithm starts over. Any node that sends a lostParent request during the rebuild algorithm is told to ignore the lost parent node and rejoin the tree topology.
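One way to read the rebuild rules is as a small request handler on the master. In the sketch below, the Request and Reply enums, the RebuildPhase class and int node IDs are hypothetical; it also assumes that a lostParent or lostChild report naming a node that has already joined the rebuilt tree is what "a node died during rebuilding" means, and it leaves out the rebuild time limit.

```cpp
#include <cstddef>
#include <set>

// Hypothetical request/reply kinds; the real messages live in FountainWireProt.h.
enum class Request { Join, LostParent, LostChild };
enum class Reply { Accepted, IgnoreAndRejoin, RestartRebuild, Ignored };

class RebuildPhase {
public:
    explicit RebuildPhase(int master) : master_(master) { restart(); }

    // Erase the topology and start over with only the master node.
    void restart() {
        members_.clear();
        members_.insert(master_);
    }

    // `reported` is the node named as lost; it is unused for Join.
    Reply handle(Request req, int sender, int reported = -1) {
        switch (req) {
        case Request::Join:
            members_.insert(sender);
            return Reply::Accepted;
        case Request::LostParent:
        case Request::LostChild:
            if (members_.count(reported) != 0) {
                // A node that had already joined the rebuilt tree died.
                restart();
                return Reply::RestartRebuild;
            }
            // The lost node belonged to the old topology only.
            return req == Request::LostParent ? Reply::IgnoreAndRejoin
                                              : Reply::Ignored;
        }
        return Reply::Ignored;
    }

    std::size_t size() const { return members_.size(); }

private:
    int master_;
    std::set<int> members_;  // nodes in the tree being rebuilt
};
```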

Currently, this regression test has two steps. The first step iterates through the entire tree topology and kills each node in turn, except the master node. After each kill, the tree is checked to ensure that only the requested node exited and that the tree did not rebuild. Then the node that replaced the killed node is itself killed, so that the tree topology keeps the logical structure it had at the start of the iteration.
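The core check in both steps is "did exactly the requested nodes leave the tree?". A minimal sketch of that check, assuming tree membership can be captured as a list of int node IDs before and after a kill (the helper names are hypothetical, and detecting a rebuild is left out):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Nodes present in `before` but missing from `after`.
std::vector<int> exitedNodes(std::vector<int> before, std::vector<int> after) {
    std::sort(before.begin(), before.end());
    std::sort(after.begin(), after.end());
    std::vector<int> gone;
    std::set_difference(before.begin(), before.end(), after.begin(), after.end(),
                        std::back_inserter(gone));
    return gone;
}

// True when the nodes that left are exactly the requested ones.
// Nodes that joined (e.g. a replacement) do not affect the result.
bool onlyRequestedExited(const std::vector<int>& before, const std::vector<int>& after,
                         std::vector<int> requested) {
    std::sort(requested.begin(), requested.end());
    requested.erase(std::unique(requested.begin(), requested.end()), requested.end());
    return exitedNodes(before, after) == requested;
}

// Step 1 visits every node except the master, one at a time.
std::vector<int> stepOneVictims(const std::vector<int>& tree, int master) {
    std::vector<int> victims;
    std::copy_if(tree.begin(), tree.end(), std::back_inserter(victims),
                 [master](int n) { return n != master; });
    return victims;
}
```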

The second step of this regression test also iterates through the entire tree topology. During each iteration, it kills from 2 up to n consecutive nodes in the tree, where n is the size of the tree topology excluding the master Fountain node. After each sub-iteration, the tree is checked to ensure that only the nodes requested to exit actually did so. The nodes that exited are then respawned so that they rejoin the tree topology.
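The batches of consecutive nodes that step 2 kills can be enumerated as below. This is a sketch under two assumptions not stated in the original: "consecutive" means adjacent in the list of non-master node IDs, and runs wrap around the end of that list so every start position can reach length n. The function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// For each start position, produce runs of 2..n consecutive non-master
// nodes, wrapping around the end of the list.
std::vector<std::vector<int>> stepTwoBatches(const std::vector<int>& nonMaster) {
    std::vector<std::vector<int>> batches;
    const std::size_t n = nonMaster.size();
    for (std::size_t start = 0; start < n; ++start) {
        for (std::size_t len = 2; len <= n; ++len) {
            std::vector<int> batch;
            for (std::size_t k = 0; k < len; ++k)
                batch.push_back(nonMaster[(start + k) % n]);
            batches.push_back(batch);
        }
    }
    return batches;  // n * (n - 1) batches in total
}
```

Each batch would be killed, checked with an "only the requested nodes exited" test, and then respawned before the next batch.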


Function Documentation

bool initialize ()

Performs initialization work for the master and slave Fountain node daemons.

This function does two important things: it registers for signal events and it initializes the Bamboo library.

Return values:
true if initialization was successful
false otherwise

void interruptHandler (int signalNumber)

Called when we catch a signal.

If the signal is SIGINT or SIGTERM, the global done variable is set to true.

Parameters:
[in] signalNumber The signal number caught
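Taken together, the signal half of initialize() and interruptHandler() follow the usual POSIX pattern: install a handler with sigaction() and have it do nothing but set a flag that the main loop polls. The sketch below is an illustration, not the Fountain source; registerSignalHandlers() is a hypothetical name, and the Bamboo library setup is omitted because its API is not documented on this page.

```cpp
#include <csignal>
#include <signal.h>

// Global flag polled by the main loop. sig_atomic_t is the type the C and
// C++ standards allow a signal handler to write safely.
volatile std::sig_atomic_t done = 0;

extern "C" void interruptHandler(int signalNumber) {
    if (signalNumber == SIGINT || signalNumber == SIGTERM) {
        done = 1;
    }
}

// Signal-registration part of initialize(); returns false if either
// sigaction() call fails.
bool registerSignalHandlers() {
    struct sigaction action {};
    action.sa_handler = interruptHandler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGINT, &action, nullptr) == 0 &&
           sigaction(SIGTERM, &action, nullptr) == 0;
}
```

Keeping the handler this small avoids calling non-async-signal-safe functions (logging, memory allocation) from signal context; any cleanup happens in the main loop once it sees done set.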

int main (int argc, char **argv)

Todo:
make regression step 3 kill each node's parent and children (if applicable)
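For the planned step 3, choosing the victims for a given node could look like the sketch below. It assumes the topology is available as a hypothetical map from each non-master node to its parent, and it reads "if applicable" as never killing the master and skipping children for leaf nodes.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Parent and children of `node`, excluding the master node.
std::vector<int> parentAndChildren(const std::map<int, int>& parentOf,
                                   int node, int master) {
    std::vector<int> victims;
    auto it = parentOf.find(node);
    if (it != parentOf.end() && it->second != master)
        victims.push_back(it->second);           // the parent, unless it is the master
    for (const auto& entry : parentOf)
        if (entry.second == node)
            victims.push_back(entry.first);      // every direct child
    std::sort(victims.begin(), victims.end());
    return victims;
}
```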

void respawnNodes (vector< NodeID > &bombList)

Todo:
make sure the nodes started correctly by checking their exit status sjm 10-10-2005
Todo:
get this ssh command from the config reader. sjm 2-2-2006
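The first Todo, checking that respawned nodes started correctly, can be approached with fork(), execvp() and waitpid() (sys/wait.h is already included by this file). The sketch below is hypothetical: it uses host name strings instead of NodeID, and the remote start command is a caller-supplied placeholder. ssh exits with the remote command's exit status (or 255 on an ssh error), so the check is only meaningful if the remote command returns once the node daemon has started.

```cpp
#include <cerrno>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Run args[0] with the given arguments, wait for it, and report whether it
// exited normally with status 0.
bool runAndCheckExit(const std::vector<std::string>& args) {
    if (args.empty()) return false;
    std::vector<char*> cargv;
    for (const std::string& a : args)
        cargv.push_back(const_cast<char*>(a.c_str()));  // execvp does not modify these
    cargv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return false;                 // fork failed
    if (pid == 0) {
        execvp(cargv[0], cargv.data());
        _exit(127);                            // exec failed; shell convention
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return false;      // retry only if interrupted
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Placeholder ssh command line; the real one should come from the config reader.
std::vector<std::string> sshCommand(const std::string& host,
                                    const std::string& remoteCommand) {
    return {"ssh", host, remoteCommand};
}

// Respawn each host and return the ones whose launch did not exit cleanly.
std::vector<std::string> respawnHosts(const std::vector<std::string>& hosts,
                                      const std::string& remoteCommand) {
    std::vector<std::string> failed;
    for (const std::string& host : hosts)
        if (!runAndCheckExit(sshCommand(host, remoteCommand)))
            failed.push_back(host);
    return failed;
}
```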


Generated on Wed Mar 8 14:43:31 2006 for Fountain by doxygen 1.4.6