MP_Lite:
Channel Bonding Ethernet between PCs
It is common to use Gigabit Ethernet to connect PCs in low-cost clusters
due to its low price of around $100 per machine (~$40 for the network card and
~$60 per port for the switch). Gigabit Ethernet provides a theoretical
maximum of 1000 Mbps, reaching 900 Mbps in practice with some tuning
of the system.
This provides a low-cost parallel computing system, but one that is
much less balanced than traditional MPP systems, which use similar
processors but have communication systems that are
an order of magnitude faster. This limits the types of applications
that are suitable for PC clusters.
Faster networking such as Myrinet, Quadrics, SCI, or InfiniBand
can be used to connect the PCs, but this usually doubles the cost of the
cluster.
Channel bonding is a method where the data in each message gets striped across
multiple network cards installed in each machine. The figure above shows a
small PC cluster with 2 network cards per machine. The graph
below demonstrates that channel bonding 2 Gigabit Ethernet cards per PC
using MP_Lite doubles the communication rate while only adding about 10% to
the overall cost of the cluster. Adding a 3rd card provides
little additional benefit.
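The cost side of this claim can be checked with a little arithmetic. The sketch below is illustrative only: the card and port prices and the 900 Mbps per-link rate come from the text, while the per-node price is inferred from the "about 10%" figure rather than stated anywhere, and the function names are hypothetical.

```python
# Prices quoted in the text (US dollars, approximate).
NIC_COST = 40    # one Gigabit Ethernet card
PORT_COST = 60   # one switch port
LINK_COST = NIC_COST + PORT_COST  # one extra link per machine


def implied_node_cost(extra_percent: int = 10) -> float:
    """Node price at which one extra link adds `extra_percent` to the cost.

    Assumption: the "about 10%" figure applies per node; the text does not
    state a node price directly.
    """
    return LINK_COST * 100 / extra_percent


def dollars_per_mbps(node_cost: float, links: int, mbps_per_link: int = 900) -> float:
    """Cost per Mbps, assuming the rate scales linearly with `links`.

    The text reports this scaling for 2 cards only; a 3rd card adds little.
    """
    total_cost = node_cost + (links - 1) * LINK_COST
    return total_cost / (links * mbps_per_link)


if __name__ == "__main__":
    node = implied_node_cost()
    for links in (1, 2):
        print(f"{links} link(s): ${dollars_per_mbps(node, links):.2f} per Mbps")
```

Under these assumptions the second card nearly halves the cost per Mbps, which is the argument the paragraph above makes in words.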
Channel bonding using the Linux kernel bonding.c module currently
does not work at Gigabit speeds, providing worse performance than using a
single Gigabit Ethernet card. Proper tuning of this module should allow
for the efficient use of more Gigabit Ethernet cards per machine in the future.
It should be possible to get much closer to the 4 Gbps limit of the
64-bit 66 MHz PCI bus by channel bonding at this low level.
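The 4 Gbps figure follows from the bus width and clock rate. A minimal sketch of that arithmetic, assuming one full-width transfer per clock cycle and the nominal 66 MHz clock; the function names are hypothetical, and the link count is an upper bound that ignores bus overhead and other devices sharing the bus.

```python
def pci_peak_mbps(width_bits: int = 64, clock_mhz: int = 66) -> int:
    """Theoretical peak PCI rate in Mbps: one bus-width transfer per clock."""
    return width_bits * clock_mhz


def links_within_bus(link_mbps: int, bus_mbps: int) -> int:
    """Upper bound on the number of links the bus could feed at `link_mbps` each."""
    return bus_mbps // link_mbps


if __name__ == "__main__":
    peak = pci_peak_mbps()
    print(f"64-bit 66 MHz PCI peak: {peak} Mbps")
    print(f"Links at 900 Mbps each: at most {links_within_bus(900, peak)}")
```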
Directions for setup and use
First a few warnings: Try channel bonding between 2 machines before you
build an entire cluster around this. You will get different results
depending on the network cards, and possibly the main memory bandwidth of
your machines. The performance curves above show a pretty uniform doubling
of the communication rate for messages above a reasonable size. Applications
that send only small, latency-bound messages will see no benefit, since
messages that fit within a single 1500-byte Ethernet packet are
sent over one network card only.
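The effect of message size can be illustrated with a toy striping function. This is a sketch of the general idea only, not MP_Lite's actual algorithm: the even, contiguous split and the name `stripe` are assumptions, and the 1500-byte threshold is taken directly from the text without accounting for protocol headers.

```python
def stripe(message: bytes, nics: int, packet_bytes: int = 1500) -> list[bytes]:
    """Split `message` into one contiguous piece per network card."""
    if nics < 1:
        raise ValueError("need at least one network card")
    if len(message) <= packet_bytes:
        # Small message: everything goes over the first card,
        # so the extra cards give no benefit.
        return [message] + [b""] * (nics - 1)
    # Larger message: near-equal pieces; earlier cards absorb the remainder.
    base, extra = divmod(len(message), nics)
    pieces, start = [], 0
    for i in range(nics):
        size = base + (1 if i < extra else 0)
        pieces.append(message[start:start + size])
        start += size
    return pieces


if __name__ == "__main__":
    for size in (1000, 3000, 100_000):
        lengths = [len(p) for p in stripe(bytes(size), 2)]
        print(f"{size} bytes over 2 cards -> {lengths}")
```

The receiver would reassemble the pieces in card order; a real implementation also has to handle partial sends and per-card buffering, which this sketch leaves out.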
You do not need to make any changes to your code or to the way you compile
MP_Lite for TCP (make tcp). You do
need to set up your system to use the multiple network cards by assigning a separate
IP address and name to each interface. For example, use something like
node0.ge1, node0.ge2, node1.ge1, node1.ge2, etc. Since each interface has
its own IP address and name, all connections can go to a single switch, as in
the diagram above, or you can connect each set of network cards to a separate switch.
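One way to keep the interface names consistent across a cluster is to generate the name-to-address entries instead of typing them by hand. The sketch below prints /etc/hosts-style lines using the node0.ge1 naming from the text. The addresses are purely hypothetical (card j of node i gets 192.168.j.i+1, one private subnet per set of cards), and `hosts_entries` is an illustrative helper, not part of MP_Lite.

```python
def hosts_entries(nodes: int, nics: int) -> list[str]:
    """Return /etc/hosts-style lines, one per interface.

    Hypothetical addressing: card j of node i gets 192.168.<j>.<i + 1>.
    Substitute whatever addresses your own network uses.
    """
    lines = []
    for node in range(nodes):
        for nic in range(1, nics + 1):
            lines.append(f"192.168.{nic}.{node + 1}\tnode{node}.ge{nic}")
    return lines


if __name__ == "__main__":
    print("\n".join(hosts_entries(nodes=2, nics=2)))
```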
The command to start a run on these nodes would then be:
mprun -np 2 -nics 2 -h node0.ge1 node0.ge2 node1.ge1 node1.ge2 program
The setup is therefore minimal, and the only change for the user is
the need to specify the multiple interface names at run time.
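For larger runs the interface list gets long, so it can help to build the command line programmatically. The sketch below reproduces the 2-node, 2-card command shown above. Extending it to more nodes rests on assumptions the text does not confirm: that -np is the number of nodes (one process each) and that interfaces are listed node by node. The helper name `mprun_command` is hypothetical.

```python
def mprun_command(nodes: int, nics: int, program: str) -> str:
    """Build an mprun command line using the node<i>.ge<j> naming scheme.

    Only the -np, -nics, and -h options that appear in the text are used.
    """
    hosts = [f"node{n}.ge{c}" for n in range(nodes) for c in range(1, nics + 1)]
    parts = ["mprun", "-np", str(nodes), "-nics", str(nics), "-h", *hosts, program]
    return " ".join(parts)


if __name__ == "__main__":
    print(mprun_command(nodes=2, nics=2, program="program"))
```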