SCL Cluster Cookbook
Throughput across a range of message sizes is a key measure of an
interconnection technology. The graph below compares several technologies
using NetPIPE, the Scalable Computing Lab's analysis tool for testing
the performance of point-to-point communication. The graph plots
throughput as the block size increases exponentially, starting at
one byte.
Figure 1. Comparison of Ethernet, FDDI, ATM, Fast Ethernet, Gigabit Ethernet, and Myrinet
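The kind of measurement described above can be illustrated with a small ping-pong test. The following is a minimal sketch in Python, not NetPIPE itself: it bounces blocks of exponentially increasing size between two endpoints and reports throughput for each size. The function names, the use of the loopback interface, and the default exponent and repeat counts are all assumptions made for illustration. NetPIPE's own method is more careful about timing and about which block sizes it tests.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(listener, sizes, repeats):
    """Accept one connection and bounce every block straight back."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            for _ in range(repeats):
                conn.sendall(recv_exact(conn, size))


def ping_pong_sweep(max_exponent=16, repeats=20):
    """Return (block_bytes, one_way_seconds, megabits_per_second) rows."""
    sizes = [2 ** k for k in range(max_exponent + 1)]  # 1, 2, 4, ... bytes
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server,
                              args=(listener, sizes, repeats))
    server.start()

    rows = []
    with socket.create_connection(listener.getsockname()) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            payload = b"x" * size
            start = time.perf_counter()
            for _ in range(repeats):
                client.sendall(payload)
                recv_exact(client, size)
            elapsed = time.perf_counter() - start
            one_way = elapsed / (2 * repeats)  # half the mean round trip
            rows.append((size, one_way, size * 8 / one_way / 1e6))
    server.join()
    listener.close()
    return rows


if __name__ == "__main__":
    for size, seconds, mbps in ping_pong_sweep():
        print(f"{size:8d} bytes  {seconds * 1e6:10.1f} us  {mbps:10.2f} Mbps")
```

Run as written, this only exercises the loopback interface of a single machine, so the numbers say nothing about a real interconnect. To test a network link, the echo side would have to run on a second host.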
Another important issue in cluster communication is latency. High throughput tends not to be useful for parallel computing if it is not accompanied by low latency (i.e., small messages must be transferred quickly). Graphing the data obtained from NetPIPE shows the latency clearly.
Figure 2. Latency of Ethernet, FDDI, ATM, Fast Ethernet, Gigabit Ethernet, and Myrinet
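As a rough illustration of how latency and throughput can both be read from the same timing data, the sketch below takes (block size, one-way time) pairs and reports the transfer time of the smallest block as the latency, alongside the peak throughput. The function name and the sample values are invented for this example and are not measurements from any of the networks above.

```python
def summarize(samples):
    """Summarize (block_bytes, one_way_seconds) timing samples.

    Latency is taken as the one-way time of the smallest block, which is
    dominated by per-message overhead rather than by bandwidth.
    """
    ordered = sorted(samples)
    _, smallest_time = ordered[0]
    throughput = [(size, size * 8 / seconds / 1e6)
                  for size, seconds in ordered]
    peak_size, peak_mbps = max(throughput, key=lambda row: row[1])
    return {
        "latency_us": smallest_time * 1e6,
        "peak_mbps": peak_mbps,
        "peak_block_bytes": peak_size,
    }


# Hypothetical timings, invented purely to exercise the function.
example = [(1, 100e-6), (1024, 200e-6), (65536, 6e-3)]
summary = summarize(example)
print(summary)
```

In these made-up samples the 1-byte block takes half as long as the 1024-byte block despite carrying about a thousandth of the data, which is why small-message performance is governed by latency rather than by peak throughput.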
For the clusters built at Ames Lab, we have used SMC EtherPower Fast Ethernet cards and a variety of Fast Ethernet hubs and switches from vendors including Bay and Cabletron. We recommend testing a switch with the intended NICs before purchase, because we discovered that one particular switch performed poorly in our Linux clusters. The switch in question did not autosense full-duplex link operation, and we were unable to force the NICs into full-duplex mode using the Linux driver for the Fast Ethernet cards. The switch vendor was less than helpful and just said "force your cards into full-duplex mode". The performance impact was severe. NetPIPE's analysis exposed the problem, and we will not use that switch in any of our clusters.
See the Running NetPIPE page for information on using the NetPIPE analysis tool to do your own analysis.
Operating System Effect on Interconnect Performance
One interesting result of Ames Lab's NetPIPE tests is the impact of an
operating system's network protocol stack on high-performance
interconnects. For example, the following graph shows the performance
of Packet Engines Gigabit Ethernet hardware on two 200 MHz Pentium Pro
PCs connected back-to-back, running Linux, FreeBSD, and NT.
Figure 3. Performance of Gigabit Ethernet on Linux, FreeBSD, and NT
A TCP delayed-ACK fix for Linux 2.0.x kernels has been developed by Wayne Salamon. We are evaluating the fix and will add the test results to this page if it improves TCP performance.
The NT line in the graph above reflects Packet Engines Gigabit Ethernet G-NIC hardware (manufactured 9/97), G-NIC Windows Drivers version 126.96.36.199, and this NT registry tweak:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize = 0xffff
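A short calculation suggests why the window size matters on a fast link. A TCP sender can have at most one window of unacknowledged data in flight, so throughput cannot exceed the window size divided by the round-trip time. The sketch below applies that bound to the 0xffff window from the registry setting. The function name and the round-trip times are assumptions chosen for illustration, not values measured on the test machines.

```python
def window_limited_mbps(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


WINDOW = 0xFFFF  # 65535 bytes, the value from the registry setting

# Assumed round-trip times, for illustration only.
for rtt_ms in (0.25, 0.5, 1.0, 2.0):
    bound = window_limited_mbps(WINDOW, rtt_ms / 1000)
    print(f"RTT {rtt_ms:4.2f} ms -> at most {bound:8.2f} Mbps")
```

Under this bound, a 65535-byte window with a 1 ms round trip caps throughput at roughly 524 Mbps, so a smaller window would limit a Gigabit Ethernet link well below its line rate.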