SCL Cluster Cookbook
Back of a cluster, showing the white Fast Ethernet interconnection wires
A number of very high-performance network technologies are available in the marketplace. We have evaluated several of them, including ATM, FDDI, Gigabit Ethernet, and Myrinet, for use as cluster interconnects.
In general, our evaluations of high-speed interconnect technologies show that it is difficult to achieve TCP communication speeds over 150Mbps with current Pentium Pro systems. This limit seems to be due to memory bandwidth limitations and to protocol stack implementations that tend to copy data several times during transmission and reception. With the same network hardware on Alpha systems, which have much higher memory bandwidth, we have seen much higher performance. This indicates that as memory bandwidth on PC systems improves, the interconnect performance barrier should rise as well.
ATM is a switched virtual-circuit technology. As currently implemented for connecting computer systems, ATM runs at 155Mbps. An ATM network requires an ATM network interface card (NIC) in each computer, fiber optic or Category 5 unshielded twisted pair (UTP) copper cables (depending on the selected equipment), and at least one ATM switch to interconnect the systems. ATM interconnects are fairly expensive and tend to have high latency compared to other LANs.
FDDI is a token-passing ring network that runs over fiber optic cable at 100Mbps. FDDI is praised for deterministic operation under heavy load (as compared to Ethernet), but the low cost of Ethernet switches mitigates this advantage. Also, the latency of FDDI should be higher than that of Fast Ethernet or Gigabit Ethernet because, even on a lightly loaded network, a station must always wait for the token before it can transmit its data. Like ATM, FDDI is fairly expensive and is generally not used for clusters.
Ethernet is a time-tested network technology that has become very cheap but at 10Mbps does not offer much throughput. The most popular implementation of Ethernet is 10BASE-T, which interconnects PCs with Ethernet NICs to an Ethernet repeater or switch using Category 3 or better four-pair UTP copper cable. Ethernet would work for a very cheap cluster, but would be a serious bottleneck for a cluster of modern PCs.
Fast Ethernet is an upgrade of Ethernet that provides 100Mbps transmission speeds. The most popular implementation is 100BASE-TX, which interconnects PCs with Fast Ethernet NICs to a Fast Ethernet repeater or switch using Category 5 UTP copper cable. Fast Ethernet network interface cards have become a true commodity item, with prices for high-quality boards recently reaching as low as $60 per board. 16-port Fast Ethernet repeater hubs have been seen in catalogs for about $750. At these prices, it makes sense to use a full-duplex Fast Ethernet switch, such as the $2500 24-port Bay Networks B350T, instead of a plain Fast Ethernet repeater to connect systems in a cluster. A full-duplex switch avoids collisions, which can degrade performance under heavy loads. A Fast Ethernet repeater that is used for one of our clusters is shown below.
A 100BASE-TX Fast Ethernet Repeater for a 16-Node Cluster
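To make the gap between Ethernet and Fast Ethernet concrete, the following minimal sketch computes the ideal time to move one message at each nominal line rate. The 1 MB message size and the function name `ideal_transfer_seconds` are assumptions chosen for illustration; the calculation ignores latency, framing, and protocol overhead, so real transfers will take longer.

```python
# Nominal line rates quoted in the text, in megabits per second.
NOMINAL_MBPS = {
    "Ethernet (10BASE-T)": 10,
    "Fast Ethernet (100BASE-TX)": 100,
}


def ideal_transfer_seconds(message_bytes, rate_mbps):
    """Lower bound on transfer time: payload bits divided by line rate.

    Ignores latency, framing, and protocol overhead.
    """
    return (message_bytes * 8) / (rate_mbps * 1_000_000)


if __name__ == "__main__":
    message_bytes = 1_000_000  # hypothetical 1 MB message
    for name, rate in NOMINAL_MBPS.items():
        seconds = ideal_transfer_seconds(message_bytes, rate)
        print(f"{name}: {seconds:.2f} s")
```

Under these assumptions the message needs at least 0.80 s on 10Mbps Ethernet and 0.08 s on Fast Ethernet.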
Gigabit Ethernet is an up-and-coming technology that has recently been standardized. Gigabit Ethernet layers the Ethernet media access control protocol (CSMA/CD) over the established ANSI-standard Fibre Channel physical technology with a minor adjustment to increase real data throughput to 1Gbps.
Gigabit Ethernet interconnects NICs in systems with a Gigabit Ethernet repeater or switch using multimode fiber optic cable. Packet Engines, a Gigabit Ethernet vendor, offers a hub technology called a "full-duplex repeater" which, like a full-duplex switch, avoids collisions but, like a repeater hub, floods packets to all ports.
Gigabit Ethernet still tends to be expensive, but prices continue to fall. Interface cards commonly cost around $1000. A Gigabit Ethernet full-duplex repeater, the Packet Engines FDR12, lists for $17,900. Adding in the cost of about $70 per fiber optic zip cord to connect the NICs to the repeater, each connection's average cost is around $2000. As shown in the Interconnect Performance section, Gigabit Ethernet is a strong performer, peaking at 150Mbps in our Pentium Pro systems (performance ought to be much better in 400MHz Pentium II systems) and 250Mbps in our Digital Alpha systems.
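The measured peaks can be put in perspective by comparing them with the 1Gbps nominal rate. The short sketch below does only that arithmetic, using the figures quoted above; the function name `line_rate_fraction` is a hypothetical helper, not part of any tool used in the evaluation.

```python
GIGABIT_NOMINAL_MBPS = 1000  # 1Gbps nominal line rate

# Peak TCP throughput reported in the text, in Mbps.
MEASURED_PEAK_MBPS = {
    "Pentium Pro": 150,
    "Digital Alpha": 250,
}


def line_rate_fraction(measured_mbps, nominal_mbps=GIGABIT_NOMINAL_MBPS):
    """Fraction of the nominal line rate that the host actually achieved."""
    return measured_mbps / nominal_mbps


if __name__ == "__main__":
    for system, mbps in MEASURED_PEAK_MBPS.items():
        print(f"{system}: {line_rate_fraction(mbps):.0%} of line rate")
```

The Pentium Pro systems reach 15% of the nominal rate and the Alpha systems 25%, which is consistent with the host, rather than the network, being the limiting factor.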
Myrinet is a proprietary, high-performance interconnect used in many of the more expensive clusters. Myrinet is a 1.2Gbps technology that uses short-distance (10 feet) copper cables to connect NICs to switches. Myrinet provides low-level messaging via proprietary protocols that improve latency over TCP by reducing overhead. The average cost of a Myrinet interconnect runs about $1700 per node, based on $1400 per interface card and $2400 for an 8-port Myrinet switch.
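The per-node figure is the NIC price plus an equal share of the switch. As a minimal sketch, the hypothetical helper `per_node_cost` below reproduces that arithmetic for Myrinet and applies it to the Fast Ethernet prices quoted earlier. It assumes every port is in use and, for Fast Ethernet, that cable cost is negligible (no cable price is given in the text).

```python
def per_node_cost(nic_price, switch_price, switch_ports, cable_price=0.0):
    """Average cost per connected node, assuming every port is in use."""
    return nic_price + cable_price + switch_price / switch_ports


# Prices in US dollars, taken from the text above.
myrinet = per_node_cost(1400, 2400, 8)
fast_ethernet_switch = per_node_cost(60, 2500, 24)    # 24-port switch
fast_ethernet_repeater = per_node_cost(60, 750, 16)   # 16-port repeater

if __name__ == "__main__":
    print(f"Myrinet:                ${myrinet:.2f} per node")
    print(f"Fast Ethernet switch:   ${fast_ethernet_switch:.2f} per node")
    print(f"Fast Ethernet repeater: ${fast_ethernet_repeater:.2f} per node")
```

With these inputs Myrinet comes to $1700 per node, against roughly $164 for switched Fast Ethernet and $107 for a repeater. A partially filled switch raises the per-node share accordingly.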
Currently, Myrinet and Gigabit Ethernet technologies may require one to purchase both the PC NICs and the interconnects (hubs or switches) from a single vendor. Because Gigabit Ethernet has been standardized, equipment from different vendors should interoperate, but you may want to try the equipment and verify interoperability before making a major purchase. ATM, FDDI, and Fast Ethernet equipment from different vendors can generally be assumed to interoperate, so NICs and hubs/switches for these technologies may come from different vendors.
See the Interconnect Performance page for an examination of the performance of the various commodity interconnection technologies using the NetPIPE analysis tool.
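For readers unfamiliar with this style of measurement, here is a minimal ping-pong throughput sketch over TCP. It is written in the spirit of a ping-pong test but is not NetPIPE and does not reproduce its methodology. It runs both ends on the loopback interface, so the numbers reflect the host's protocol stack rather than any network hardware; the function names, message sizes, and repeat count are all assumptions for illustration.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, looping over short reads."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(listener, nbytes, repeats):
    """Accept one connection and echo each fixed-size message back."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(repeats):
            conn.sendall(recv_exact(conn, nbytes))


def ping_pong_mbps(nbytes, repeats=20):
    """Return ping-pong throughput in Mbps for nbytes-sized messages."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    server = threading.Thread(
        target=echo_server, args=(listener, nbytes, repeats)
    )
    server.start()

    payload = b"x" * nbytes
    with socket.create_connection(listener.getsockname()) as client:
        # Disable Nagle's algorithm so small messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(repeats):
            client.sendall(payload)
            recv_exact(client, nbytes)
        elapsed = time.perf_counter() - start

    server.join()
    listener.close()
    # Each round trip moves the message twice, once in each direction.
    bits_moved = 2 * repeats * nbytes * 8
    return bits_moved / elapsed / 1_000_000


if __name__ == "__main__":
    for size in (64, 1024, 65536, 1048576):
        print(f"{size:>8} bytes: {ping_pong_mbps(size):10.1f} Mbps")
```

To measure a real interconnect, the echo side would run on a second node and the client would connect to that node's address instead of the loopback address.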