A driving force behind the development of NetPIPE has been protocol independence and the ability to accurately compare different protocols. The resulting bandwidth graphs for MPI (the Message Passing Interface) and TCP are presented in Figure 8. All data were obtained using the same machines, and all communication was over a dedicated ATM fiber pair. This graph demonstrates the effectiveness of NetPIPE in comparing entirely different protocols.
Often a programmer uses a communication package to avoid working with the details of setting up connections. While ease of use is clearly gained, naive use of these extra protocol layers adds communication overhead, reducing network throughput. This protocol layer overhead is clearly evident in the signature graphs. The MPI library used was based on TCP, but an application program clearly pays for its ease of use by sacrificing latency and bandwidth. This sacrifice lowers the aggregate bandwidth as well. The tradeoff between ease of use and throughput is currently being investigated for TCP and ATM's AAL5 application programmer's interface (API). Nevertheless, the overhead associated with a protocol layer is now easy to visualize.
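The two quantities a protocol layer sacrifices, latency and bandwidth, can be estimated with a minimal ping-pong sketch of the kind NetPIPE automates: time the round trip of a small block for latency and of a large block for throughput. The sketch below runs over loopback TCP; the block sizes, repetition count, and helper names are illustrative assumptions, not NetPIPE's actual code or defaults.

```python
import socket
import threading
import time

SMALL, LARGE, REPS = 8, 256 * 1024, 50   # illustrative sizes, not NetPIPE defaults

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def echo_server(listener):
    """Echo each expected block back to the client."""
    conn, _ = listener.accept()
    with conn:
        for n in [SMALL] * REPS + [LARGE] * REPS:
            conn.sendall(recv_exact(conn, n))

def ping_pong(conn, n, reps):
    """Return the average round-trip time for an n-byte block."""
    data = b"x" * n
    t0 = time.perf_counter()
    for _ in range(reps):
        conn.sendall(data)
        recv_exact(conn, n)
    return (time.perf_counter() - t0) / reps

listener = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

with socket.create_connection(listener.getsockname()) as c:
    latency = ping_pong(c, SMALL, REPS) / 2    # one-way latency estimate
    rtt = ping_pong(c, LARGE, REPS)
    bandwidth = 2 * LARGE * 8 / rtt / 1e6      # Mbps, counting both directions

print(f"latency ~{latency * 1e6:.0f} us, bandwidth ~{bandwidth:.0f} Mbps")
```

Running the same probe through a message-passing layer built on the same TCP connection would show the added latency and lost bandwidth directly, which is the comparison the signature graphs make visible.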
The design and use of NetPIPE has revealed interesting network anomalies and tendencies. In particular, NetPIPE demonstrated the significance of aligning data blocks to page boundaries. This effect is shown in the signature graphs for ATM using aligned and unaligned data in Figure 9. Page-aligned data blocks yield a maximum throughput that is only slightly increased. However, note the large plunge in performance when using unaligned data.
Figure 9: Page Aligned vs. Unaligned Transfer Block Throughput
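The alignment effect above can be probed by controlling where a transfer buffer starts. One way to obtain page-aligned memory in Python is an anonymous mmap, which begins on a page boundary on POSIX systems; slicing into it at a small offset yields a deliberately unaligned view. This is a sketch of the buffer setup only, not NetPIPE's implementation, and the sizes are illustrative.

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE
BLOCK = 64 * 1024

# Anonymous mmap regions start on a page boundary on POSIX systems.
buf = mmap.mmap(-1, BLOCK + PAGE)

# Confirm the mapping's base address is page aligned.
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
assert addr % PAGE == 0

aligned = memoryview(buf)[:BLOCK]          # page-aligned transfer block
unaligned = memoryview(buf)[7:7 + BLOCK]   # deliberately misaligned by 7 bytes

print("page size:", PAGE, "base address:", hex(addr))
```

Transferring from `aligned` versus `unaligned` while timing each block size would reproduce the kind of comparison plotted in Figure 9.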
NetPIPE has the option of specifying a starting and ending transfer block size and the increment value. This option allows for a closer examination of the dip in performance due to unaligned data. Figure 10 shows throughput plotted versus transfer block size. There are three distinct regions in the graph. On either side of the chasm, blocks transfer at normal speed. For block sizes of approximately 59 K bytes to 72 K bytes, the throughput is a dismal 5 Mbps. Also note the chaotic transition regions between the two performance levels. The single data point of high throughput inside the chasm is at a block size of 67.4 K bytes. The reason for the increased throughput at that single measurement is not known, and the cause of the performance drop has not been fully investigated at this time. However, the performance plunge does appear to be linked to the TCP socket buffer size: changing the socket buffer size moves the dip to a different portion of the graph, and aligning the data to page boundaries effectively removes it. Other studies [4,5] have missed the performance chasm by not evaluating enough data points or by always using page-aligned data.
Figure 10: A Detailed Examination of the ATM Performance Dip
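The start/end/increment sweep described above can be mimicked with a simple loop over block sizes, each timed with a round trip. The bounds below bracket the reported 59-72 Kbyte region; over loopback TCP this demonstrates only the sweep mechanics, not the ATM dip itself, and all names and sizes are illustrative assumptions rather than NetPIPE's code.

```python
import socket
import threading
import time

# Sweep bounds chosen to bracket the reported dip (illustrative).
START, END, STEP = 56 * 1024, 76 * 1024, 4 * 1024
SIZES = list(range(START, END + 1, STEP))

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def echo_server(listener):
    """Echo one block of each swept size back to the client."""
    conn, _ = listener.accept()
    with conn:
        for n in SIZES:
            conn.sendall(recv_exact(conn, n))

listener = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

results = {}
with socket.create_connection(listener.getsockname()) as c:
    for n in SIZES:
        data = b"x" * n
        t0 = time.perf_counter()
        c.sendall(data)
        recv_exact(c, n)
        dt = time.perf_counter() - t0
        results[n] = 2 * n * 8 / dt / 1e6   # Mbps, counting both directions

for n, mbps in results.items():
    print(f"{n // 1024:3d} KB -> {mbps:8.1f} Mbps")
```

Plotting `results` over a fine-grained sweep is what exposes a chasm like the one in Figure 10; sweeps with too few data points step right over it.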
Another graph of interest is the comparison of FDDI block transfer on different architectures. Figure 11 shows the signature graphs for transfer between two identical DEC 3000 workstations in comparison to the SGI data previously shown. In both cases, the transfer blocks were aligned to page boundaries. There are three important differences to observe: 1) the DEC FDDI has a performance dip similar to the ATM data, 2) the latency for the DEC workstations is smaller, and 3) despite the lower latency, the maximum throughput for the DEC machines is much less than that attained by the SGI workstations. Vendor defaults were used throughout the experiments; there may be internal parameters that could be adjusted on the DEC machines to improve their overall performance.
Figure 11: FDDI Block Transfer Comparison of SGI and DEC