The native SHMEM library on the Cray T3E delivers a transmission rate of 340 MB/sec (2720 Mbps) with a latency of only 2-3 µs. However, the SHMEM library is available only on the T3E and SGI systems. MP_Lite delivers 94% of this bandwidth to applications in a portable manner. This was a vast improvement over the original Cray MPI library, which lost 50% of the SHMEM performance. The current version of Cray MPI, however, delivers 85-90% of the raw SHMEM performance with a latency of only 5 µs. MP_Lite therefore provides only a small benefit over the current Cray MPI library, and the difference is probably not worth worrying about.
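To put the percentages in absolute terms, 94% of 340 MB/sec is about 320 MB/sec, and 85-90% is roughly 289-306 MB/sec. The sketch below shows how latency and peak bandwidth combine for a given message size. It assumes a simple "latency plus size over bandwidth" cost model, which is an illustration and not how the figures above were measured; the function names are hypothetical.

```c
/* Estimated time in microseconds to move nbytes, ASSUMING the simple model
 *     time = latency + nbytes / bandwidth
 * With bandwidth in MB/sec (10^6 bytes/sec), nbytes / bandwidth is already
 * expressed in microseconds. */
double transfer_time_us(double nbytes, double latency_us, double bw_mb_per_sec)
{
    return latency_us + nbytes / bw_mb_per_sec;
}

/* Effective bandwidth in MB/sec seen by a message of nbytes under the same
 * assumed model. It approaches bw_mb_per_sec as nbytes grows. */
double effective_bw_mb_per_sec(double nbytes, double latency_us,
                               double bw_mb_per_sec)
{
    return nbytes / transfer_time_us(nbytes, latency_us, bw_mb_per_sec);
}
```

Under this model the latency difference between the libraries matters mostly for small messages; for large messages the effective rate is dominated by the peak bandwidth.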
Cray MPI performance is much worse for message sizes that are not divisible by 8 bytes. This will seldom be the case, since both integers and doubles are 8 bytes on the T3E, but it is something to be aware of. The tops of the spikes in the graph below represent message sizes divisible by 8 bytes (normal message traffic).
The bottom line is that you should make sure the current Cray MPI library is installed, pass only messages whose sizes are divisible by 8 bytes, and then forget about MP_Lite.
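For the occasional buffer whose length is not a multiple of 8 bytes (a character string, for example), one option is to round the byte count up before sending. The following is a minimal sketch of that rounding; the helper names `pad_to_8` and `padding_needed` are hypothetical and not part of MPI or MP_Lite.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 8.
 * A count that is already a multiple of 8 is returned unchanged. */
size_t pad_to_8(size_t nbytes)
{
    return (nbytes + 7u) & ~(size_t)7;
}

/* Number of extra bytes needed to reach the next multiple of 8. */
size_t padding_needed(size_t nbytes)
{
    return pad_to_8(nbytes) - nbytes;
}
```

The send buffer would then need to be allocated with at least `pad_to_8(n)` bytes, with the padded count passed to the send call and a matching count posted on the receiving side. Whether the few wasted bytes are worth it depends on how large the penalty is for your message sizes.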