MP_Lite has built-in trace facilities that can dump the start
and finish times of each communication to a .trace.X file for each
node X. Below is an example of a trace dump from node 0,
which is communicating with node 1.

    3.363730 - 3.363903 nd 0 --> nd 1 4 bytes / 173 usec
    3.363913 - 3.363984 nd 0 <-- nd 1 4 bytes / 71 usec
    3.363994 - 3.367812 nd 0 --> nd 1 100000 bytes / 3818 usec
    3.367823 - 3.368922 nd 0 <-- nd 1 100000 bytes / 1098 usec
    3.368934 - 3.370115 nd 0 --> nd 1 22304 bytes / 1181 usec
    3.370127 - 4.747949 nd 0 <-- nd 1 22304 bytes / 1377822 usec

The right arrow --> indicates a send from node 0 to node 1, while the
left arrow <-- indicates a receive. Each line shows the start and end
times of the communication in seconds, followed by the number of bytes
transferred and the elapsed time in microseconds. In the case above, the
4-byte exchanges are a manual handshake that guarantees a preposted
receive before the larger exchanges of data. You can see that the last
receive took 1.378 seconds because a packet was dropped and had to be
retransmitted after the time-out period. This trace file helped identify
that packets were being dropped by TCP in AIX, which eventually led to
a fix.
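
A stall like the 1.378 second receive is easy to miss in a long trace, so
it can help to scan the file with a small script. The following is only a
sketch, not part of MP_Lite: it assumes every trace line follows the layout
of the sample dump above, and the names TRACE_LINE, TraceEvent,
parse_trace_line, and find_slow_events, as well as the one second
threshold, are hypothetical choices made for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the sample dump, e.g.
#   3.363730 - 3.363903 nd 0 --> nd 1 4 bytes / 173 usec
TRACE_LINE = re.compile(
    r"^\s*(?P<start>\d+\.\d+)\s*-\s*(?P<end>\d+\.\d+)\s+"
    r"nd\s+(?P<node>\d+)\s+(?P<arrow>-->|<--)\s+nd\s+(?P<peer>\d+)\s+"
    r"(?P<nbytes>\d+)\s+bytes\s*/\s*(?P<usec>\d+)\s+usec\s*$"
)


class TraceEvent(NamedTuple):
    start: float    # start time in seconds
    end: float      # end time in seconds
    node: int       # node that wrote the trace file
    peer: int       # node on the other end of the transfer
    is_send: bool   # True for -->, False for <--
    nbytes: int     # bytes transferred
    usec: int       # elapsed time in microseconds


def parse_trace_line(line: str) -> Optional[TraceEvent]:
    """Parse one trace line; return None if it does not match the layout."""
    m = TRACE_LINE.match(line)
    if m is None:
        return None
    return TraceEvent(
        start=float(m.group("start")),
        end=float(m.group("end")),
        node=int(m.group("node")),
        peer=int(m.group("peer")),
        is_send=(m.group("arrow") == "-->"),
        nbytes=int(m.group("nbytes")),
        usec=int(m.group("usec")),
    )


def find_slow_events(lines: Iterable[str],
                     threshold_usec: int = 1_000_000) -> List[TraceEvent]:
    """Return events whose elapsed time is at least threshold_usec."""
    parsed = (parse_trace_line(line) for line in lines)
    return [ev for ev in parsed
            if ev is not None and ev.usec >= threshold_usec]


# The sample dump from node 0 shown above.
SAMPLE = """\
3.363730 - 3.363903 nd 0 --> nd 1 4 bytes / 173 usec
3.363913 - 3.363984 nd 0 <-- nd 1 4 bytes / 71 usec
3.363994 - 3.367812 nd 0 --> nd 1 100000 bytes / 3818 usec
3.367823 - 3.368922 nd 0 <-- nd 1 100000 bytes / 1098 usec
3.368934 - 3.370115 nd 0 --> nd 1 22304 bytes / 1181 usec
3.370127 - 4.747949 nd 0 <-- nd 1 22304 bytes / 1377822 usec
"""

if __name__ == "__main__":
    for ev in find_slow_events(SAMPLE.splitlines()):
        kind = "send" if ev.is_send else "receive"
        print(f"slow {kind} at {ev.start:.6f} s: "
              f"{ev.nbytes} bytes in {ev.usec} usec")
```

On the sample dump, only the final 22304-byte receive exceeds the
threshold. A threshold close to the TCP retransmission time-out of your
system is a reasonable starting point, but the right value depends on the
platform.
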

To use the trace facilities, edit the makefile to change the
all: tcp line if needed, then type make trace and link the
library into your code. Run the code as you normally would, and at
completion there should be a .trace.X file for each node X. The times
may not match up exactly between nodes, but the clocks are started
at roughly the same time in the MP_Init() function.
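
Because the clocks start at roughly the same time, the per-node files can
be combined into one approximate timeline. The sketch below is not part of
MP_Lite: it assumes the files are named .trace.X in a single directory, as
described above, and that each line begins with the start time in seconds.
The helper names load_trace and merged_timeline are hypothetical. Since the
clocks are only roughly aligned, the ordering of events from different
nodes should be treated as approximate.

```python
import glob
import os
from typing import List, Tuple


def load_trace(path: str) -> List[Tuple[float, str]]:
    """Return (start_time, raw_line) pairs from one trace file."""
    events = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            try:
                # The first field is assumed to be the start time in seconds.
                events.append((float(fields[0]), line.rstrip("\n")))
            except (IndexError, ValueError):
                continue  # skip blank or unrecognized lines
    return events


def merged_timeline(directory: str = ".") -> List[Tuple[float, str]]:
    """Merge all .trace.X files in directory, sorted by start time."""
    timeline = []
    # The pattern starts with an explicit dot, so glob matches dotfiles.
    for path in glob.glob(os.path.join(directory, ".trace.*")):
        timeline.extend(load_trace(path))
    timeline.sort(key=lambda item: item[0])
    return timeline


if __name__ == "__main__":
    for start, line in merged_timeline("."):
        print(line)
```

Each trace line already names the node that wrote it (nd 0, nd 1, and so
on), so the merged output stays readable without extra labels.
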