Mixed Programming Models and Cache Aware Algorithms
Some applications already couple SMP-based parallelization techniques with message passing;
however, the performance and ease of programming of such mixed models are still open research questions.
Meng-Shiou Wu (graduate student in computer engineering) has implemented a threads-based matrix
multiply using cache aware algorithms. He has built both shared memory algorithms and a
Message Passing Interface (MPI) based superstructure that uses the cache aware algorithms as
the node specific algorithm. The superstructure is responsible for moving data among nodes.
This is loosely based on Cannon's algorithm for distributed matrix multiplication. Initial
performance metrics show the Pthreads implementation to be robust and fast on various two-processor
SMP systems. Further work has shown that the superstructure is less important than the underlying
single thread algorithm. A cache aware algorithm as the base unit of computation is the key to
performance. Regardless of the superstructure algorithm, scalable high performance is achieved
with the cache aware algorithm at the root of the computational tree. The main remaining issue is
load imbalance in the superstructure. Ricky Kendall is the outreach coordinator and webmaster
for the project.
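
To illustrate what a cache aware base unit of computation can look like, the following is a minimal C sketch of a blocked (tiled) matrix multiply. It is not the project's implementation: the function name matmul_blocked, the row-major layout, and the block size parameter bs are assumptions made for this example. The idea is that bs is chosen so that the tiles of A, B, and C being worked on fit in cache together, so each loaded tile is reused many times before it is evicted.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; used to clip tiles at the matrix edge. */
static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/*
 * Hypothetical cache aware kernel: C += A * B for n x n row-major matrices.
 * bs is the tile edge length (an assumed tuning parameter, not a value
 * taken from the project). The three outer loops walk over tiles; the
 * three inner loops multiply one tile pair while it is resident in cache.
 */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = min_sz(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = min_sz(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = min_sz(jj + bs, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        /* a[i][k] is reused across the whole j sweep. */
                        double aik = a[i * n + k];
                        /* Unit-stride access to rows of B and C. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

A threaded or MPI superstructure of the kind described above could call such a kernel on the sub-blocks that each thread or node owns; how the work is divided at that level is a separate design choice.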