Cray DMAPP for Optimizing Certain MPI Communications
DMAPP is a library provided by Cray that can replace the following MPI operations with optimized implementations:
- MPI_Allreduce - works best for very small messages, usually 1-2 values at a time (max 16 bytes)
- MPI_Iallreduce - a non-blocking version of Allreduce; requires some special programming
PET has demonstrated performance improvements with actual codes by switching to DMAPP for small Allreduce operations. If you are not sure whether these operations appear in your code, or whether they are used enough to warrant such optimization, a profiling tool may help.
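If CrayPat (Cray's perftools suite) is available, a quick MPI profile can show whether small Allreduce calls dominate. A hedged sketch; the binary name `myapp` and the process count are placeholders, and module/tool names may differ by site:

```shell
# Sketch only: CrayPat commands for profiling MPI time (Cray-specific tools).
# "myapp" is a hypothetical binary name.
if type pat_build >/dev/null 2>&1; then
    pat_build -g mpi myapp          # instrument all MPI calls in ./myapp
    aprun -n 64 ./myapp+pat         # run the instrumented binary
    pat_report myapp+pat+*.xf       # per-routine MPI time and message sizes
    msg="profiled"
else
    msg="CrayPat not available on this system"
    echo "$msg"
fi
```

In the resulting report, look for MPI_Allreduce high in the time profile with message sizes of 16 bytes or less.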
1. Enabling DMAPP
You must do a few things to use DMAPP:
- Make sure the dmapp module is loaded (this seems to happen automatically on some systems). A redundant module load dmapp does no harm.
- Re-link with the dmapp library (you do not necessarily need to recompile); statically linked applications need additional link flags to pull in the library.
- Set environment variables (BASH):
  export MPICH_USE_DMAPP_COLL=1
  export MPICH_SHARED_MEM_COLL_OPT=1
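Taken together, the steps above can be sketched as follows. The static-link flags follow Cray's documented guidance for pulling libdmapp into a static link; `myapp`/`myapp.o` are placeholder names:

```shell
# 1) Load the dmapp module (a redundant load does no harm); guarded so this
#    sketch is a no-op where the module command is absent.
type module >/dev/null 2>&1 && module load dmapp || true

# 2) Re-link against libdmapp. For a statically linked app, Cray's docs call
#    for whole-archive linking (placeholder object/binary names):
#
#    cc -o myapp myapp.o -Wl,--whole-archive,-ldmapp,--no-whole-archive
#
#    For dynamically linked apps, plain -ldmapp is typically sufficient.

# 3) Enable DMAPP-optimized collectives at run time (BASH):
export MPICH_USE_DMAPP_COLL=1
export MPICH_SHARED_MEM_COLL_OPT=1
```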
To run the code, always repeat the module load and the environment variable settings (the first and third steps above) in your batch scripts.
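A batch script might repeat those pieces like this. This is a sketch only; the scheduler directives, process count, and binary name are placeholders to adjust for your site:

```shell
#!/bin/bash
#PBS -N dmapp_job           # placeholder scheduler directives
#PBS -l walltime=01:00:00

# Repeat the module load and the environment settings before every run
module load dmapp
export MPICH_USE_DMAPP_COLL=1
export MPICH_SHARED_MEM_COLL_OPT=1

aprun -n 64 ./myapp         # placeholder launcher invocation and binary
```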
2. Tuning DMAPP
There are other DMAPP-related variables you can tune and experiment with, such as $MPICH_RMA_OVER_DMAPP, but the defaults seem sane (and produced the best results with NavyFOAM). DMAPP also provides other features, like a non-blocking all-to-all. For more information on a Cray system:
- run man mpi and search for dmapp
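To experiment with a tuning variable such as $MPICH_RMA_OVER_DMAPP, toggle it between runs and compare timings. A sketch; the value here is an example to benchmark, not a recommendation:

```shell
# Route MPI one-sided (RMA) operations over DMAPP for this run; unset it or
# set it to 0 for a comparison run. Benchmark both - the defaults may win.
export MPICH_RMA_OVER_DMAPP=1
echo "MPICH_RMA_OVER_DMAPP=$MPICH_RMA_OVER_DMAPP"
```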
You can also experiment with other MPI environment variables (discussed in man mpi) to tune Cray MPI's behavior.