Cray DMAPP for Optimizing Certain MPI Communications
DMAPP (the Distributed Memory Application API) is a low-level communication library provided by Cray. Cray MPI can use DMAPP-optimized implementations of the following MPI operations:
- MPI_Allreduce - the DMAPP version works best for very small messages, usually 1-2 values at a time (at most 16 bytes)
- MPI_Iallreduce - a non-blocking version of MPI_Allreduce; requires some special programming (see the sketch after this list)
- MPI_Alltoall
- MPI_Barrier
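The "special programming" for MPI_Iallreduce (an MPI-3 feature) is mostly a matter of posting the reduction early, doing independent work, and waiting on the request only when the result is needed. A minimal C sketch; the overlapped work is a placeholder:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* One value per rank: the tiny-message case DMAPP targets. */
        double local = (double)rank, global = 0.0;

        /* Post the reduction without blocking. */
        MPI_Request req;
        MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* ... independent computation here, overlapped with the reduction ... */

        /* The result is not valid until the request completes. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        if (rank == 0)
            printf("sum of ranks = %g\n", global);

        MPI_Finalize();
        return 0;
    }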
PET has demonstrated performance improvements in real codes by switching to DMAPP for small Allreduce operations. If you are not sure whether these operations appear in your code, or whether they are used heavily enough to warrant this optimization, a profiling tool can help.
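On Cray systems, CrayPat is one such tool. A minimal sketch, assuming the perftools modules and the aprun launcher (myapp is a placeholder; module names and output-file details vary by perftools version):

    module load perftools-base perftools   # older systems may need only perftools
    pat_build -g mpi myapp                 # writes an instrumented binary, myapp+pat
    aprun -n 64 ./myapp+pat                # run as usual; collects MPI trace data
    pat_report myapp+pat+*.xf              # summarize time spent in MPI calls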
1. Enabling DMAPP
You must do three things to use DMAPP:
1. Make sure the dmapp module is loaded (this seems to be automatic on some systems); a redundant module load dmapp doesn't hurt anything.
2. Re-link (recompiling is not necessarily required) against the dmapp library (see the link-line sketch after this list):
   -ldmapp
   For statically linked applications only, use the following instead:
   -Wl,--whole-archive,-ldmapp,--no-whole-archive
3. Set environment variables (BASH):
   export MPICH_USE_DMAPP_COLL=1
   export MPICH_SHARED_MEM_COLL_OPT=1
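For example, using the Cray compiler wrapper cc (myapp and myapp.o are placeholders), the link lines might look like this:

    # Dynamically linked application:
    cc -o myapp myapp.o -ldmapp

    # Statically linked application (whole-archive keeps the DMAPP symbols):
    cc -o myapp myapp.o -Wl,--whole-archive,-ldmapp,--no-whole-archive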
To run the code, always repeat steps 1 and 3 in your batch scripts, as in the sketch below.
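A minimal batch-script sketch, assuming a PBS batch system and the aprun launcher; the resource request, rank count, and myapp are placeholders:

    #!/bin/bash
    #PBS -l select=2:ncpus=32:mpiprocs=32
    #PBS -l walltime=01:00:00

    module load dmapp                     # step 1
    export MPICH_USE_DMAPP_COLL=1         # step 3
    export MPICH_SHARED_MEM_COLL_OPT=1

    cd $PBS_O_WORKDIR
    aprun -n 64 ./myapp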
2. Tuning DMAPP
There are other DMAPP variables that you can tune and experiment with: $MPICH_DMAPP_COLL_RADIX, $MPICH_DMAPP_HW_CE, and $MPICH_RMA_OVER_DMAPP, but the defaults seem sane (and produced the best results with NavyFOAM). DMAPP also provides other features, such as a non-blocking all-to-all. For more information about dmapp on any Cray system, see man mpi and search for the string "dmapp" (i.e., /dmapp).
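For instance, a sketch of setting these variables for an experiment; the values are hypothetical starting points, not recommendations, so check man mpi for the defaults and valid ranges on your system:

    export MPICH_DMAPP_COLL_RADIX=64   # tree radix used by DMAPP collectives
    export MPICH_DMAPP_HW_CE=1         # try the hardware collective engine
    export MPICH_RMA_OVER_DMAPP=1      # route MPI one-sided (RMA) over DMAPP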
You can also tune Cray MPI's behavior more broadly by experimenting with other variables discussed in man mpi, such as $MPICH_NO_BUFFER_ALIAS_CHECK, $MPICH_OPTIMIZED_MEMCPY, $MPICH_GNI_NUM_BUFS, and $MPICH_GNI_MAX_EAGER_MSG_SIZE.
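As above, a sketch with hypothetical values; consult man mpi before relying on any of them:

    export MPICH_GNI_MAX_EAGER_MSG_SIZE=16384  # eager-to-rendezvous cutoff, in bytes
    export MPICH_GNI_NUM_BUFS=128              # number of internal mailbox buffers
    export MPICH_NO_BUFFER_ALIAS_CHECK=1       # skip send/recv buffer alias checking
    export MPICH_OPTIMIZED_MEMCPY=1            # select the optimized memcpy variant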