Bachelor's thesis presentation. Maximilian is advised by Santiago Narvaez Rivas.
Previous talks at the SCCS Colloquium
Maximilian Dornheim: Implementing rank relabeling in the reshuffle library
SCCS Colloquium |
High-performance computing (HPC) increasingly faces a widening gap between raw floating-point throughput and the effective cost of data movement. As large-scale applications are executed on distributed-memory systems, communication rather than computation frequently becomes the dominant bottleneck. One promising mitigation strategy is rank relabeling (also referred to as rank reordering), i.e. the systematic reassignment of MPI rank identifiers such that communication-heavy pairs of processes are placed “closer” in the logical communicator. By increasing the share of data that can be handled locally (self-communication) and reducing expensive inter-rank traffic, an application can improve overall throughput without changing its numerical kernels.
This thesis investigates the design, implementation, and evaluation of rank relabeling in the reshuffle library, an MPI-based data redistribution framework inspired by COSTA. First, we formulate the relabeling problem as an assignment problem on a communication weight matrix that quantifies how much data each pair of ranks is expected to exchange. We then present two solution strategies: (i) an optimal baseline using the Hungarian algorithm, and (ii) a lightweight deterministic greedy heuristic.The Hungarian method guarantees a globally optimal assignment but exhibits cubic time complexity, whereas the greedy strategy runs in effectively quadratic time and produces near-optimal permutations at significantly lower cost Second, we integrate these strategies into reshuffle by constructing an optimized MPI communicator whose rank numbering reflects the chosen permutation. This integration is exposed through a high-level interface that allows applications to obtain a reordered communicator transparently.
Finally, we evaluate both the cost of computing the relabeling and its impact on overall performance. On communication matrices of increasing size, the greedy heuristic outperforms the Hungarian algorithm by up to one to two orders of magnitude in runtime while preserving useful structure in the resulting mappings. In application-level benchmarks on an 8-rank testbed, rank relabeling improves the runtime of highly unbalanced (skewed) data redistribution patterns by up to 12.4% compared to the default communicator, while offering limited benefit for already well-structured communication patterns. These results demonstrate that topology-aware rank relabeling is both practical and performance-relevant, and they motivate future work on extending the cost model to incorporate hardware topology and node locality.