Master's thesis presentation. Ahmad is advised by Keerthi Gaddameedi.
Ahmad Traboulsi: Space parallelism for shallow water equations on the rotating sphere
SCCS Colloquium
The shallow water equations (SWE) are a simplified form of the 3D Navier-Stokes equations, reduced to two dimensions under the assumption of a shallow fluid layer. SWEET (Shallow Water Equations Environment for Tests) is a high-performance numerical solver for atmospheric modeling and geophysical fluid dynamics that uses spectral methods based on spherical harmonics. As a lightweight yet extensible testbed, SWEET enables the exploration of temporal and spatial discretization strategies that can inform the development and optimization of full 3D atmospheric dynamical cores. It supports various time integration methods, including parallel-in-time techniques such as Parareal, which provide temporal parallelism.
In both SWEET and production-level dynamical cores, spatial discretization is crucial for accurately representing fluid motion on the globe. As spatial resolution increases, however, the computational cost grows significantly and often becomes a bottleneck for obtaining simulation results in a timely manner. Currently, the spatial discretization in SWEET benefits from shared-memory parallelism only. This thesis extends SWEET with an MPI-based spatial domain decomposition, which enables concurrent computation across multiple ranks and aims at full space–time parallelism.
The resulting hybrid MPI–OpenMP solver was evaluated using the Galewsky benchmark. Accuracy was maintained across all configurations, confirming numerical consistency. However, performance profiling revealed that the global spectral transforms (performed via the SHTns library) dominate the total runtime and cannot be distributed across MPI ranks. Consequently, each rank redundantly performs identical transform computations on full copies of the grid, and the MPI Allgather communication required to synchronize the global fields adds substantial synchronization and waiting time.
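The decomposition-and-gather pattern described above can be illustrated with a small single-process emulation. This is only a sketch under stated assumptions, not SWEET's implementation: the split along grid rows, the helper names (`block_range`, `local_update`, `emulated_allgather`, `step`) and the pointwise update are all hypothetical, and the "ranks" are plain loop iterations rather than MPI processes.

```python
def block_range(n_rows, n_ranks, rank):
    """Half-open row range [start, stop) owned by `rank` (hypothetical row-wise split)."""
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)      # the first `extra` ranks get one more row
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_update(field, start, stop):
    """Placeholder for pointwise grid-space work, applied to the owned rows only."""
    return [[2.0 * value + 1.0 for value in row] for row in field[start:stop]]


def emulated_allgather(pieces):
    """Concatenate the per-rank row blocks in rank order, as an allgather would."""
    return [row for piece in pieces for row in piece]


def step(field, n_ranks):
    """One emulated step: each rank updates its rows, then all ranks receive the full field."""
    n_rows = len(field)
    pieces = [
        local_update(field, *block_range(n_rows, n_ranks, rank))
        for rank in range(n_ranks)
    ]
    gathered = emulated_allgather(pieces)
    # After the gather, every rank holds its own full copy of the grid. In the
    # scheme described above, each rank would now repeat the same global transform.
    return [[row[:] for row in gathered] for _ in range(n_ranks)]


if __name__ == "__main__":
    field = [[float(i * 4 + j) for j in range(4)] for i in range(7)]
    copies = step(field, n_ranks=3)
    print(len(copies), "full copies, each with", len(copies[0]), "rows")
```

In a real MPI code, uneven row blocks like these would call for `MPI_Allgatherv` rather than `MPI_Allgather`, because the per-rank counts differ.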
As a result, the hybrid configurations do not outperform the OpenMP-only baseline, which achieves the best scaling efficiency within a node. These findings highlight the algorithmic limitations of global spectral methods on distributed-memory systems and underscore the need for distributed spectral transforms, both to improve performance and to enable an efficient communication workflow in future work.
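A toy cost model can make this scaling argument concrete. It is a sketch in the spirit of Amdahl's law, not a fit to the thesis measurements: the function names, the linear communication term and all numeric values below are assumptions chosen purely for illustration.

```python
def modeled_step_time(n_ranks, t_transform, t_grid, t_comm_per_rank):
    """Toy wall time per step when the global transform is replicated on every rank."""
    replicated = t_transform                    # not distributed: same cost on each rank
    distributed = t_grid / n_ranks              # grid-point work is split across ranks
    comm = t_comm_per_rank * (n_ranks - 1)      # assumed linear cost of the gather
    return replicated + distributed + comm


def modeled_speedup(n_ranks, t_transform, t_grid, t_comm_per_rank):
    """Speedup of the modeled step relative to a single rank."""
    serial = modeled_step_time(1, t_transform, t_grid, t_comm_per_rank)
    return serial / modeled_step_time(n_ranks, t_transform, t_grid, t_comm_per_rank)


if __name__ == "__main__":
    # Hypothetical cost shares, NOT measured values: transform-dominated step.
    params = dict(t_transform=0.8, t_grid=0.2, t_comm_per_rank=0.01)
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(modeled_speedup(n, **params), 3))
```

Under these assumptions the speedup can never exceed `(t_transform + t_grid) / t_transform`, however many ranks are added, and the communication term eventually pushes it below 1. Distributing the transform itself is what would lift that bound.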