Bachelor's thesis presentation. Simon is advised by Mario Wille.
Simon Jokel: Optimizing ExaHyPE 2 with a Kokkos Backend for a Fused Enclave Tasking Framework
SCCS Colloquium |
GPUs dominate contemporary high-performance computing, but costly host–device data movement curtails their benefits. ExaHyPE 2, built atop Peano 4, currently requires such transfers at every solver step. This thesis takes first steps toward eliminating data movement between host and device, and thus toward fully on-device time stepping, by introducing a device memory manager and by evaluating fused, offloaded mesh operations that run outside the traversal. The device memory manager keeps data linearized in a structure-of-arrays (SoA) layout and persistent across steps until a mesh refinement or coarsening event is triggered. We design standalone kernel benchmarks that replay the semantic update schedule on uniform and statically refined meshes using the unmodified ExaHyPE 2-generated physics stubs. The benchmarks show bitwise-consistent output while isolating kernel performance from traversal overheads. As a result, intra-step transfers for data-independent spacetree cells are removed. We further provide the interfaces needed for future integration into ExaHyPE 2.
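The persistence idea behind the device memory manager can be illustrated with a minimal host-only sketch. It is not the thesis implementation: the names `CellAoS`, `CellsSoA`, `DeviceMemoryManager`, `acquire`, and `invalidate` are hypothetical, the three unknowns are placeholders, and a `std::vector` stands in for device memory. The sketch only shows the policy described above: data is gathered once into SoA form and reused across steps until a refinement or coarsening event invalidates it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cell record as a traversal might hand it over (AoS).
// The fields are illustrative, not the ExaHyPE 2 data layout.
struct CellAoS {
  double rho;
  double momentum;
  double energy;
};

// Linearized structure-of-arrays buffer; stands in for device memory here.
struct CellsSoA {
  std::vector<double> rho, momentum, energy;
};

class DeviceMemoryManager {
public:
  // Returns the persistent SoA buffer. The host data is gathered
  // ("uploaded") only if the buffer is not currently valid.
  CellsSoA& acquire(const std::vector<CellAoS>& hostCells) {
    if (!valid_) {
      const std::size_t n = hostCells.size();
      soa_.rho.resize(n);
      soa_.momentum.resize(n);
      soa_.energy.resize(n);
      for (std::size_t i = 0; i < n; ++i) {
        soa_.rho[i] = hostCells[i].rho;
        soa_.momentum[i] = hostCells[i].momentum;
        soa_.energy[i] = hostCells[i].energy;
      }
      valid_ = true;
      ++uploads_;
    }
    // A changed cell count without invalidate() would be a usage error.
    assert(soa_.rho.size() == hostCells.size());
    return soa_;
  }

  // To be called when the mesh is refined or coarsened.
  void invalidate() { valid_ = false; }

  // Number of gathers performed so far (for illustration only).
  int uploads() const { return uploads_; }

private:
  CellsSoA soa_;
  bool valid_ = false;
  int uploads_ = 0;
};
```

Calling `acquire` in every time step then triggers a single gather as long as the mesh is unchanged. A real manager would also have to write results back to the host before invalidation, which this sketch omits.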
In addition, we explore Kokkos as a performance-portable alternative to native accelerator kernels. Our measurements show mixed performance relative to CUDA. Analysis points to the multi-dimensional reduction in the solver update as the main bottleneck for Kokkos. The results indicate that the device memory manager is a promising path to reducing transfer costs and that Kokkos can become competitive once targeted optimizations are applied.
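The abstract does not spell out the reduction in question. Purely as an illustration of what a multi-dimensional reduction looks like, the following sketch assumes a per-patch maximum of a non-negative per-cell wave speed over square 2D patches. The function `maxPerPatch`, the `[patch][y][x]` layout, and the choice of a maximum are assumptions, and the serial loops stand in for what would be nested parallel reductions on a device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major index into a [patch][y][x] array with n x n cells per patch.
inline std::size_t idx(std::size_t p, std::size_t y, std::size_t x,
                       std::size_t n) {
  return (p * n + y) * n + x;
}

// Reduces over both in-patch dimensions and returns one maximum per patch.
// Assumes all wave speeds are non-negative, so 0.0 is a valid identity.
std::vector<double> maxPerPatch(const std::vector<double>& waveSpeed,
                                std::size_t patches, std::size_t n) {
  assert(waveSpeed.size() == patches * n * n);
  std::vector<double> result(patches, 0.0);
  for (std::size_t p = 0; p < patches; ++p) {  // patches are independent
    double m = 0.0;
    for (std::size_t y = 0; y < n; ++y) {      // reduction dimension 1
      for (std::size_t x = 0; x < n; ++x) {    // reduction dimension 2
        m = std::max(m, waveSpeed[idx(p, y, x, n)]);
      }
    }
    result[p] = m;
  }
  return result;
}
```

On a GPU the outer loop maps naturally onto independent work items, while the two inner loops form the reduction whose mapping onto the hardware determines performance.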