Master's thesis presentation. Phuong is advised by Dr. Ben Hoerz (Intel) and Prof. Michael Bader.
Phuong Nguyen: Porting the Baryon-block Construction in the Stochastic LapH Method to Heterogeneous Systems with DPC++
SCCS Colloquium
Lattice Quantum Chromodynamics (LQCD) is an important workload in particle physics that provides predictions from simulations of the strong interaction in the Standard Model. To match the precision of experiments and achieve simulation breakthroughs, sustained performance in the range of multiple exaflops is needed. All dominant steps in the LQCD workload must therefore be optimized. This work provides a high-performance implementation of the baryon-block construction, one of the key kernels in the stochastic LapH method, targeting heterogeneous systems.
The main contribution of this work consists of two parts: first, optimizing the existing CPU implementation; second, porting the kernel to Intel GPUs using Data Parallel C++ (DPC++).
To optimize the CPU implementation, new memory data layouts are investigated, cache blocking is successfully employed, and highly optimized small matrix multiplications are performed using the Intel® oneAPI Math Kernel Library (oneMKL) with Just-In-Time (JIT) code generation. The GPU kernel is implemented in DPC++ based on the optimized CPU implementation and is further optimized using data prefetching and data pre-packing techniques.
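To illustrate the cache-blocking idea mentioned above, the following is a minimal sketch in plain C++, not the thesis code. It assumes real-valued `double` entries and hand-written loops in place of the oneMKL JIT kernels; the `Matrix` type, the function names, and the default block size of 32 are hypothetical choices made for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix; a hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Reference implementation: plain triple loop computing C = A * B.
Matrix matmul_naive(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
    return c;
}

// Cache-blocked variant: the iteration space is split into tiles of at most
// block x block elements so that the tiles of A, B, and C touched by the
// inner loops can stay resident in cache while they are reused.
Matrix matmul_blocked(const Matrix& a, const Matrix& b, std::size_t block = 32) {
    assert(a.cols == b.rows && block > 0);
    Matrix c(a.rows, b.cols);
    for (std::size_t ii = 0; ii < a.rows; ii += block) {
        const std::size_t i_end = std::min(ii + block, a.rows);
        for (std::size_t kk = 0; kk < a.cols; kk += block) {
            const std::size_t k_end = std::min(kk + block, a.cols);
            for (std::size_t jj = 0; jj < b.cols; jj += block) {
                const std::size_t j_end = std::min(jj + block, b.cols);
                // Small dense multiply on one tile; in a tuned code this is
                // where a specialized small-matrix kernel would be called.
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k)
                        for (std::size_t j = jj; j < j_end; ++j)
                            c.at(i, j) += a.at(i, k) * b.at(k, j);
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the traversal order differs. A suitable block size depends on the cache sizes of the target CPU and would normally be chosen by measurement.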
The main result of this thesis is an optimized multi-threaded implementation of the kernel that is 6.8 times faster than the kernel currently used in production systems. In addition, the work demonstrates a successful port of the kernel to the Ponte Vecchio GPU, which achieves a speedup of X times over the node performance of the optimized CPU implementation (NDA).
The work provides a solid foundation for modern performance-portable implementations suitable for use in future heterogeneous HPC systems.