Previous talks at the SCCS Colloquium

Alexander Puscas: Code Generation for Small Matrix Multiplication Kernels Targeting Arm SVE

The Fujitsu A64FX and its implementation of Arm's Scalable Vector Extension (SVE) have shown promising results in the field of HPC. Especially for numerical applications, this novel extension of the AArch64 ISA has been shown to yield significant performance gains for applications that are correctly ported to Arm architectures. This work extends the matrix multiplication kernel generator PSpaMM to generate SVE instructions and analyzes the measured results. We benchmark matrix multiplication kernels generated by PSpaMM using SVE and NEON instructions, as well as matrix multiplication kernels generated by LIBXSMM. We show that SVE-based kernels can provide a speedup by a factor of 6.3 for small matrix multiplication kernels when compared to PSpaMM's NEON kernels. Benchmarks that include dense-by-sparse multiplication kernels show that the SVE kernels achieve a speedup by a factor of 3.8 compared to their NEON counterparts. Finally, we observe that PSpaMM's SVE generator can compete performance-wise with the more heavily optimized math library LIBXSMM.

Bachelor's thesis submission talk (Informatics). Alexander is advised by Lukas Krenz.