Bachelor's thesis presentation. Yifeng is advised by Dr.-Ing. Yu Jiao and Prof. Dr. Michael Bader.
Yifeng You: Optimizing a Metadata Crawler for High-Performance Computing (HPC) Simulation Management
SCCS Colloquium
Efficient (meta)data management is essential for keeping simulations reusable and reproducible, particularly in high-performance computing (HPC) environments. The HOMER metadata crawler is a Python program that performs ontology-based metadata extraction from different sources. However, as an emerging technology, the crawler's implementation and functionality show several limitations when applied to HPC simulation outputs. This thesis presents optimizations that address these issues. A manual input interface is developed to combine additional data with the crawled metadata, two more output modes are introduced to formalize the results, and natural language processing (NLP) methods are applied to generate extraction patterns automatically. The optimized crawler is evaluated on two CFD solvers, ALPACA and JAX-Fluids. The results show that our optimizations can significantly reduce manual workload and improve the interoperability and reusability of the metadata, thereby better following the FAIR principles of scientific data management. More generally, this work could contribute to a more standardized metadata workflow in HPC and support other researchers who use the metadata crawler.
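To make the idea of pattern-based extraction combined with manual input more concrete, here is a minimal sketch in Python. It is not the HOMER crawler's code or API: the function names `generate_pattern`, `crawl`, and `merge_manual_input`, the metadata keys, and the sample log lines are all hypothetical. The simple numeric-literal substitution used here only stands in for the NLP methods mentioned in the abstract.

```python
import re

# Regex for a numeric literal (integer, decimal, or scientific notation).
NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"


def generate_pattern(sample_line):
    """Derive a regex from one sample line (hypothetical helper).

    Literal text is escaped and every numeric literal in the sample
    becomes a capture group, so the pattern matches the same line
    with different values.
    """
    literal_parts = re.split(NUMBER, sample_line)
    return re.compile(("(" + NUMBER + ")").join(re.escape(p) for p in literal_parts))


def crawl(lines, patterns):
    """Apply each named pattern to the lines and collect the first captured value."""
    metadata = {}
    for key, pattern in patterns.items():
        for line in lines:
            match = pattern.search(line)
            if match:
                metadata[key] = float(match.group(1))
                break
    return metadata


def merge_manual_input(crawled, manual):
    """Combine crawled metadata with manually entered fields.

    Assumption for this sketch: manual entries override crawled ones.
    """
    return {**crawled, **manual}


# Hypothetical solver output and manual input, for illustration only.
log_lines = ["Time step: 0.01", "Reynolds number: 3200.5"]
patterns = {"reynolds_number": generate_pattern("Reynolds number: 1600")}
record = merge_manual_input(crawl(log_lines, patterns), {"contact": "hypothetical user"})
print(record)
```

In this sketch, one example line is enough to derive a reusable pattern, which is the kind of manual effort that automatic pattern generation aims to reduce.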