Like a rainbow in the dark: metadata annotation for HPC applications in the age of dark data
Published in The Journal of Supercomputing, 2021
Recommended citation: Schembera, B. Like a rainbow in the dark: metadata annotation for HPC applications in the age of dark data. J Supercomput 77, 8946–8966 (2021). https://doi.org/10.1007/s11227-020-03602-6 https://link.springer.com/article/10.1007/s11227-020-03602-6
A deluge of dark data is imminent. A lack of data management capabilities, especially in the field of supercomputing, and missing data documentation (i.e., missing metadata annotation) constitute a major source of dark data. The present work contributes to addressing this challenge by presenting ExtractIng, a generic automated metadata extraction toolkit. Existing metadata of simulation output files, scattered throughout the file system, can be aggregated, parsed, and converted to the EngMeta metadata model. Use cases from computational engineering are considered to demonstrate the viability of ExtractIng. The evaluation results show that the metadata extraction is simulation-code independent, in the sense that it can handle data outputs from various fields of science, is easy to integrate into simulation workflows, and is compatible with a multitude of computational environments.