Metrics That Matter: A Practical Survey on Synthetic Data Evaluation
DOI: https://doi.org/10.31224/6576

Keywords: synthetic data, AI, machine learning, big data in healthcare, synthetic data generation, healthcare AI, synthetic evaluation

Abstract
Assessing the quality of synthetic data (SD) is vital to determine whether it can provide a viable alternative to real data. A wide variety of metrics exist to examine the three archetypal dimensions of SD evaluation: realism (fidelity), task-specific usefulness (utility), and remaining disclosure risk (privacy). Current work in SD generation often relies on ad-hoc selection of evaluation metrics without clear justification, even though the suitability of a metric strongly depends on the dataset and other contextual factors. This paper surveys the field of SD evaluation, offers guidance on metric selection based on four key questions pertaining to the task, goal, data type, and domain of the SD, and gives general practical recommendations on SD evaluation. Finally, experiments on an illustrative dataset of electronic health records show how researchers can bring our insights and recommendations for SD evaluation into practice. By doing so, we aim to support researchers and practitioners seeking to generate and evaluate SD.
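To make the three dimensions concrete, the sketch below computes one illustrative metric for two of them on toy data: a per-feature two-sample Kolmogorov-Smirnov statistic as a fidelity proxy, and distance to the closest real record (DCR) as a simple privacy proxy. These are common example metrics, not the specific metrics surveyed in the paper, and the toy Gaussian data stands in for a real/synthetic dataset pair.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 3))    # stand-in for real data
synth = rng.normal(0.1, 1.1, size=(500, 3))   # stand-in for synthetic data

def ks_stat(a, b):
    """Two-sample KS statistic: max gap between empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

# Fidelity proxy: average KS statistic over features (lower = closer marginals)
fidelity = np.mean([ks_stat(real[:, j], synth[:, j])
                    for j in range(real.shape[1])])

# Privacy proxy: distance to closest real record per synthetic row;
# very small values can indicate memorisation of real records
dists = np.linalg.norm(synth[:, None, :] - real[None, :, :], axis=-1)
dcr = dists.min(axis=1)

print(f"mean KS statistic (lower = higher fidelity): {fidelity:.3f}")
print(f"median DCR (higher = lower disclosure risk): {np.median(dcr):.3f}")
```

A utility metric would typically follow a train-on-synthetic, test-on-real protocol with a downstream model, which is omitted here for brevity.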
License
Copyright (c) 2026 Jim Achterberg, Bram van Dijk, Saif Ul Islam, Gregory Epiphaniou, Carsten Maple, Marcel Haas, Marco Spruit

This work is licensed under a Creative Commons Attribution 4.0 International License.