DOI of the published article: https://doi.org/10.1021/acs.jpcb.5c03825
Improvement of Diffusion Coefficient Prediction by Active Learning
DOI:
https://doi.org/10.31224/5491
Keywords:
Diffusion, Diffusion Coefficient, Active Learning
Abstract
Methods for predicting diffusion coefficients in mixtures are essential in many applications, as experimental data are scarce. Machine learning (ML) methods offer promising alternatives to established semiempirical models for predicting diffusion coefficients, but their performance strongly depends on the available training data. Increasing the size of data sets is a straightforward strategy for improving ML methods, but measuring diffusion coefficients is costly, which limits the number of experiments that can be carried out. We have therefore studied active learning (AL) strategies for planning diffusion coefficient measurements and for the targeted improvement of ML methods for their prediction, specifically matrix completion methods (MCMs) for predicting diffusion coefficients at infinite dilution Dij∞ in binary mixtures at 298 K. In the first step, different AL strategies were systematically tested on a synthetic data set for Dij∞, and uncertainty sampling was found to be a simple but effective choice. This strategy was therefore used for planning Dij∞ measurements by pulsed-field gradient (PFG) nuclear magnetic resonance (NMR) spectroscopy. In total, Dij∞ was measured in 19 mixtures for which no data were previously available, and the data were used for retraining two hybrid MCMs. The results show that a significant improvement in the prediction of Dij∞ can be achieved with only a few suitably planned experiments, but also that the impact strongly depends on the prediction model used: no clear influence was found on the performance of an MCM trained on the residuals of the semiempirical SEGWE model, whereas the accuracy of a hybrid MCM that incorporates SEGWE predictions as soft prior information could be substantially increased, almost halving the relative mean squared error on the test set.
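The uncertainty sampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the predictive uncertainty is obtained or how candidates are ranked in practice, so the following assumes that the prediction model supplies one predictive standard deviation per solute-solvent entry of the Dij∞ matrix; the function name `select_by_uncertainty`, its parameters, and the toy numbers are hypothetical and not taken from the article.

```python
def select_by_uncertainty(pred_std, observed, n_select=1):
    """Pick the unmeasured matrix entries with the largest predictive uncertainty.

    pred_std  -- nested list; pred_std[i][j] is the (assumed) predictive standard
                 deviation of the model for solute i in solvent j
    observed  -- nested list of bools; True where a measurement already exists
    n_select  -- number of new experiments to propose
    """
    # Collect only entries that have not been measured yet.
    candidates = [
        (pred_std[i][j], i, j)
        for i in range(len(pred_std))
        for j in range(len(pred_std[i]))
        if not observed[i][j]
    ]
    # Largest uncertainty first; ties are broken by index for reproducibility.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(i, j) for _, i, j in candidates[:n_select]]


# Toy example with made-up uncertainties (2 solutes x 3 solvents).
toy_std = [[0.1, 0.9, 0.3],
           [0.5, 0.2, 0.8]]
toy_observed = [[True, False, False],
                [False, True, False]]

print(select_by_uncertainty(toy_std, toy_observed, n_select=2))
# prints [(0, 1), (1, 2)]
```

In an active learning loop, the proposed entries would be measured, added to the training data, and the model retrained before the next selection round.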
License
Copyright (c) 2025 Zeno Romero, Kerstin Münnemann, Hans Hasse, Fabian Jirasek

This work is licensed under a Creative Commons Attribution 4.0 International License.