Preprint / Version 1

Analysis of Inlier and Outlier Compounds with Respect to Artificial Neural Network Cetane Number Prediction Accuracy




cetane number, combustion, internal combustion engine, compression ignition, artificial neural network, quantitative structure-property relationship, QSPR, machine learning, alternative fuel


Artificial neural networks (ANNs) are exceptional at forming non-linear correlations between multivariate input and target variables; however, they are often regarded as a “black box” approach, because how they form these correlations is difficult to interpret. Furthermore, the process by which ANNs learn from inlier and outlier samples within the input dataset is not fully understood. Intuitively, training ANNs with inlier samples is expected to increase prediction accuracy, and training with outlier samples is expected to reduce it; in practice, however, this is not always true. The present work identifies and analyzes inliers and outliers in existing experimental cetane number (CN) data encompassing a variety of compounds and compound groups. It also investigates how ANNs trained to predict CN perform with and without outliers included in the training data, and whether a relationship exists between inlier/outlier status and ANN prediction accuracy, both across the whole dataset and for individual samples. Additionally, individual outlier compounds are analyzed to highlight how they differ structurally from inlier compounds.

