Preprint / Version 1

EConTab: Explainable Contrastive Tabular Representation Learning with Regularization

Authors

  • Suiyao Chen, University of South Florida
  • Jing Wu
  • Handong Yao

DOI: https://doi.org/10.31224/3985

Abstract

Representation learning is a critical machine learning technique across many domains. By capturing high-quality features, pre-trained embeddings reduce redundancy in the input space and benefit downstream tasks such as classification, regression, and detection. For tabular data, however, feature engineering and selection still rely heavily on manual effort and interpretation, which is time-consuming and requires domain expertise. To address this challenge, we introduce EConTab, an explainable framework for automatic deep representation learning with regularized contrastive learning. Agnostic to the downstream modeling task, EConTab builds an asymmetric autoencoder on the same raw features used as model inputs and produces low-dimensional representative embeddings. Specifically, regularization is applied to select raw features, and contrastive learning is used to distill the information most pertinent to downstream tasks. The model is explained through its feature weights and a SHAP-value-based explainer. Experiments on a wide range of real-world datasets show that the framework yields substantial and robust performance improvements. We further show empirically that the pre-trained embeddings integrate readily as input features and improve the performance of traditional methods such as XGBoost and Random Forest.
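The abstract describes the training objective only at a high level. The following is a minimal NumPy sketch, under our own assumptions, of how a regularized contrastive autoencoder loss of this general kind could be assembled. A per-feature weight vector `w_feat` gates the raw inputs and carries an L1 penalty (an assumed lasso-style stand-in for the paper's regularization); the encoder has two layers while the decoder is a single linear layer (one possible reading of "asymmetric"); and the contrastive term is InfoNCE between two Gaussian-noise views of each row (the abstract does not name the contrastive objective or augmentation). All names (`init_params`, `encode`, `total_loss`, `w_feat`) and hyperparameters are hypothetical. The code computes the loss only; training would need gradients from an autodiff library. It is not the authors' implementation.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_params(n_features, hidden, emb_dim, rng):
    """Hypothetical asymmetric autoencoder: two-layer encoder, one-layer linear decoder."""
    return {
        "w_feat": np.ones(n_features),                    # per-feature gate, L1-penalized
        "W1": rng.normal(0.0, 0.1, (n_features, hidden)),
        "W2": rng.normal(0.0, 0.1, (hidden, emb_dim)),
        "D": rng.normal(0.0, 0.1, (emb_dim, n_features)),  # shallow decoder -> asymmetry
    }


def encode(p, X):
    # Scale each raw feature by its gate weight, then map to a low-dimensional embedding.
    return relu((X * p["w_feat"]) @ p["W1"]) @ p["W2"]


def decode(p, Z):
    return Z @ p["D"]


def info_nce(z1, z2, temperature=0.1):
    """InfoNCE: row i of z1 should match row i of z2 against every other row of z2."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def total_loss(p, X, rng, noise=0.1, lam_l1=1e-3, lam_con=1.0):
    # Two stochastic "views" of each row via additive Gaussian noise (assumed augmentation).
    X1 = X + rng.normal(0.0, noise, X.shape)
    X2 = X + rng.normal(0.0, noise, X.shape)
    recon = np.mean((decode(p, encode(p, X)) - X) ** 2)   # reconstruction error
    l1 = lam_l1 * np.abs(p["w_feat"]).sum()                # sparsity on feature gates
    con = lam_con * info_nce(encode(p, X1), encode(p, X2))  # contrastive term
    return recon + l1 + con, {"recon": recon, "l1": l1, "contrastive": con}


def feature_ranking(p):
    # Feature-weight explanation: indices of raw features ordered by |gate weight|, largest first.
    return np.argsort(-np.abs(p["w_feat"]))


# Usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
params = init_params(n_features=10, hidden=32, emb_dim=4, rng=rng)
loss, parts = total_loss(params, X, rng)
embeddings = encode(params, X)  # shape (64, 4)
```

After training, `feature_ranking` gives a simple weight-based view of which raw features matter, since the L1 term pushes uninformative gates toward zero. The output of `encode` could then be concatenated with the raw features before fitting a model such as XGBoost or Random Forest. A SHAP-based explanation would be a separate step and is not sketched here.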

Posted: 2024-10-01