The Challenge of Debiasing NLI Models: Why Hypothesis-Only Confidence is Insufficient
DOI: https://doi.org/10.31224/6210

Abstract
Pre-trained models achieve high accuracy on NLI benchmarks but may rely on dataset artifacts rather than genuine reasoning. We investigate the ELECTRA-small (Clark et al., 2020) model's performance on SNLI (Bowman et al., 2015), finding that a hypothesis-only baseline achieves 89.40% accuracy, only 0.29% below the full model's 89.69%. This reveals severe hypothesis bias: the model makes predictions without considering premise-hypothesis relationships. Through qualitative analysis, we identify three primary error patterns, all driven by hypothesis-only artifacts: exact word overlap, semantic associations, and action overlap. We implement ensemble debiasing to address this bias, systematically exploring weighting strengths (α = 0.3, 0.5, 0.9). However, this approach degrades performance, increasing contradiction → neutral errors from 231 to 240. Our analysis suggests that hypothesis-only confidence does not cleanly separate spurious shortcuts from legitimate linguistic signals, highlighting the challenge of debiasing NLI models.
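The ensemble debiasing described above can be sketched as a product-of-experts-style combination, where the hypothesis-only model's log-probabilities are subtracted from the main model's logits with weighting strength α. This is a minimal illustrative sketch, not the authors' exact implementation; the function name and interface are assumptions.

```python
import numpy as np

def debias_logits(main_logits, hyp_only_logits, alpha=0.5):
    """Illustrative ensemble-debiasing sketch (product-of-experts style).

    Subtracts alpha-scaled hypothesis-only log-probabilities from the
    main model's logits, down-weighting classes that the hypothesis-only
    model predicts confidently (i.e., hypothesis-driven shortcuts).
    `alpha` corresponds to the weighting strengths explored in the
    abstract (0.3, 0.5, 0.9).
    """
    main_logits = np.asarray(main_logits, dtype=float)
    hyp_only_logits = np.asarray(hyp_only_logits, dtype=float)
    # Convert hypothesis-only logits to log-probabilities (log-softmax).
    hyp_log_probs = hyp_only_logits - np.log(
        np.sum(np.exp(hyp_only_logits), axis=-1, keepdims=True)
    )
    # Penalize classes favored by the hypothesis-only model.
    return main_logits - alpha * hyp_log_probs
```

For example, if the main model mildly favors "entailment" but the hypothesis-only model favors it very confidently, the debiased prediction can flip away from it, which is exactly the mechanism that risks suppressing legitimate signals along with shortcuts.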
License
Copyright (c) 2026 Huey Phan

This work is licensed under a Creative Commons Attribution 4.0 International License.