Preprint / Version 1

The Challenge of Debiasing NLI Models: Why Hypothesis-Only Confidence is Insufficient

Authors

  • Huey Phan, UT Austin

DOI:

https://doi.org/10.31224/6210

Abstract

Pre-trained models achieve high accuracy on NLI benchmarks but may rely on dataset artifacts rather than genuine reasoning. We investigate the ELECTRA-small (Clark et al., 2020) model's performance on SNLI (Bowman et al., 2015), finding that a hypothesis-only baseline achieves 89.40% accuracy, only 0.29% below the full model's 89.69%. This reveals severe hypothesis bias: the model makes predictions without considering premise-hypothesis relationships. Through qualitative analysis, we identify three primary error patterns, all driven by hypothesis-only artifacts: exact word overlap, semantic associations, and action overlap. We implement ensemble debiasing to address this bias, systematically exploring weighting strengths (α = 0.3, 0.5, 0.9). However, this approach degrades performance, increasing contradiction → neutral errors from 231 to 240. Our analysis suggests that hypothesis-only confidence does not cleanly separate spurious shortcuts from legitimate linguistic signals, highlighting the challenge of debiasing NLI models.
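The ensemble debiasing described in the abstract can be illustrated with a minimal product-of-experts-style sketch. This is an assumption about the combination rule (the abstract does not specify it): the main model's logits are adjusted by the hypothesis-only bias model's log-probabilities, scaled by the weighting strength α. The function names and example logits below are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_debias(main_logits, bias_logits, alpha=0.5):
    """Down-weight classes the hypothesis-only bias model is confident about.

    alpha is the weighting strength (the paper explores 0.3, 0.5, 0.9);
    the exact combination rule here is an illustrative assumption.
    """
    bias_log_probs = np.log(softmax(bias_logits) + 1e-12)
    return main_logits - alpha * bias_log_probs

# Toy two-class example: the main model is nearly indifferent,
# but the bias model is very confident in class 0.
main = np.array([1.0, 0.9])
bias = np.array([5.0, 0.0])
debiased = ensemble_debias(main, bias, alpha=0.5)
# Debiasing flips the prediction away from the bias model's favorite class.
print(int(np.argmax(main)), int(np.argmax(debiased)))  # 0 1
```

Note the trade-off the abstract reports: because hypothesis-only confidence also captures legitimate linguistic signal, subtracting it can push correct contradiction predictions toward neutral.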


Posted

2026-01-08