The Challenge of Debiasing NLI Models: Why Hypothesis-Only Confidence is Insufficient
DOI: https://doi.org/10.31224/6210

Abstract
Pre-trained models achieve high accuracy on NLI benchmarks but may rely on dataset artifacts rather than genuine reasoning. We investigate the ELECTRA-small (Clark et al., 2020) model's performance on SNLI (Bowman et al., 2015), finding that a hypothesis-only baseline achieves 89.40% accuracy, only 0.29% below the full model's 89.69%. This reveals severe hypothesis bias: the model makes predictions without considering premise-hypothesis relationships. Through qualitative analysis, we identify three primary error patterns, all driven by hypothesis-only artifacts: exact word overlap, semantic associations, and action overlap. We implement ensemble debiasing to address this bias, systematically exploring weighting strengths (α = 0.3, 0.5, 0.9). However, this approach degrades performance, increasing contradiction → neutral errors from 231 to 240. Our analysis suggests that hypothesis-only confidence does not cleanly separate spurious shortcuts from legitimate linguistic signals, highlighting the challenge of debiasing NLI models.
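The ensemble debiasing described above can be sketched as a product-of-experts-style combination, where the hypothesis-only model's log-probabilities are subtracted from the main model's logits with weighting strength α. This is a minimal illustrative sketch, not the authors' exact implementation; the function name and interface are assumptions.

```python
import numpy as np

def debias_logits(main_logits, hyp_only_logits, alpha=0.5):
    """Illustrative ensemble-debiasing sketch (product-of-experts style).

    Subtracts alpha-scaled hypothesis-only log-probabilities from the
    main model's logits, down-weighting classes that the hypothesis-only
    model predicts confidently (i.e., hypothesis-driven shortcuts).
    `alpha` corresponds to the weighting strengths explored in the
    abstract (0.3, 0.5, 0.9).
    """
    main_logits = np.asarray(main_logits, dtype=float)
    hyp_only_logits = np.asarray(hyp_only_logits, dtype=float)
    # Convert hypothesis-only logits to log-probabilities (log-softmax).
    hyp_log_probs = hyp_only_logits - np.log(
        np.sum(np.exp(hyp_only_logits), axis=-1, keepdims=True)
    )
    # Penalize classes favored by the hypothesis-only model.
    return main_logits - alpha * hyp_log_probs
```

For example, if the main model mildly favors "entailment" but the hypothesis-only model favors it very confidently, the debiased prediction can flip away from it, which is exactly the mechanism that risks suppressing legitimate signals along with shortcuts.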
License
Copyright (c) 2026 Huey Phan

This work is licensed under a Creative Commons Attribution 4.0 International License.