Two-Stage World-Space Pose Refinement for Precise Soccer Player Localization
DOI: https://doi.org/10.31224/7345
Keywords: cvpr, soccernet, spiideo synloc challenge, challenges, cvsports
Abstract
We present a two-stage detection-and-refinement pipeline for sub-metre soccer player localization in world coordinates from broadcast 4K images. The first stage runs a YOLO26x [1] pose model at 1920 px resolution on full 4K frames to produce player bounding boxes and coarse ground-projected keypoint estimates. The second stage extracts a padded crop around each detection and applies a second YOLO26x pose model at 640 px crop resolution to regress the ground-projected keypoint with sub-pixel precision. To bridge pixel-space training and metric-space evaluation, we derive a differentiable coordinate transform that reverses letterbox scaling, crop offsets, and perspective camera projection, and we introduce a multiscale LocSim loss that jointly penalises world-space error at τ ∈ {0.25, 0.50, 1.0} m. The loss is injected only into the one-to-many detection branch, leaving the one-to-one inference head unaffected while directing gradients toward small, hard-to-localise players. On the SpiideoSynLoc [2] challenge set, our method achieves 94.05% mAP-LocSim at τ = 1 m and 98.90% at τ = 5 m.
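The stage-two coordinate chain and the multiscale loss described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The following are all assumptions: the padding fraction `pad_frac`, a centred letterbox without integer rounding, a pinhole camera with intrinsics K and extrinsics R, t whose pitch is the plane Z = 0, the OKS-style LocSim form exp(ln(0.05)·d²/τ²) (so an error of exactly τ scores 0.05), and equal weighting across τ. Every function name is hypothetical. Each step is either affine or a projective division, so writing the same arithmetic with tensor operations in an autograd framework would give a differentiable transform of the kind the abstract describes.

```python
import numpy as np


def padded_crop_box(box, pad_frac=0.25, frame_w=3840, frame_h=2160):
    """Expand an (x0, y0, x1, y1) detection by pad_frac of its size on each side,
    clamped to the 4K frame. pad_frac=0.25 is a hypothetical value."""
    x0, y0, x1, y1 = box
    px, py = pad_frac * (x1 - x0), pad_frac * (y1 - y0)
    return (max(0.0, x0 - px), max(0.0, y0 - py),
            min(float(frame_w), x1 + px), min(float(frame_h), y1 + py))


def unletterbox(pt_lb, crop_w, crop_h, size=640):
    """Undo an assumed centred letterbox: uniform scale r = size / max(w, h),
    then symmetric padding (the integer rounding real pipelines apply is omitted)."""
    r = size / max(crop_w, crop_h)
    pad = np.array([(size - crop_w * r) / 2.0, (size - crop_h * r) / 2.0])
    return (pt_lb - pad) / r


def ground_homography(K, R, t):
    """For pitch points (X, Y, 0), P = K [R | t] reduces to H = K [r1 r2 t]."""
    return K @ np.column_stack([R[:, 0], R[:, 1], t])


def image_to_world(pt_img, H):
    """Back-project a full-frame pixel onto the pitch plane by solving H x = (u, v, 1)."""
    xyw = np.linalg.solve(H, np.append(pt_img, 1.0))
    return xyw[:2] / xyw[2]


def keypoint_to_world(pt_lb, crop_box, H, size=640):
    """Letterboxed crop pixel -> crop pixel -> frame pixel -> pitch coordinates."""
    x0, y0, x1, y1 = crop_box
    pt_crop = unletterbox(pt_lb, x1 - x0, y1 - y0, size)
    return image_to_world(pt_crop + np.array([x0, y0]), H)


def locsim(pred, gt, tau):
    """Assumed OKS-style similarity: an error of exactly tau metres scores 0.05."""
    d2 = np.sum((np.asarray(pred) - np.asarray(gt)) ** 2, axis=-1)
    return np.exp(np.log(0.05) * d2 / tau ** 2)


def multiscale_locsim_loss(pred, gt, taus=(0.25, 0.5, 1.0)):
    """Mean of (1 - LocSim) over thresholds and players (equal weights assumed)."""
    return float(np.mean([1.0 - locsim(pred, gt, tau) for tau in taus]))


if __name__ == "__main__":
    # Toy camera: identity rotation, pitch plane 10 units in front of the camera.
    K = np.array([[1000.0, 0.0, 1920.0], [0.0, 1000.0, 1080.0], [0.0, 0.0, 1.0]])
    H = ground_homography(K, np.eye(3), np.array([0.0, 0.0, 10.0]))
    world = keypoint_to_world(np.array([272.0, 288.0]), (2100, 700, 2400, 1100), H)
    print(world)  # approximately [ 3. -2.]
    # A 1 m error is penalised almost fully at the tighter thresholds.
    print(multiscale_locsim_loss(world[None], np.array([[4.0, -2.0]])))  # about 0.983
```

Averaging over τ ∈ {0.25, 0.50, 1.0} m keeps the loss informative across scales. Large errors still receive gradient from the τ = 1 m term, and the tighter thresholds dominate once a prediction is already close.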
License
Copyright (c) 2026 Parthsarthi Rawat

This work is licensed under a Creative Commons Attribution 4.0 International License.