Preprint / Version 1

Two-Stage World-Space Pose Refinement for Precise Soccer Player Localization

Authors

  • Parthsarthi Rawat, GameChanger by Dick's Sporting Goods

DOI:

https://doi.org/10.31224/7345

Keywords:

cvpr, soccernet, spiideo synloc challenge, challenges, cvsports

Abstract

We present a two-stage detection-and-refinement pipeline for sub-metre soccer player localization in world coordinates from broadcast 4K images. The first stage applies a YOLO26x [1] pose model at 1920 px input resolution to full 4K frames, producing player bounding boxes and coarse ground-projected keypoint estimates. The second stage extracts a padded crop around each detection and applies a second YOLO26x pose model at 640 px crop resolution to regress the ground-projected keypoint with sub-pixel precision. To bridge pixel-space training and metric-space evaluation, we derive a differentiable coordinate transform that reverses letterbox scaling, crop offsets, and the perspective camera projection, and we introduce a multiscale LocSim loss that jointly penalises world-space error at τ ∈ {0.25, 0.50, 1.0} m. The loss is injected only into the one-to-many detection branch, leaving the one-to-one inference head unaffected while directing gradients toward small, hard-to-localise players. On the SpiideoSynLoc [2] challenge set, our method achieves 94.05% mAP-LocSim at τ = 1 m and 98.90% at τ = 5 m.
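
The abstract chains three inverse mappings (letterbox, crop, perspective) before scoring world-space error. The following is a minimal sketch of that chain plus a multiscale LocSim-style loss, not the authors' implementation. It makes several assumptions: the crop is letterboxed as input = crop × scale + pad, the image-to-ground homography h_inv (the inverse of K[r1 r2 t] for the z = 0 pitch plane) is known from calibration, LocSim takes the OKS-style form exp(ln(0.05) · d² / τ²) so that an error of exactly τ metres scores 0.05, and the three τ terms are averaged with equal weight. The names CropGeometry, pixel_to_world and multiscale_locsim_loss are hypothetical. It uses only the standard library, so it shows the forward computation only.

```python
import math
from dataclasses import dataclass


@dataclass
class CropGeometry:
    """Hypothetical per-detection geometry for the second-stage crop."""
    scale: float  # letterbox resize factor (model-input px per crop px)
    pad_x: float  # left letterbox padding, in model-input px
    pad_y: float  # top letterbox padding, in model-input px
    x0: float     # crop origin in the full 4K frame, in frame px
    y0: float


def pixel_to_world(u_in, v_in, geom, h_inv):
    """Map a keypoint in the letterboxed crop input to ground-plane metres."""
    # 1. Undo letterbox: remove padding, then rescale to crop pixels.
    u_crop = (u_in - geom.pad_x) / geom.scale
    v_crop = (v_in - geom.pad_y) / geom.scale
    # 2. Undo crop: shift back into full-frame pixel coordinates.
    u = u_crop + geom.x0
    v = v_crop + geom.y0
    # 3. Undo the perspective projection onto the z = 0 plane using the
    #    assumed image-to-ground homography, then divide by the projective scale.
    X = h_inv[0][0] * u + h_inv[0][1] * v + h_inv[0][2]
    Y = h_inv[1][0] * u + h_inv[1][1] * v + h_inv[1][2]
    W = h_inv[2][0] * u + h_inv[2][1] * v + h_inv[2][2]
    return X / W, Y / W


def locsim(dist_sq, tau):
    # Assumed OKS-style similarity: 1.0 at zero error, 0.05 at error == tau.
    return math.exp(math.log(0.05) * dist_sq / tau ** 2)


def multiscale_locsim_loss(pred_xy, gt_xy, taus=(0.25, 0.50, 1.0)):
    # Assumed aggregation: equal-weight mean of (1 - LocSim) over the taus.
    dist_sq = (pred_xy[0] - gt_xy[0]) ** 2 + (pred_xy[1] - gt_xy[1]) ** 2
    return sum(1.0 - locsim(dist_sq, t) for t in taus) / len(taus)


# Illustrative usage with made-up values: a crop at (1000, 500) in the frame,
# letterboxed with scale 0.5 and 80 px top padding, and a toy homography that
# simply maps 1 px to 1 cm (a real broadcast homography is projective).
geom = CropGeometry(scale=0.5, pad_x=0.0, pad_y=80.0, x0=1000.0, y0=500.0)
h_inv = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
pred = pixel_to_world(320.0, 400.0, geom, h_inv)
print(pred, multiscale_locsim_loss(pred, (16.0, 11.0)))
```

Each step is an affine map, a projective division or an exponential. The same expressions written with autodiff tensors (for example in PyTorch) are therefore differentiable end to end, which is what allows a metric-space loss to send gradients back to the pixel-space keypoint regression. The small τ terms saturate quickly for large errors but remain sensitive at the sub-metre scale, which is one plausible reason to combine several τ values.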

Posted

2026-06-17