Bagzel: A Bazel Extension for Reproducible Dataset Builds from ROS 1 and ROS 2 Bags
DOI:
https://doi.org/10.31224/6452
Keywords:
ROS, ROS2, Bazel, build systems, dataset construction, autonomous driving
Abstract
Robotic systems such as autonomous vehicles, mobile manipulators, and service robots continuously record large amounts of multimodal sensor data. In the Robot Operating System (ROS) ecosystem, these recordings are typically stored as bag files, i.e., ROS 1 rosbag files and ROS 2 rosbag2 files. These bags act as system recorders for robotic platforms, capturing time-synchronized streams from cameras, LiDAR, radar, GNSS/IMU, and internal states. While bags are ideal for debugging, replay, and sharing, they are not directly suitable as training datasets, which require structured and reproducible representations. Bagzel is an open-source tool that uses the Bazel build system to transform collections of bag files into structured, standardized, and reproducible datasets, including exports in formats such as nuScenes. Within Bagzel, raw bags and derived data products (e.g., frames, trajectories, maps, labels, nuScenes-format datasets) are expressed as Bazel targets. By reusing Bazel's artifact-based, incremental, and deterministic build model, Bagzel enables scalable and maintainable data pipelines. Furthermore, it introduces large-file hashing and cluster integration, which enable efficient handling of large raw bag files and of compute resources. Bagzel is publicly available at https://github.com/UniBwTAS/bagzel.
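The artifact-based, incremental build model mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Bagzel's or Bazel's implementation, and Bagzel's large-file hashing may work quite differently. The names `file_digest`, `ArtifactCache`, and `demo`, as well as the file name `drive_001.bag`, are hypothetical and chosen only for illustration. The sketch streams each input file through SHA-256 in fixed-size chunks, derives a cache key from the step name and the input digests, and reruns a step only when that key has not been seen before.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so a large file is never fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class ArtifactCache:
    """Hypothetical content-addressed cache: one output file per (step, inputs) key."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.executions = 0  # counts how often an action actually ran

    def build(self, step_name, inputs, action) -> Path:
        # The key depends only on the step name and the input contents,
        # so identical inputs always map to the same output path.
        key = hashlib.sha256(step_name.encode())
        for path in sorted(inputs):
            key.update(file_digest(path).encode())
        out = self.root / key.hexdigest()
        if not out.exists():  # cache miss: run the step and store its result
            self.executions += 1
            out.write_bytes(action(inputs))
        return out


def demo():
    """Build the same step three times; only changed input triggers a rerun."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        bag = tmp_path / "drive_001.bag"  # stand-in for a raw bag file
        bag.write_bytes(b"fake bag contents")
        cache = ArtifactCache(tmp_path / "cache")

        def count_bytes(paths):
            # Toy "derived data product": total input size as text.
            return str(sum(p.stat().st_size for p in paths)).encode()

        first = cache.build("count_bytes", [bag], count_bytes)
        second = cache.build("count_bytes", [bag], count_bytes)  # cache hit
        bag.write_bytes(b"changed bag contents")
        third = cache.build("count_bytes", [bag], count_bytes)  # rebuilt
        return first == second, first != third, cache.executions


if __name__ == "__main__":
    print(demo())
```

In this sketch the second build reuses the cached output because the input bytes are unchanged, while the modified file yields a new key and a fresh execution. A real build system additionally tracks tool versions, configuration, and the dependency graph between targets.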
Posted
Versions
- 2026-02-12 (2)
- 2026-02-09 (1)
License
Copyright (c) 2026 Leon Pohl, Lukas Beer, George Sebastian, Mirko Maehlisch

This work is licensed under a Creative Commons Attribution 4.0 International License.