This is an outdated version published on 2026-02-09.
This preprint has been submitted for publication in a journal.
Preprint / Version 1

Bagzel: A Bazel Extension for Reproducible Dataset Builds from ROS 1 and ROS 2 Bags

DOI:

https://doi.org/10.31224/6452

Keywords:

ROS, ROS2, Bazel, build systems, dataset construction, autonomous driving

Abstract

Robotic systems such as autonomous vehicles, mobile manipulators, and service robots continuously record large amounts of multimodal sensor data. In the Robot Operating System (ROS) ecosystem, these recordings are typically stored as bag files, i.e., ROS 1 rosbag files and ROS 2 rosbag2 files. These bags act as system recorders for robotic platforms, capturing time-synchronized streams from cameras, LiDAR, radar, GNSS/IMU, and internal states. While bags are well suited to debugging, replay, and sharing, they are not directly usable as training datasets, which require structured and reproducible representations. Bagzel is an open-source tool that uses the Bazel build system to transform collections of bag files into structured, standardized, and reproducible datasets, including exports in formats such as nuScenes. Within Bagzel, raw bags and derived data products (e.g., frames, trajectories, maps, labels, nuScenes-format datasets) are expressed as Bazel targets. By reusing Bazel's artifact-based, incremental, and deterministic build model, Bagzel enables scalable and maintainable data pipelines. Furthermore, it introduces large-file hashing and cluster integration, which enable efficient handling of large raw bag files and of compute resources. Bagzel is publicly available at https://github.com/UniBwTAS/bagzel.
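
The abstract does not describe how Bagzel hashes large files, so the following is only an illustrative sketch of the general idea behind content-hash-driven incremental builds, not Bagzel's implementation. It assumes SHA-256 and an 8 MiB chunk size; the function names `hash_large_file` and `needs_rebuild` are hypothetical and do not come from Bagzel or Bazel.

```python
import hashlib
from pathlib import Path
from typing import Optional


def hash_large_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks.

    Streaming keeps memory use bounded by chunk_size, which matters for
    multi-gigabyte bag files. (Assumed approach, not Bagzel's actual one.)
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def needs_rebuild(path: Path, recorded_digest: Optional[str]) -> bool:
    """Treat derived artifacts as stale when the input digest has changed.

    recorded_digest is the digest stored at the previous build, or None
    if the input has never been built.
    """
    return recorded_digest != hash_large_file(path)
```

In an artifact-based build model, a digest like this would identify an input, so that derived products are recomputed only when the digest of the bag they depend on changes.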

Posted

2026-02-09
