Engineering Resilient Reproducible Analytical Pipelines (RAP)
A Semantic-Based Self-Healing Framework for High-Velocity Heterogeneous Data Streams
DOI: https://doi.org/10.31224/6466
Keywords: data engineering, reproducible analytical pipelines, autonomous agents, data provenance, schema drift, self-healing, BERT, telemetry
Abstract
Mission-critical telemetry systems, such as those used by sports performance teams and in clinical monitoring, face critical limitations in data availability, veracity and velocity. High-frequency data pipelines break easily when upstream schemas shift, sensors fail or interfaces change.
Traditional pipelines rely on brittle selectors or rigid schemas. When these fail, organizations experience data blackouts, delayed decision-making and loss of situational awareness at critical points.
This research implemented a self-healing Reproducible Analytical Pipeline (RAP) designed to autonomously mitigate schema drift without manual intervention. Built on a containerized Python ecosystem and BERT language model processing, the pipeline replaces static schema mappings with dynamic, semantic embedding-driven reconciliation.
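To illustrate what embedding-driven reconciliation means in practice, the sketch below maps drifted incoming field names onto a canonical schema by cosine similarity. It is a minimal, hypothetical sketch and not the author's implementation: the helper names (`embed`, `cosine`, `reconcile`), the 0.5 threshold and the field names are assumptions, and character-trigram count vectors stand in for BERT embeddings so that the example runs with the standard library alone.

```python
import math
from collections import Counter


# ASSUMPTION: character trigram counts stand in for BERT embeddings here.
# A real pipeline would swap this for contextual embeddings from a BERT model.
def embed(text: str) -> Counter:
    s = f"  {text.lower().replace('_', ' ')} "  # pad so word edges form trigrams
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def reconcile(incoming, canonical, threshold=0.5):
    """Map each incoming field to its most similar canonical field.

    Fields whose best similarity falls below `threshold` (hypothetical value)
    map to None so they can be quarantined rather than silently mis-assigned.
    """
    canon_vecs = {c: embed(c) for c in canonical}
    mapping = {}
    for field in incoming:
        vec = embed(field)
        best, score = max(
            ((c, cosine(vec, v)) for c, v in canon_vecs.items()),
            key=lambda pair: pair[1],
        )
        mapping[field] = best if score >= threshold else None
    return mapping


if __name__ == "__main__":
    # Upstream renamed fields after a hypothetical schema change.
    print(reconcile(["HeartRate_bpm", "spo2_pct", "foo"],
                    ["heart_rate", "spo2", "resp_rate"]))
```

Returning `None` for low-confidence matches, instead of forcing the nearest field, keeps the self-healing step conservative: unknown columns are held back for review rather than written into the wrong canonical slot.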
Grounded in the software reliability principle of the Pareto distribution (Fenton & Neil, 1999) and in tamper-evident processing (Simmhan et al., 2005), the agent uses a generalizable, cross-domain model applicable across industries.
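One common way to make processing tamper-evident is a hash-chained provenance log, in which each entry's digest covers the previous digest, so that editing any earlier record invalidates every later link. The sketch below assumes this technique purely for illustration. The abstract does not specify how the framework achieves tamper evidence, and the `ProvenanceLog` class and record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting hash for an empty chain


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so identical records
    # always hash identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Hypothetical append-only log where each hash chains to the previous one."""

    def __init__(self):
        self.entries = []  # list of (record, hash) tuples

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        h = _digest(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited record breaks it.
        prev = GENESIS
        for record, h in self.entries:
            if _digest(prev, record) != h:
                return False
            prev = h
        return True


if __name__ == "__main__":
    log = ProvenanceLog()
    log.append({"field": "heart_rate", "value": 72, "action": "reconciled"})
    log.append({"field": "spo2", "value": 98, "action": "validated"})
    print(log.verify())
    log.entries[0][0]["value"] = 999  # simulate tampering with a stored record
    print(log.verify())
```

A chain like this only detects tampering. It does not prevent it, and an attacker who can rewrite the whole log can also recompute every hash, so production systems typically anchor the latest hash somewhere external.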
This framework introduces a domain-agnostic ingestion interface supported by modular domain adapters that implement industry-specific extraction, validation and normalization logic. This design enables a unified, cross-domain approach to resilient data ingestion, reducing pipeline fragility and stabilizing critical, high-velocity analytical workflows in mission-critical environments.
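The adapter pattern described above can be sketched as an abstract interface with one concrete adapter per domain, plus a single ingestion function that never changes across industries. This is a hypothetical illustration, not the framework's actual API. The class names, raw payload keys (`hr`, `spo2`, `speed`, `bio`), validation ranges and unit conversions are all assumptions chosen for the F1 and ICU examples.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class DomainAdapter(ABC):
    """Hypothetical contract that every industry-specific adapter implements."""

    @abstractmethod
    def extract(self, raw: dict) -> dict: ...

    @abstractmethod
    def validate(self, record: dict) -> bool: ...

    @abstractmethod
    def normalize(self, record: dict) -> dict: ...


class ICUAdapter(DomainAdapter):
    def extract(self, raw):
        # ASSUMPTION: vitals arrive under "hr" and "spo2" keys.
        return {"heart_rate": raw.get("hr"), "spo2": raw.get("spo2")}

    def validate(self, record):
        hr, spo2 = record["heart_rate"], record["spo2"]
        # ASSUMPTION: illustrative plausibility ranges, not clinical limits.
        return hr is not None and spo2 is not None and 20 <= hr <= 300 and 0 <= spo2 <= 100

    def normalize(self, record):
        return {"heart_rate_bpm": float(record["heart_rate"]),
                "spo2_frac": record["spo2"] / 100}


class F1Adapter(DomainAdapter):
    def extract(self, raw):
        # ASSUMPTION: car speed in km/h and driver biometrics nested under "bio".
        return {"speed_kmh": raw.get("speed"),
                "driver_hr": raw.get("bio", {}).get("hr")}

    def validate(self, record):
        return record["speed_kmh"] is not None and record["speed_kmh"] >= 0

    def normalize(self, record):
        return {"speed_ms": record["speed_kmh"] / 3.6,
                "driver_hr_bpm": record["driver_hr"]}


def ingest(adapter: DomainAdapter, raw: dict) -> Optional[dict[str, Any]]:
    """Domain-agnostic entry point: the same three steps for every industry."""
    record = adapter.extract(raw)
    if not adapter.validate(record):
        return None  # quarantine invalid data instead of propagating it
    return adapter.normalize(record)


if __name__ == "__main__":
    print(ingest(ICUAdapter(), {"hr": 72, "spo2": 98}))
    print(ingest(F1Adapter(), {"speed": 360, "bio": {"hr": 170}}))
```

With this arrangement, a new industry needs only a new adapter subclass. The ingestion path, and any resilience logic around it, stays shared.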
The framework was evaluated in two distinct telemetry environments where schema drift can occur: a Formula 1 stream of driver biometric and car performance telemetry, and a healthcare (ICU) stream of patient vitals.
Versions
- 2026-02-16 (3)
- 2026-02-16 (2)
- 2026-02-12 (1)
License
Copyright (c) 2026 Tarek Clarke

This work is licensed under a Creative Commons Attribution 4.0 International License.