Preprint / Version 1

Cross-Layer Diagnostics for Operator-Side Platform Porting of GPU-Accelerated 5G RAN: An Experience Report on Cable-on Validation of NVIDIA Aerial cuBB on DGX Spark

DOI:

https://doi.org/10.31224/7056

Keywords:

O-RAN, 5G NR, GPU-accelerated RAN, NVIDIA Aerial, cuBB, DGX Spark, PUSCH, DMRS, experience report, operator-side platform porting, cross-yaml audit, dApps, E3 interface

Abstract

GPU-accelerated 5G/6G Radio Access Network stacks such as NVIDIA Aerial cuBB introduce cross-layer interaction surfaces that are absent from traditional CPU-only RAN implementations and that become particularly visible during operator-side platform porting. We report on the early bring-up of NVIDIA Aerial CUDA-Accelerated RAN 26.1.0 on the DGX Spark (GB10 ARM64) entry-level reference platform in the cable-on physical-function-to-physical-function self-loopback topology, and on the cross-layer defects encountered while extending it with an end-to-end dApp inference pipeline. Because NVIDIA's public 26.1.0 release does not ship a DGX Spark cable-on cuphycontroller or L2 adapter YAML for the F08 launch-pattern family, the operator must derive these configurations locally from server-class siblings; we document three defects that surface in this practice. We firmly establish two operator-side derivation errors: a missing five-line static-slot override that causes 100% PUSCH CRC failure (Finding 1) and a CPU-map collision that silently starves the dApp telemetry pipeline (Finding 3). We also identify a plausible vendor-side gap in the validation-mode fail-fast policy (Finding 2): eight unconditional EXIT_L1 call sites in NVIDIA's publicly released uplink driver escalate cable-on cond 3 events to process death, with no opt-in soft-recovery knob. A cumulative patch of approximately fifteen lines restores the expected peak-cell throughput on our F08_1C_59 profile, sustained over a continuous soak of 1 h 6 min with end-to-end dApp inference, in a single-run feasibility validation. We propose three diagnostic techniques (sibling-config diff, latent-code inspection, and cross-YAML CPU-map audit), supported by two reference Python linter prototypes; each technique localizes one defect in minutes rather than weeks. GitHub Issues #43 and #44 were closed with operator-side retractions; Issue #45 remains open at submission time.
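
To make the cross-YAML CPU-map audit concrete, the following is a minimal Python sketch of the idea, not the authors' linter prototype. It assumes each configuration file has already been parsed into nested dictionaries and lists, and it treats any key whose name contains "cpu" or "core" as a core assignment. That key heuristic, the function names, and the file and key names in the usage example are all hypothetical and are not taken from the Aerial release.

```python
from collections import defaultdict


def collect_cores(node, path=""):
    """Yield (path, core) for each integer stored under a key naming a CPU or core.

    Assumption: keys containing "cpu" or "core" hold core IDs. Real configs
    may use other names, so this heuristic would need adjusting.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            pinned = any(tag in str(key).lower() for tag in ("cpu", "core"))
            if pinned and isinstance(value, int) and not isinstance(value, bool):
                yield child, value
            elif pinned and isinstance(value, list) and all(isinstance(v, int) for v in value):
                for v in value:
                    yield child, v
            else:
                # Not a core assignment at this level: keep descending.
                yield from collect_cores(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from collect_cores(item, f"{path}[{i}]")


def audit_cpu_maps(configs):
    """Return {core: [(file, path), ...]} for cores claimed by more than one file."""
    claims = defaultdict(list)
    for name, tree in configs.items():
        for path, core in collect_cores(tree):
            claims[core].append((name, path))
    return {
        core: users
        for core, users in claims.items()
        if len({name for name, _ in users}) > 1
    }


# Hypothetical parsed configs; file names and keys are illustrative only.
example_configs = {
    "cuphycontroller.yaml": {"workers_ul": [{"cpu_affinity": 5}, {"cpu_affinity": 6}]},
    "l2_adapter.yaml": {"recv_thread": {"cpu_affinity": 7}},
    "dapp.yaml": {"telemetry": {"cpu_core": 6}},
}
collisions = audit_cpu_maps(example_configs)
```

In this example, `collisions` reports core 6 as claimed by both the controller config and the dApp config, which is the kind of cross-file overlap that a single-file review would miss. The sketch only flags a core shared between different files; reuse within one file is left alone on the assumption that it may be intentional.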

Posted

2026-05-19