Preprint / Version 1

Energy-Aware Autonomous UAV Navigation via Deep Reinforcement Learning: DQN, PPO, and SAC with Battery-Constrained Reward

DOI:

https://doi.org/10.31224/6839

Keywords:

deep reinforcement learning, autonomous UAV navigation, energy-aware reward, Soft Actor-Critic, multi-seed evaluation, ANOVA, A*+PID baseline, battery constraint, open-source Python

Abstract

Battery endurance limits commercial quadcopter UAVs to 15–25 minutes per charge, yet existing deep reinforcement learning (DRL) comparative studies for autonomous UAV navigation typically evaluate algorithms on task-success rate alone, ignoring energy expenditure. This paper proposes an energy-aware multi-objective reward function with a per-step energy penalty (w_e = −0.20) and a battery-scaled goal bonus (+200·(1+0.5·b/100)), creating a 43% reward differential between energy-efficient and energy-wasteful arrivals. Three algorithms are implemented in pure NumPy and compared across five random seeds over 200,000 training steps: Deep Q-Network (DQN), Proximal Policy Optimisation (PPO) with generalised advantage estimation (GAE), and Soft Actor-Critic (SAC) with the reparameterisation trick and twin critics. SAC is Pareto-optimal, achieving 82.2±2.7% success at 24.2±1.8% battery use, versus 71.7±3.1% / 29.2±1.8% for PPO, 57.8±2.6% / 36.1±2.2% for DQN, and 43.5±5.2% / 48.9±4.7% for an A*+PID baseline operating with full obstacle knowledge. One-way ANOVA yields F = 93.96 (p < 0.001); all pairwise comparisons remain significant after Bonferroni correction, with Cohen's d ≥ 3.6. An ablation study confirms that each reward component contributes independently, and SAC retains a success rate above 68.7% under combined sensor noise and wind disturbance without retraining. All code is provided in Appendix A.
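
To make the reward shaping concrete, the minimal Python sketch below implements only the two terms the abstract states: the per-step energy penalty (w_e = −0.20) and the battery-scaled goal bonus +200·(1+0.5·b/100). The function name, argument names, and the normalisation of per-step energy use are illustrative assumptions, not taken from the paper; the full reward presumably contains additional shaping terms (e.g. progress or collision penalties) exercised by the ablation study but not spelled out in the abstract.

```python
# Sketch of the energy-aware reward terms stated in the abstract.
# Names and the energy normalisation are assumptions; the paper's
# full reward presumably adds further shaping terms not shown here.

W_ENERGY = -0.20   # per-step energy penalty weight w_e (from the abstract)
GOAL_BASE = 200.0  # base goal bonus (from the abstract)

def step_reward(energy_used, reached_goal, battery_pct):
    """Reward for a single environment step.

    energy_used  -- energy drawn this step (assumed normalised so a
                    nominal step costs 1.0; not stated in the abstract)
    reached_goal -- True if the UAV arrived at the goal on this step
    battery_pct  -- remaining battery b on a 0-100 scale
    """
    r = W_ENERGY * energy_used
    if reached_goal:
        # Battery-scaled bonus: 200*(1 + 0.5*b/100), ranging from
        # +200 on an empty battery to +300 on a full one, so frugal
        # arrivals earn a strictly larger terminal reward.
        r += GOAL_BASE * (1.0 + 0.5 * battery_pct / 100.0)
    return r

# Example: arriving with 80% battery after a 1-unit-energy step
print(step_reward(1.0, True, 80.0))  # -0.2 + 200*1.4 = 279.8
```

Scaling the arrival bonus by remaining battery is what couples task success to energy frugality; over a full episode, the accumulated per-step penalties and the scaled bonus together produce the reward differential the abstract reports.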

Posted

2026-04-16