Energy-Aware Autonomous UAV Navigation via Deep Reinforcement Learning: DQN, PPO, and SAC with Battery-Constrained Reward
DOI:
https://doi.org/10.31224/6839

Keywords:
deep reinforcement learning, autonomous UAV navigation, energy-aware reward, Soft Actor-Critic, multi-seed evaluation, ANOVA, A* PID baseline, battery constraint, open-source Python

Abstract
Battery endurance limits commercial quadcopter UAVs to 15–25 minutes of flight per charge, yet existing comparative studies of deep reinforcement learning (DRL) for autonomous UAV navigation evaluate algorithms on task-success rate alone, ignoring energy expenditure. This paper proposes an energy-aware multi-objective reward function that combines a per-step energy penalty (w_e = −0.20) with a battery-scaled goal bonus (+200·(1+0.5·b/100)), creating a 43% reward differential between energy-efficient and energy-wasteful arrivals. Three algorithms are implemented in pure NumPy and compared across five random seeds over 200,000 training steps: Deep Q-Network (DQN), Proximal Policy Optimisation (PPO with generalised advantage estimation), and Soft Actor-Critic (SAC with the reparameterisation trick and twin critics). SAC is Pareto-optimal, achieving 82.2±2.7% success with 24.2±1.8% battery use, against 71.7±3.1% / 29.2±1.8% for PPO, 57.8±2.6% / 36.1±2.2% for DQN, and 43.5±5.2% / 48.9±4.7% for an A*+PID baseline given full obstacle knowledge. One-way ANOVA yields F = 93.96 (p < 0.001); all pairwise comparisons remain significant after Bonferroni correction, with Cohen's d ≥ 3.6. An ablation study confirms that each reward component contributes independently, and SAC maintains success above 68.7% under combined sensor noise and wind disturbance without retraining. All code is available in Appendix A.
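To make the reward shaping concrete, the following is a minimal Python sketch. Only the penalty weight w_e = −0.20 and the goal bonus +200·(1+0.5·b/100) come from the abstract; the function name step_reward, its signature, and the scalar per-step energy measure are illustrative assumptions, not the authors' Appendix A implementation.

```python
# Minimal sketch of the energy-aware reward described in the abstract.
# The weight w_e = -0.20 and the goal bonus +200*(1 + 0.5*b/100) are taken
# from the paper; the function name, signature, and scalar energy-use model
# are illustrative assumptions, not the authors' Appendix A implementation.

W_ENERGY = -0.20          # per-step energy penalty weight (from the abstract)
GOAL_BONUS_BASE = 200.0   # base arrival bonus (from the abstract)

def step_reward(energy_used: float, reached_goal: bool, battery_pct: float) -> float:
    """Penalise per-step energy use; on arrival, scale the goal bonus by the
    remaining battery percentage b in [0, 100]."""
    reward = W_ENERGY * energy_used
    if reached_goal:
        reward += GOAL_BONUS_BASE * (1.0 + 0.5 * battery_pct / 100.0)
    return reward

# Arriving with a full battery yields a +300 bonus versus +200 on empty;
# the 43% differential quoted in the abstract presumably reflects the total
# episode reward, which also accumulates the per-step energy penalties.
```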
License
Copyright (c) 2026 Sayeed Omar

This work is licensed under a Creative Commons Attribution 4.0 International License.