Abstract
Unmanned aerial vehicle (UAV) have been deployed in many applications, such as Power Grid inspection, forest fire prevention, and pollution surveillance. They often cruise along a fixed route above the target area. Due to the cost of remote communication and local computationally intensive tasks, resource-constrained drones tend to offload tasks to edge servers. In most cases, drones do not know the prior knowledge of user nodes and edge servers, and must reduce the altitude to provide services. Therefore, it is necessary to carefully decide when and where to collect and offload tasks to avoid unnecessary energy consumption and time delays. In this paper, we propose the benefit maximization problem under constraints such as time sensitivity, and propose an optimized task offloading strategy based on the reinforcement learning algorithm. We strive to directly solve the deficiencies in the profit maximization problem with modified Q-Learning algorithm. We test the performance under practical application scenarios with different environmental parameters. The experimental results prove that the solution proposed in this paper has better convergence and performance, as well as better reusability in similar application scenarios.