Abstract
The number of connected devices is expected to increase dramatically in the near future, and subscribers will require higher data rates with lower latency. Device-to-Device (D2D) communication, a 5G feature, is envisaged as a solution to meet these requirements. In underlay communication mode, D2D users share the same resource blocks as cellular users, which introduces co-channel interference. To coordinate power allocation and control interference levels, we propose a new algorithm based on reinforcement learning, specifically the Q-learning technique. Our goal is to maximize the sum data rate of D2D users while guaranteeing QoS for cellular users. Our contribution is to decorrelate the actions selected by users, which enlarges the solution space. Simulation results show that the introduced diversification outperforms a correlated Q-learning solution in terms of throughput, fairness, and outage ratio.