Abstract
We study a self-organizing wireless network in which the number of active connections to the typical user equipment (UE) is dynamic, so that the network can sustain temporally varying traffic. Using tools from stochastic geometry, we show that even though signal coverage improves with a higher degree of connectivity, the per-user throughput may degrade. Consequently, the operator needs to reconfigure the UE association throughout the network as the UE density and the network requirements change. To optimize this trade-off in an online manner, we propose a reinforcement-learning (RL) based framework in which each candidate number of base station (BS) connections to the UE is modeled as an arm of a multi-armed bandit (MAB) problem. We propose a refresh-based Thompson sampling (TS) approach for the MAB problem and show that it is able to track the temporally optimal static association rule in terms of signal coverage and throughput. Our study highlights that in future fifth-generation (5G) and beyond wireless networks, elastic multi-connectivity must be employed in order to sustain agile quality-of-service (QoS) requirements and dynamic UE traffic.
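
To make the bandit formulation concrete, the following is a minimal sketch of Thompson sampling with a periodic refresh, in which arm k stands for "connect the UE to k + 1 BSs". It is illustrative only and is not the algorithm evaluated in this work: the binary reward (for example, whether a QoS target was met in a slot), the Beta(1, 1) priors, the fixed refresh period, and all names such as `RefreshThompsonSampler` are assumptions introduced here.

```python
import random


class RefreshThompsonSampler:
    """Beta-Bernoulli Thompson sampling with periodic posterior resets.

    Hypothetical sketch: arm index k is read as "use k + 1 BS connections".
    The reset lets the sampler forget stale statistics when traffic changes.
    """

    def __init__(self, num_arms, refresh_period, seed=None):
        self.num_arms = num_arms
        self.refresh_period = refresh_period  # assumed fixed; a design choice
        self.rng = random.Random(seed)
        self.steps = 0
        self._reset()

    def _reset(self):
        # Beta(1, 1) is the uniform prior on each arm's success probability.
        self.alpha = [1.0] * self.num_arms
        self.beta = [1.0] * self.num_arms

    def select_arm(self):
        # Draw one sample per arm from its posterior and play the largest.
        samples = [
            self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(self.num_arms), key=samples.__getitem__)

    def update(self, arm, reward):
        # reward is assumed binary: 1 if the QoS target was met, else 0.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
        self.steps += 1
        if self.steps % self.refresh_period == 0:
            self._reset()  # refresh: discard history and start over


def simulate(success_probs_per_phase, phase_length, refresh_period, seed=0):
    """Run the sampler on a toy environment whose best arm changes by phase."""
    env_rng = random.Random(seed + 1)
    num_arms = len(success_probs_per_phase[0])
    sampler = RefreshThompsonSampler(num_arms, refresh_period, seed)
    pulls = []  # one list of per-arm pull counts for each phase
    for probs in success_probs_per_phase:
        counts = [0] * num_arms
        for _ in range(phase_length):
            arm = sampler.select_arm()
            reward = 1 if env_rng.random() < probs[arm] else 0
            sampler.update(arm, reward)
            counts[arm] += 1
        pulls.append(counts)
    return pulls
```

For instance, `simulate([[0.8, 0.5, 0.2], [0.2, 0.5, 0.8]], phase_length=500, refresh_period=100)` returns the per-phase pull counts for a toy setting in which the best degree of connectivity changes halfway through. The refresh period trades tracking speed against the exploration cost incurred after every reset.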