Abstract
Despite advances in Cloud computing, ensuring high availability (HA) remains a challenge due to varying loads and the potential for Cloud outages. Deploying applications in distributed Clouds can help overcome this challenge by geo-replicating applications across multiple Cloud data centers (DCs). However, this distributed deployment can become a performance bottleneck due to network latencies between users and DCs as well as inter-DC latencies incurred during the geo-replication process. For most web applications, both HA and Performance (HAP) are essential and need to meet pre-agreed Service Level Objectives (SLOs). Efficiently placing and managing primary and backup replicas of applications in distributed Clouds to achieve HAP is a challenging task. Existing solutions consider either HA or performance, but not both. In this paper, we propose an approach that automates a latency-aware failover strategy through a server placement algorithm that leverages genetic algorithms to factor in the proximity of users and inter-DC latencies. To facilitate the distributed deployment of applications and avoid the overheads of Clouds, we utilize container technologies. To evaluate our proposed approach, we conduct experiments on the Australia-wide National eResearch Collaboration Tools and Resources (NeCTAR - www.nectar.org.au) Research Cloud. Our results show improvements of at least 23.3% and 22.6% in response times under normal and failover conditions, respectively, compared to traditional, latency-unaware approaches. In addition, the 95th percentile of response times in our approach is at most 1.5 ms above the SLO, compared to 11–32 ms with other approaches.
• We present an approach for achieving availability and performance when deploying web applications in distributed Clouds.
• A genetic algorithm for data center (DC) selection that factors in proximity to users and inter-DC latencies is presented.
• This work focuses on the placement issue and improves end-to-end response times even in the presence of failures.
• We show how latency-aware application deployment can offer higher performance and stability before and after failures.
• We present results based on realistic Cloud-based experiments across the national research Cloud in Australia.
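To make the idea of latency-aware DC selection concrete, the following is a minimal sketch of how a genetic algorithm might choose a primary and a backup DC. It is not the algorithm evaluated in this paper: the fitness weights, the truncation selection, the crossover and mutation operators, and the names `fitness` and `select_placement` are all assumptions made for illustration. The inputs are a hypothetical user-to-DC latency matrix and an inter-DC latency matrix, both in milliseconds.

```python
import random


def fitness(pair, user_latency, inter_dc_latency, weights=(0.5, 0.3, 0.2)):
    """Cost of a (primary, backup) placement; lower is better.

    The three weights (normal operation, failover, replication) are
    illustrative assumptions, not values taken from the paper.
    """
    primary, backup = pair
    if primary == backup:
        return float("inf")  # a backup in the same DC gives no failover target
    w_normal, w_failover, w_repl = weights
    n_users = len(user_latency)
    normal = sum(row[primary] for row in user_latency) / n_users
    failover = sum(row[backup] for row in user_latency) / n_users
    replication = inter_dc_latency[primary][backup]
    return w_normal * normal + w_failover * failover + w_repl * replication


def select_placement(user_latency, inter_dc_latency, pop_size=20,
                     generations=50, mutation_rate=0.2, seed=0):
    """Search for a low-cost (primary, backup) pair with a simple GA.

    Like any heuristic search, this may return a local optimum.
    """
    rng = random.Random(seed)
    n_dcs = len(inter_dc_latency)

    def score(pair):
        return fitness(pair, user_latency, inter_dc_latency)

    # Initial population: random pairs of distinct DCs.
    population = [tuple(rng.sample(range(n_dcs), 2)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [a[0], b[1]]  # crossover: primary from a, backup from b
            if rng.random() < mutation_rate:
                child[rng.randrange(2)] = rng.randrange(n_dcs)  # mutate one gene
            children.append(tuple(child))
        population = parents + children
    return min(population, key=score)


# Hypothetical latencies in ms: rows are user groups, columns are DCs.
user_latency = [[10, 50, 90], [20, 40, 80]]
inter_dc_latency = [[0, 5, 30], [5, 0, 12], [30, 12, 0]]
primary, backup = select_placement(user_latency, inter_dc_latency)
```

Because the fitness combines user proximity to both replicas with the inter-DC replication latency, a placement that is fast in normal operation but leaves users far from the backup is penalised, which is the trade-off a latency-aware failover strategy has to balance.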