Abstract
A central goal of cloud service providers is to satisfy service-level agreements without significantly over-provisioning data center clusters. In practice, efforts to meet these agreements have relied mainly on resource over-provisioning rather than on identifying performance bottlenecks. While increasing parallelism tends to reduce both average and tail latency, the joint impact of concurrent job scheduling and parallel task processing is challenging to model analytically, particularly when compared with models developed without the notion of concurrency. This article presents an analytical model for distributed schedulers in data center cluster networks. The model can be used to investigate how latency affects data center network design and how many resources should be allocated to meet service-level agreements. We build on ideas from queuing networks, which provide a framework for relating expected latency to resource provisioning. The model combines tandem queuing networks and fork-join systems to compute expected latency in closed form at various stages of a data center cluster network. Theoretical analysis and simulations demonstrate the effectiveness of the proposed model and show how it can be used to strike a balance between expected latency and resource utilization. Results obtained from simulation scenarios driven by different data center traffic traces confirm the soundness of the model.