Abstract
Consolidating applications is a practical necessity in today's datacenters to reduce cost and improve resource utilization. However, resource sharing among different applications may result in high response latency for user requests. Fork-Join structures underlie the workflows of many datacenter applications, yet no performance model exists for their tail latency. As a result, the current practice is to overprovision resources in an attempt to satisfy as many user requests as possible, which leads to low resource utilization. It is therefore important to have a performance model that can accurately predict tail latency in such an environment, especially in high-load regions, where resource provisioning matters most. In this paper, we propose an analytical solution for predicting the tail latency of a target application in a consolidated environment where it is mixed with other background applications. The proposed model is validated against simulation through extensive case studies. The experimental results show that the model is effective at predicting tail latency in the high-load region, with all prediction errors well within 10% at loads of 75% or higher. This makes the model a valuable tool for resource provisioning and for supporting scheduling decisions in datacenter clusters to guarantee user satisfaction.